Exploring the Schistosoma mansoni adult male transcriptome using RNA-seq

Experimental Parasitology xxx (2011) xxx–xxx

8 July 2011

Contents lists available at ScienceDirect

Experimental Parasitology

journal homepage: www.elsevier.com/locate/yexpr

Exploring the Schistosoma mansoni adult male transcriptome using RNA-seq

Giulliana Tessarin Almeida a, Murilo Sena Amaral a, Felipe Cesar Ferrarezi Beckedorff a, João Paulo Kitajima b, Ricardo DeMarco c, Sergio Verjovski-Almeida a,⇑

a Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, Brazil
b Hospital Israelita Albert Einstein, São Paulo, Brazil
c Departamento de Física e Informática, Instituto de Física de São Carlos, Universidade de São Paulo, São Carlos, Brazil

Article info

Article history:
Available online xxxx

Keywords: Schistosoma mansoni; Trematode; RNA-seq; Genome annotation; Gene discovery; Micro-exon genes

Abstract

Schistosoma mansoni is one of the agents of schistosomiasis, a chronic and debilitating disease. Here we present a transcriptome-wide characterization of adult S. mansoni males by high-throughput RNA-sequencing. We obtained 1,620,432 high-quality ESTs from a directional strand-specific cDNA library, resulting in a 26% higher coverage of genome bases than that of the public ESTs available at NCBI. With a 15×-deep coverage of transcribed genomic regions, our data were able to (i) confirm for the first time 990 predictions without previous evidence of transcription; (ii) correct gene predictions; (iii) discover 989 and 1196 RNA-seq contigs that map to intergenic and intronic genomic regions, respectively, where no gene had been predicted before. These contigs could represent new protein-coding genes or non-coding RNAs (ncRNAs). Interestingly, we identified 11 novel Micro-exon genes (MEGs). These data reveal new features of the S. mansoni transcriptional landscape and significantly advance our understanding of the parasite transcriptome.

© 2011 Elsevier Inc. All rights reserved.

⇑ Corresponding author. Address: Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, Av. Prof. Lineu Prestes 748, sala 1200, 05508-000 São Paulo, SP, Brazil. Fax: +55 11 3091 2186.
E-mail address: [email protected] (S. Verjovski-Almeida).

0014-4894/$ - see front matter © 2011 Elsevier Inc. All rights reserved. doi:10.1016/j.exppara.2011.06.010

1. Introduction

Schistosomiasis is a chronic and debilitating disease that ranks second only to malaria in terms of morbidity and mortality (Fenwick et al., 2009). The disease is caused by an endoparasite of the Schistosoma genus, and Schistosoma mansoni is the prevalent species in the Americas, Africa and the Middle East (WHO, 2002). Early studies estimated that the S. mansoni genome had a GC content of 34% (Hillyer, 1974), with 4–8% highly repetitive sequences, 32–36% moderately repetitive sequences and 60% single-copy sequences (Le Paslier et al., 2000; Simpson et al., 1982).

S. mansoni has a complex genome with 363 million bases arranged in eight pairs of chromosomes: seven autosomal pairs and one pair of sex chromosomes. Recent sequencing efforts by the Wellcome Trust Sanger Institute/TIGR with 8× coverage resulted in an assembly of 5745 genomic scaffolds larger than 2 kilobases (kb) (Berriman et al., 2009). This genomic information has unprecedented value, serving as a reference for several Schistosoma studies. One of the most exciting possibilities generated by the genome sequencing is the availability of a near-complete description of the gene complement of the parasite. To this end, the S. mansoni genome was annotated using EVidenceModeler and PASA (Haas et al., 2008), resulting in 11,809 putative predicted genes encoding 13,197 transcripts. Despite the evident value of such a resource, in silico predictions are not expected to be totally correct (Haas et al., 2008), especially if we consider the limited information available regarding both transcriptional activity and S. mansoni splice signals. Given this complex scenario, it is difficult to evaluate the accuracy of the present predictions. Only the accumulation of further transcript sequences, followed by their mapping, will permit the complete description of S. mansoni genes and give further insights into functional genomics (Verjovski-Almeida et al., 2004).

S. mansoni EST sequencing has been performed by several groups and permitted the accumulation of around 40 thousand sequences in the late 1990s and early 2000s (Franco et al., 1995, 1997; Merrick et al., 2003). The largest contribution to the accumulation of EST data was made in 2003 by a sequencing project that generated 163,586 ESTs from cDNA libraries from six developmental stages (Verjovski-Almeida et al., 2003). The ESTs were assembled into 30,988 contigs (SmAEs – Schistosoma mansoni Assembled EST sequences), which resulted in 92% sampling of an estimated complement of 14,000 S. mansoni genes (Verjovski-Almeida et al., 2003). Despite the relatively high transcript sampling achieved, coverage of transcripts was still fragmentary, as evidenced by the approximately 2.2 contigs per gene obtained in the EST assembly. Moreover, the available EST data were used as one of the components for the modeler to provide S. mansoni gene predictions in the genome project (Berriman et al., 2009); however, the exact weight given to this attribute in relation to other components, such as species conservation, was not described.

Recent advances in next-generation sequencing technology promise to accelerate the acquisition of sequences and diminish the cost of sequencing large and complex genomes as well as transcriptomes (Pennisi, 2011; Shaffer, 2007). In the present work, we used Roche 454 pyrosequencing to explore the S. mansoni adult male transcriptome (RNA-seq). We used the most recent version of the genome sequence assembly (ftp://sanger.ac.uk) for mapping and annotation of Roche 454 sequences. This analysis permitted us to evaluate the accuracy of the present gene predictions and the potential of RNA-seq to provide a more complete description of the S. mansoni genes.

2. Materials and methods

2.1. Schistosoma mansoni genomic information

In this work, we used information available (December 2010) in the public data repository at ftp://sanger.ac.uk/pub/databases/Trematode/S.mansoni/genome/GFF/Smansoni.tar.gz. This file contains the most recent version of the S. mansoni genome with 19,022 scaffolds (Berriman et al., 2009), as well as the currently updated annotations (compatible with the Sanger Institute GeneDB website – http://www.genedb.org/Homepage). These data were provided by the Parasite genomics (Helminths) group at the Wellcome Trust Sanger Institute and can be obtained from the above folder at ftp://sanger.ac.uk.

2.2. PolyA+ RNA preparation

Approximately 200 adult S. mansoni males freshly obtained from hamsters infected with mixed-sex parasites were collected by periportal perfusion (Pellegrino and Brener, 1956); males were separated from females by keeping the parasites in 150 mm culture dishes (Corning, #430599) using Advanced RPMI Medium 1640 (Gibco, #12633-012) supplemented with 10% heat-inactivated calf serum (freshly added), 12 mM HEPES (4-(2-hydroxyethyl)piperazine-1-ethanesulfonic acid) pH 7.4, and 1% antibiotic/antimycotic cell culture mixture (Gibco, #15240-096) at room temperature for a period of 15–30 min. Infected hamsters were maintained at Instituto Adolfo Lutz, which approved the study, and the experimental procedures were conducted adhering to the institution's guidelines for animal husbandry. PolyA+ RNA was extracted from adult males using two rounds of the FastTrack MAG Maxi mRNA Isolation Kit (Invitrogen), essentially as directed by the manufacturer, with the following modifications: treatment with 10 U of DNase I, Amplification Grade (Invitrogen), for 10 min at room temperature and six washings at the final washing step before elution. RNA was quantified using the Quant-iT RiboGreen RNA Reagent (Invitrogen) and assessed for integrity by electrophoresis with a Bioanalyzer RNA Pico LabChip (Agilent Technologies). PolyA+ RNA samples (500 ng) were heat-fragmented to approximately 1000 nt by incubating for 1 min 30 s at 82 °C in Fragmentation Buffer (40 mM Tris–acetate, 100 mM potassium acetate, 31.5 mM magnesium acetate, pH 8.1) and immediately thereafter transferred to ice to stop the fragmentation reaction. Fragmented RNA was further purified with RNAClean (Agencourt) as per the manufacturer's directions, except that the amount of beads was reduced to 1.6× the volume of the sample. After purification, 5 ng of fragmented polyA+ RNA was evaluated on an RNA 6000 Pico LabChip on the 2100 Bioanalyzer (Agilent Technologies) and compared to a non-fragmented sample to confirm fragmentation.

2.3. cDNA library preparation

In brief, first-strand cDNA libraries were prepared from 500 ng of heat-fragmented polyA+ RNA using SuperScript III (Invitrogen) and random primers. Modified directional adaptors containing 454 Titanium adaptor A and B sequences were ligated to the single-stranded cDNAs. The resulting single-stranded adapted cDNA libraries were PCR amplified and further purified, resulting in dscDNA libraries. Detailed procedures are described in Supplementary methods.

2.4. Pyrosequencing and data processing

Directional dscDNA libraries were sequenced from the 3′-end of the RNA using the emPCR Titanium Kit and Titanium Sequencing Kit on a Roche 454 Genome Sequencer FLX instrument following the manufacturer's instructions (Margulies et al., 2005). Data processing used standard 454 software procedures to generate nucleotide sequences and quality scores for all reads. High-quality (minimum Phred score 20) trimmed reads were generated. Sequence data were FASTA formatted.

2.5. Data analysis

In order to map sequences to the genome, Blat v34 (Kent, 2002) was used to align all NCBI public S. mansoni ESTs and our 454 high-quality S. mansoni ESTs (RNA-seq ESTs) against the genome project (Berriman et al., 2009) scaffolds (only those larger than 2000 bp). Blat identifies the putative splicing pattern inside the ESTs, and the mapping reports it generates use formats that are easily parsed by scripts. For both datasets, only alignments with more than 80% identity and a minimum score of 40 were accepted in a first screening. The low score threshold used in this initial step takes into account that both datasets contain short sequences of less than 50 bp.

On account of the presence of gene families and protein domain conservation, an EST could have multiple hits when aligned with the Blat tool to the S. mansoni genome. The pslCDnaFilter tool (UCSC – available at http://hgdownload.cse.ucsc.edu/admin/exe/) was used to choose the best alignment among those with a minimum of 90% Blat identity and a minimum Blat coverage of 75%. If there was a tie among the best hits, an adjusted gap score, adapted from that stated on the Blat website, was applied: (number of matches − number of mismatches − log2(number of query inserts + number of target inserts + 1)). If there was still a tie, one alignment was chosen randomly. The above formula favors one-exon ESTs and avoids accepting ESTs whose mappings are scattered across the genome. The selected alignments were further adjusted so that putative introns of <30 nt were merged with the flanking exons. In the entire analysis, NCBI public S. mansoni ESTs were considered as sequences of unknown transcription orientation and, therefore, mappings on different genome strands were not discriminated. On the other hand, the RNA-seq ESTs were generated with a directional method and therefore alignments on the genomic plus strand were separated from alignments on the minus strand.
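The tie-breaking step described above can be sketched in a few lines of Python. This is only an illustration and not the authors' Perl script: the `Alignment` record and its field names are hypothetical, the operators in the score are taken to be subtractions (an assumption about the original typesetting), and a seeded random generator is used so that the final random choice is reproducible.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    # Hypothetical record holding the Blat counts used by the score.
    target: str
    matches: int
    mismatches: int
    query_inserts: int
    target_inserts: int


def adjusted_gap_score(aln):
    # matches - mismatches - log2(query inserts + target inserts + 1);
    # the minus signs are an assumption, see the lead-in.
    gaps = aln.query_inserts + aln.target_inserts
    return aln.matches - aln.mismatches - math.log2(gaps + 1)


def pick_best(alignments, rng=None):
    # Keep the top-scoring alignment; break remaining ties at random.
    rng = rng or random.Random(0)
    top = max(adjusted_gap_score(a) for a in alignments)
    tied = [a for a in alignments if adjusted_gap_score(a) == top]
    return rng.choice(tied)


one_exon = Alignment("scaffold_1", 200, 0, 0, 0)
scattered = Alignment("scaffold_2", 200, 0, 0, 3)
best = pick_best([one_exon, scattered])
```

With equal match counts, the ungapped alignment scores 200 and the alignment with three target inserts scores 198, so the one-exon mapping is preferred, which is the behavior the text attributes to the formula.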

A second set of alignments was created by decreasing the coverage parameter of pslCDnaFilter to 15%. The resulting alignments with Blat coverage ranging from 15% to 75% were separated and assembled using Newbler 2.5 p1 (Kumar and Blaxter, 2010), an approach that could reveal genome assembly problems, genes at scaffold borders or simply EST quality problems and artifacts, besides possible limitations of Blat in finding the whole alignment.

All the ESTs that aligned with the genome at coverage <15%, regardless of the Blat identity, were finally assembled with Newbler 2.5 p1. This second assembly may reveal new Schistosoma genes absent from the current version of the genome as well as contaminations and other sequencing artifacts. The above assemblies with Newbler were performed with standard default parameters and the -cdna option, necessary for EST assembly. Annotation of out-of-the-predictions and out-of-the-genome EST assemblies was performed using standard NCBI Blast (Altschul et al., 1990) against the NCBI NT (nucleotides – BlastN with the low-complexity region filtering flag set and an e-value threshold of 1e−10) and NR (proteins – BlastX with the low-complexity region filtering flag set and an e-value threshold of 1e−5) databases. Additionally, an alignment of subsets of ESTs against the genome transcript predictions was performed with Blat (with the -fastMap option set, i.e., not allowing introns). This step was necessary to complement the genomic coverage analysis and to evaluate the correct directionality of cDNA library preparation and sequencing. The pipeline also comprised several homemade Perl scripts.
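The three coverage classes used in this section (full mapping, partial alignment sent to a first Newbler assembly, and near-absent alignment sent to a second assembly) can be summarized as a small decision function. The sketch below is an illustration only: the function name and labels are hypothetical, the boundary at exactly 75% coverage is assigned to the mapped class, and the text does not say how alignments with at least 15% coverage but less than 90% identity were handled, so they are returned as "unclassified" rather than guessed.

```python
def classify_by_coverage(identity_pct, coverage_pct):
    # Coverage below 15% goes to the out-of-the-genome assembly,
    # regardless of identity, as stated in the text.
    if coverage_pct < 15:
        return "unaligned"
    # Not specified in the text; kept separate instead of guessing.
    if identity_pct < 90:
        return "unclassified"
    # Identity >= 90%: full mapping at >= 75% coverage, otherwise partial.
    return "mapped" if coverage_pct >= 75 else "partial"


labels = [
    classify_by_coverage(identity, coverage)
    for identity, coverage in [(98, 92), (95, 40), (60, 10)]
]
```

For the three example reads the labels are "mapped", "partial" and "unaligned", matching the three groups whose counts are reported in Section 3.2.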

3. Results

3.1. RNA-seq acquired sequences

RNA-seq by pyrosequencing was performed with adult male RNA samples. A total of 1,620,432 high-quality ESTs with an average length of 232 nt (range 40–1433 nt) was obtained, totaling 376,022,395 nt. Quantitative data related to the sequencing runs are presented in Table 1. For comparison, the S. mansoni EST dataset publicly available at NCBI was composed of 205,892 ESTs with an average length of 377 nt (range 15–1421 nt) and a total size of 77,666,468 nt. In total nucleotides, the RNA-seq dataset is approximately 5-fold larger than the public NCBI data. RNA-seq data were deposited at the NCBI SRA (Short Read Archive) division with accession number SRA030439 and further analyzed with Blat alignment to the genome, as described below.

3.2. Genomic coverage by RNA-seq from S. mansoni adult worms

In order to find the single best alignment of each RNA-seq EST (and of each NCBI read) to the genome (within scaffolds larger than 2 kb) we used a Blat search with stringent parameters (identity ≥90% and coverage ≥75%), resulting in the alignment to the genome of 1,422,955 RNA-seq ESTs (88% of the total) and of 177,639 NCBI ESTs (86% of the total). In spite of the stringent parameters, multiple top alignments were produced: an average of 4.8 alignments per RNA-seq EST and an average of 2.6 alignments per NCBI EST. It is noteworthy that both datasets resulted in a similar fraction of aligned sequences (86–88%), signaling that 12–14% of S. mansoni transcribed reads were not represented in the current version of the genome.

A Perl script implementing an alignment score, based on the logarithm of query and target gap counts, was used to elect the best genomic alignment for each EST (see Section 2). This step resulted in each of the 1,422,955 RNA-seq ESTs (and 177,639 NCBI ESTs) being uniquely aligned onto the S. mansoni genome within scaffolds larger than 2 kb. These uniquely aligned RNA-seq ESTs comprised a total of 336,674,228 nt and covered 21,801,207 genome bases, resulting in a 15×-deep coverage. The RNA-seq sequences produced a 26% higher coverage of genome bases than the public ESTs available at NCBI (which covered 17,298,362 genome bases).

Table 1
General data related to the Roche 454 sequencing runs.

Roche 454 plate number | Number of reads | % of reads with >5% wrong bases per read | Total number of nt bases | Average read length (nt)
F3OJY3D01 | 130,876 | 0.3 | 12,163,353 | 93
F3OJY3D02 | 149,016 | 0.1 | 15,529,411 | 104
F7KOKND03 | 140,765 | 0.3 | 30,735,780 | 218
GBS495Z01 | 394,809 | 0.2 | 109,449,082 | 277
GBS495Z02 | 465,600 | 0.2 | 127,277,505 | 273
GDEH89M01 | 158,741 | 0.2 | 37,506,059 | 236
GDEH89M02 | 180,625 | 0.2 | 43,361,205 | 240
Total | 1,620,432 | | 376,022,395 | 232
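The coverage depth and the relative gain in covered genome bases follow directly from the totals quoted in this section. The short calculation below reproduces them from the reported numbers; the variable names are ours and nothing beyond the published figures is assumed.

```python
# Totals reported in the text for uniquely aligned ESTs.
rnaseq_aligned_nt = 336_674_228      # nt in uniquely aligned RNA-seq ESTs
rnaseq_covered_bases = 21_801_207    # genome bases covered by RNA-seq ESTs
ncbi_covered_bases = 17_298_362      # genome bases covered by NCBI ESTs

# Average depth over the covered (transcribed) bases.
depth = rnaseq_aligned_nt / rnaseq_covered_bases

# Relative gain in covered genome bases, RNA-seq versus NCBI ESTs.
gain = rnaseq_covered_bases / ncbi_covered_bases - 1

print(f"depth = {depth:.1f}x, gain = {gain:.0%}")
```

The depth comes out at about 15.4, which the text rounds to 15×, and the gain at about 26%.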

Finally, there were 76,976 RNA-seq (and 10,731 NCBI) EST sequences with partial alignment to the genome (with Blat identity ≥90% and 15% ≤ coverage ≤ 75%) as well as 120,501 RNA-seq ESTs (and 17,522 NCBI ESTs) with no alignment.

3.3. Transcript profiling by strand-specific directional 3′-end sequencing

When mapping expressed transcript sequences to a genome it is of great value to know the direction of transcriptional activity, i.e., to be able to uniquely map the transcript to either the plus or minus genomic strand. This information is important to resolve overlapping protein-coding messages on the two strands. To allow strand-specific mapping, it is imperative that directionality information contained in single-stranded RNA is preserved in the resulting cDNA library reads. No strand-specific directional method was available for Roche 454 cDNA library construction. Therefore, here we have developed and optimized a directional strand-specific cDNA library construction method suitable for the 454 GS-FLX Titanium sequencing protocol, which was adapted from the method described by Maher et al. (2009) for construction of directional libraries. Essentially, single-stranded cDNA was generated from the RNA template with reverse transcriptase, and directional double-stranded adaptors with modified-protected bases at their ends were ligated to the sscDNA, as detailed in Supplementary methods. Adaptor A, which is used for priming the Roche 454 sequencing reaction, was ligated at the 5′-end of sscDNA (3′-end of RNA) whereas adaptor B was ligated at the 3′-end of sscDNA.

It is known that reverse transcriptase has a tendency to generate spurious second-strand cDNA based on its DNA-dependent DNA polymerase activity (Spiegelman et al., 1970). Priming of first-strand cDNA could occur by either a hairpin loop at its 3′ end or by re-priming, from either RNA fragments or from the primer used for the first-strand synthesis (Gubler, 1987). In order to estimate the fraction of 454 RNA-seq EST sequences that could arise from reverse transcriptase spurious second-strand cDNA, we looked for RNA-seq ESTs that mapped to the genome within a predicted protein-coding gene with splicing, i.e., covering two or more exons, thus pointing to possible inverted intron acceptor/donor splicing sites. A total of 98,816 spliced RNA-seq ESTs mapped to gene predictions, of which 93,776 (95%) were in the same orientation as the gene predictions and 5040 spliced RNA-seq ESTs (5%) mapped in the reverse orientation, thus suggesting that the directional strand-specific method described here is subject to an overall 5% rate of artifacts, probably arising from spurious second-strand reverse transcription.
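The artifact estimate above is a ratio of spliced ESTs mapping antisense to the predicted gene over all spliced ESTs mapping to predictions. The sketch below shows one way such a count could be made from strand pairs and then recomputes the rate from the reported totals; the function name and the `(est_strand, gene_strand)` input format are hypothetical and not taken from the authors' pipeline.

```python
def antisense_fraction(pairs):
    # pairs: iterable of (est_strand, gene_strand), each '+' or '-'.
    pairs = list(pairs)
    antisense = sum(1 for est, gene in pairs if est != gene)
    return antisense / len(pairs)


# Toy input: one of four spliced ESTs maps opposite to its gene.
toy_rate = antisense_fraction([("+", "+"), ("-", "+"), ("-", "-"), ("+", "+")])

# Counts reported in the text for spliced RNA-seq ESTs.
sense_count = 93_776
antisense_count = 5_040
artifact_rate = antisense_count / (sense_count + antisense_count)
```

Using the reported counts, the rate is about 5.1%, consistent with the 5% quoted in the text.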

3.4. Saturation of the adult male transcriptome by RNA-sequencing

Fig. 1 presents the RNA-seq and NCBI EST coverage of S. mansoni genome bases at different fractions of each EST database. For each dataset, mapped sequences were randomly divided into 10 different sub-sets and genome coverage was computed as a function of the increasing number of mapped EST sub-sets. The curve in Fig. 1 suggests that generating additional RNA-seq ESTs from adult male parasites would probably not considerably improve the genome base coverage. It is noteworthy that NCBI ESTs were derived from six different S. mansoni life-cycle stages (Verjovski-Almeida et al., 2003) and genome base coverage seemed to have almost reached saturation with the over 200,000 ESTs available; nevertheless, the RNA-seq data obtained here, with just adult male parasites, resulted in a 26% higher coverage. The most relevant difference between the two datasets is that for the NCBI publicly available ESTs the cDNA libraries were normalized (Franco et al., 1995; Verjovski-Almeida et al., 2003) and ESTs were preferentially sequenced from the center of the messages (Neto et al., 2000; Verjovski-Almeida et al., 2003), whereas for RNA-seq we have not used any cDNA library normalization method and the sequences were distributed more evenly along the genes; the data suggest that the deep-sequencing RNA-seq approach allows for a more comprehensive description of expressed genes. It is interesting to note that publicly available Schistosoma japonicum ESTs were obtained from non-normalized cDNA libraries (Hu et al., 2003; Liu et al., 2006), which detected in S. japonicum a number of gene homologs that were first sequenced in S. mansoni in the present work (see Sections 3.5 and 3.7 below), suggesting that the lack of normalization may play an important role in preserving the gene diversity representation of cDNA libraries.

Fig. 1. Genome base coverage by RNA-seq and NCBI sequences at different fractions of each dataset. Each dataset was randomly divided into 10 different sub-sets and genome coverage was computed as a function of the increasing number of mapped EST sub-sets.
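The saturation analysis described for Fig. 1 (random division into 10 sub-sets, then cumulative genome base coverage) can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the authors' script: mapped ESTs are represented as hypothetical `(scaffold, start, end)` tuples with half-open coordinates, and a fixed seed is used so that the shuffle is reproducible.

```python
import random
from collections import defaultdict


def covered_bases(intervals):
    # Count genome bases covered at least once by merging
    # overlapping intervals on each scaffold.
    by_scaffold = defaultdict(list)
    for scaffold, start, end in intervals:
        by_scaffold[scaffold].append((start, end))
    total = 0
    for spans in by_scaffold.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end:
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


def saturation_curve(intervals, n_subsets=10, seed=0):
    # Shuffle once, then measure coverage on growing prefixes that
    # correspond to 1, 2, ..., n_subsets of the random sub-sets.
    shuffled = list(intervals)
    random.Random(seed).shuffle(shuffled)
    curve = []
    for k in range(1, n_subsets + 1):
        cutoff = round(len(shuffled) * k / n_subsets)
        curve.append(covered_bases(shuffled[:cutoff]))
    return curve


toy = [("s1", 0, 100), ("s1", 50, 150), ("s2", 0, 10)]
curve = saturation_curve(toy)
```

A curve that flattens well before the last point indicates that additional reads mostly land on bases that are already covered, which is how the text interprets Fig. 1.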

We further checked whether ESTs from genes known to be related to male adult worms appeared in the RNA-seq dataset. For this qualitative analysis, two literature references were used to check whether gender-characteristic (Fitzpatrick et al., 2005) and life stage-characteristic (Jolly et al., 2007) genes appeared. This analysis was not extensive and the goal was to check whether an important gene could be missing. Both works made available oligonucleotide sequences or consensus sequences representing the genes of interest. We aligned RNA-seq ESTs against both datasets and counted the reference sequences from these two papers that were not sampled by RNA-seq.

Among the 108 oligonucleotide sequences listed as male sequences in the Supplementary material of Fitzpatrick et al. (2005), 11 did not have corresponding RNA-seq ESTs. Three of the 11 oligonucleotides did not align with the genome or with gene predictions. The remaining eight oligonucleotides aligned uniquely with gene predictions, and these predictions aligned with RNA-seq ESTs (Blat identity ≥95% and coverage ≥95%), showing that RNA-seq reads and the oligonucleotide reference sequence covered different regions of the same male genes. Therefore all male-associated genes of this work (Fitzpatrick et al., 2005) that could be located in the genome have ESTs in the RNA-seq dataset.

For the life stage genes (Jolly et al., 2007), a similar analysis was performed. A total of 240 oligonucleotide probes designed over TIGR tentative consensus (TC) release 5 (http://compbio.dfci.harvard.edu/cgi-bin/tgi/gimain.pl?gudb=s_mansoni) were associated with genes expressed in adult worms (Jolly et al., 2007). A first Blat alignment (≥95% Blat identity and minimum Blat score 38) resulted in 160 probes having corresponding RNA-seq ESTs. The remaining dataset of 80 probes that did not have RNA-seq ESTs included 22 probes that are absent from the genome scaffolds. Genome full-length gene predictions related to the remaining 58 probes were retrieved and checked for RNA-seq ESTs aligned with them. Of these 58 probes, (a) 26 probes were related to gene predictions with RNA-seq ESTs aligned to them, again showing that probe and RNA-seq had covered different regions of the same full-length gene prediction; (b) 8 probes were part of gene predictions not sampled by RNA-seq ESTs; and (c) 24 probes did not align with gene predictions. The complete TC sequences of these last 24 probes were retrieved and checked against the RNA-seq database; 8 TCs did not align with RNA-seq. Considering that the work of Jolly et al. (2007) had analyzed adult male and female worms, it is likely that the 16 genes absent from our RNA-seq dataset are specific to female worms. It is interesting to note that none of these 16 genes were considered female specific in the above Fitzpatrick et al. (2005) study. The list of 8 Smp predictions and 8 TCs detected with enriched expression in adults (Jolly et al., 2007) and not sampled by RNA-seq in males (present study) is presented in Supplementary document 1.

3.5. RNA-seq and genome annotation: predictions were newly confirmed by ESTs

The current genome annotation includes 13,207 predicted genes (Berriman et al., 2009). We looked for gene predictions without EST confirmation by first aligning NCBI public ESTs against them. A direct, less stringent alignment of the ESTs against transcript predictions (Blat identity ≥75% and BlastN e-value threshold 10⁻¹⁰) revealed that 10,371 predictions were confirmed by 160,026 NCBI ESTs, which covered 11,627,752 nt. Lower stringency was adopted in this approach in order to tackle the repetitive nature of the genome and conservatively estimate the newly confirmed predictions by RNA-seq ESTs. Prediction coverage in this section was computed from all the multiple alignments of ESTs on the predictions. Also, in this section we considered predictions of all scaffolds. It follows that 2836 gene predictions from the genome project (Berriman et al., 2009) had no previous confirmation from ESTs in the public domain dataset.

Next, we turned to our RNA-seq ESTs to find which gene predictions had evidence of expression in adult male parasites. For this purpose, we used the stringent alignment parameters described in Section 3.2 above, which resulted in 931,547 RNA-seq ESTs, each uniquely aligned onto the S. mansoni genome, within a total of 9928 predictions. The complete list of these Smp predicted genes that have been sampled by RNA-seq ESTs and the RNA-seq counts for each gene is shown in Supplementary Table 1. These RNA-seq ESTs covered 8,296,866 nt.

A total of 8938 predictions were covered by ESTs from both the public and the present RNA-seq datasets (within these predictions 6,475,063 nt were common to both datasets, 4,496,438 nt were exclusive to NCBI ESTs and 1,487,342 nt exclusive to RNA-seq ESTs). NCBI and RNA-seq ESTs therefore overlap to some extent; nevertheless, they also cover different segments of the same predicted genes.
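The split into common and dataset-exclusive nucleotides can be illustrated with a small position-set calculation. This is a minimal sketch under stated assumptions: coordinates are half-open intervals on a single prediction, and the function names are hypothetical rather than taken from the authors' pipeline.

```python
def covered_positions(intervals):
    """Expand half-open (start, end) intervals into the set of covered positions."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end))
    return positions


def coverage_breakdown(ncbi_intervals, rnaseq_intervals):
    """Count nt covered by both datasets and by each dataset alone."""
    ncbi = covered_positions(ncbi_intervals)
    rnaseq = covered_positions(rnaseq_intervals)
    return {
        "common": len(ncbi & rnaseq),
        "ncbi_only": len(ncbi - rnaseq),
        "rnaseq_only": len(rnaseq - ncbi),
    }


# Toy alignments on one prediction (made-up coordinates).
print(coverage_breakdown([(0, 100), (50, 150)], [(120, 200)]))
```

Expanding to explicit positions is simple but memory-hungry; an interval-sweep version would scale better to whole-genome data. As a consistency check on the reported figures, the RNA-seq nucleotides within shared predictions (6,475,063 + 1,487,342) plus the 334,461 nt of newly confirmed predictions add up to the 8,296,866 nt covered by RNA-seq ESTs.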

Interestingly, 990 predictions (334,461 nt) had their expression in adult male worms confirmed for the first time by RNA-seq ESTs (Supplementary Table 2), out of 2836 predictions without any previous EST confirmation. These 990 newly confirmed predictions comprise 6014 matched RNA-seq ESTs, representing an average of 6.1 RNA-seq ESTs per prediction. The distribution of the number of RNA-seq ESTs per prediction is presented in Fig. 2. It is apparent that 67% of these genes newly confirmed as expressed in adult male parasites have 2 or more ESTs per prediction, and some 16% of them seem to be more highly expressed, with 11 to more than 60 ESTs per gene.
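The relationship between the prediction counts in this section is a set comparison, sketched below. The function name and the toy gene identifiers are assumptions for illustration; only the totals in the final checks come from the text.

```python
def confirmation_summary(all_predictions, ncbi_confirmed, rnaseq_confirmed):
    """Partition predictions by the EST dataset(s) that confirm them."""
    universe = set(all_predictions)
    ncbi = set(ncbi_confirmed) & universe
    rnaseq = set(rnaseq_confirmed) & universe
    return {
        "no_public_est": universe - ncbi,     # never seen in NCBI ESTs
        "both": ncbi & rnaseq,                # confirmed by both datasets
        "newly_confirmed": rnaseq - ncbi,     # confirmed only by RNA-seq
    }


# Counts reported in the text are mutually consistent.
assert 13207 - 10371 == 2836        # predictions without public ESTs
assert 9928 - 8938 == 990           # RNA-seq confirmed minus shared
assert round(6014 / 990, 1) == 6.1  # mean RNA-seq ESTs per new prediction
```

With real data, the three inputs would be the Smp identifiers from the annotation and from each alignment run.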

The top 20 most highly expressed Smp predicted genes with no previously detected ESTs are presented in Table 2. It is noteworthy that 14 (70%) of these most highly expressed predicted genes without previous EST evidence are annotated as "hypothetical", i.e., there are no homologous genes from other species in GenBank. To check whether these 990 predictions newly confirmed as expressed in S. mansoni adult males had evidence of expression in S. japonicum, we cross-referenced them to 103,725 NCBI public S. japonicum ESTs (as of December 2010) using Blat (Blat identity ≥75% and EST coverage ≥0.9, validated with Blast); 346 S. mansoni predictions with newly confirmed expression in males had previous evidence of expression in the S. japonicum public EST database (Hu et al., 2003; Liu et al., 2006).

Fig. 2. Distribution of newly confirmed gene predictions as a function of the number of RNA-seq ESTs per gene. Distribution shows the number of predicted genes that had no previous confirmation of expression by public NCBI ESTs and have now been confirmed by RNA-seq ESTs. Each column shows the number of predicted genes having the number of RNA-seq ESTs per gene that is indicated on the x-axis.

Table 2. List of top 20 most highly expressed predicted genes with no previous ESTs in the public NCBI dataset.

#  Prediction  # RNA-seq ESTs  Annotation
1  Smp_195100  115  U6 snRNA-associated sm-like protein Lsm2
2  Smp_178560  87  Hypothetical protein
3  Smp_167280  68  Hypothetical protein
4  Smp_052500  59  Hypothetical protein
5  Smp_188580  58  Hypothetical protein
6  Smp_124520  58  Hypothetical protein
7  Smp_187260  55  Hypothetical protein
8  Smp_158890  52  Hypothetical protein
9  Smp_152970  52  Hypothetical protein
10  Smp_147770  49  Hypothetical protein
11  Smp_118220  48  Ubiquitin-conjugating enzyme rad6 putative
12  Smp_036660  47  Hypothetical protein
13  Smp_124290  44  Hypothetical protein
14  Smp_054470  44  Thioredoxin Trx2 with Pfam:PF00085
15  Smp_142520  42  Zinc finger protein putative
16  Smp_123080  42  Sarcoplasmic calcium-binding protein - SCP-putative
17  Smp_174660  39  Hypothetical protein
18  Smp_170280  39  Integrin alpha-ps putative
19  Smp_032790  37  Hypothetical protein
20  Smp_026340  37  Hypothetical protein

G.T. Almeida et al. / Experimental Parasitology xxx (2011) xxx–xxx 5

YEXPR 6251 No. of Pages 10, Model 5G

8 July 2011

Fig. 3. Smp_156170 new 3′-end exon confirmed by RNA-seq high-quality sequences. The figure shows the relative positions of several elements aligned to the sequence of a portion of Smp_scaff000191, represented by a thin line at the top of the figure; labeled marks represent the base-pair coordinates of the genomic sequence in the scaffold. (A) Shows the original Smp_156170 prediction at its 3′-end (exons 11–13; solid black boxes). (B) Shows the Smp_156170 corrected exon (white box); in the new gene structure, exon 11 was extended by 1199 nt at the 3′-end and a new STOP codon was identified 3584 nt downstream of the beginning of the extended exon, adding 54 amino acids to the C-terminal end of the protein. Exons 12 and 13 were eliminated from the prediction. (C) Shows RNA-seq ESTs (dark gray) and publicly available NCBI ESTs (light gray) that were used for correcting the original prediction.

3.6. RNA-seq provided adjustments to previous genome annotations

It is important to evaluate the correctness of a full-length gene prediction, as investigators frequently use it to design PCR primers for cloning the cDNA messages of interest. With this in mind, the most critical information is the correct prediction of the extremities of messages. In order to identify adjustments to the current annotation, especially at the 3′-end of messages, which is favored by the RNA-seq method used here, all genome-mapped ESTs were clustered according to their genomic coordinates and the resulting exons were compared to the current exon annotation. Clustered ESTs-to-genome mapping identified 114,502 exons in the RNA-seq ESTs (the 1,422,955 RNA-seq ESTs are composed of 1,053,438 non-spliced one-exon ESTs and 369,517 spliced n-exon ESTs). In parallel, we mapped NCBI ESTs, which resulted in 62,415 exons (the 177,639 NCBI ESTs are composed of 97,464 one-exon ESTs, i.e., ESTs mapped with no splicing, and 80,175 spliced n-exon ESTs).
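Clustering mapped ESTs by genomic coordinates amounts to merging overlapping aligned blocks on each scaffold and strand. The sketch below shows one simple way to do this; it is not the authors' implementation, and the tuple layout and the rule that blocks must overlap by at least one base are assumptions.

```python
from collections import defaultdict


def cluster_exons(blocks):
    """Merge overlapping aligned blocks into exon clusters.

    blocks: iterable of (scaffold, strand, start, end), half-open coordinates.
    Returns {(scaffold, strand): [(start, end, n_blocks), ...]} sorted by start.
    """
    by_locus = defaultdict(list)
    for scaffold, strand, start, end in blocks:
        by_locus[(scaffold, strand)].append((start, end))

    clusters = {}
    for key, spans in by_locus.items():
        spans.sort()
        merged = []
        for start, end in spans:
            if merged and start < merged[-1][1]:
                # Block overlaps the open cluster: extend it and count the block.
                prev_start, prev_end, count = merged[-1]
                merged[-1] = (prev_start, max(prev_end, end), count + 1)
            else:
                merged.append((start, end, 1))
        clusters[key] = merged
    return clusters
```

Keeping the strand in the grouping key matters for a strand-specific library, since antisense transcripts should not be merged with the sense exon they overlap.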

Concerning exon positioning with respect to predicted genes, 76,738 exons of RNA-seq ESTs were mapped to predicted gene loci, 19,552 to within 5 kb in the flanking regions around gene loci (representing putative UTR sequences), and 18,212 exons were mapped to intergenic regions of the genome, devoid of previous gene annotations. For the NCBI sequences, 50,988 exons were mapped to predicted gene loci, 5711 were mapped to within 5 kb in the flanking regions around gene loci (representing putative UTR sequences) and 5716 exons mapped to intergenic regions.
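The three positional classes used above (gene locus, 5 kb flank, intergenic) can be expressed as a small classifier. This is a sketch under assumptions: half-open coordinates on one scaffold, any overlap with a locus counting as genic, and hypothetical function and label names.

```python
FLANK = 5_000  # flank width in nt, as stated in the text


def classify_exon(exon, genes, flank=FLANK):
    """Return 'genic', 'flanking' or 'intergenic' for one exon.

    exon:  (start, end) half-open coordinates on a scaffold.
    genes: list of (start, end) predicted gene loci on the same scaffold.
    """
    start, end = exon
    label = "intergenic"
    for g_start, g_end in genes:
        if start < g_end and end > g_start:
            return "genic"                      # overlaps the locus itself
        if start < g_end + flank and end > g_start - flank:
            label = "flanking"                  # within the flank of this locus
    return label


# The reported class sizes add up to the exon totals of each dataset.
assert 76738 + 19552 + 18212 == 114502   # RNA-seq exons
assert 50988 + 5711 + 5716 == 62415      # NCBI exons
```

An exon is labeled genic as soon as it overlaps any locus, so a flank of one gene never overrides an overlap with a neighbouring gene.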

A detailed analysis of exon mapping revealed that 9520 exons confirmed by RNA-seq are different from those contained in gene predictions; among them, a number of 3′-end exons from predictions were not confirmed by RNA-seq and would not provide useful information for primer design and PCR amplification of the full-length coding region.

Fig. 3 shows an example of a new 3′-end exon confirmed by RNA-seq high-quality sequences. The Smp_156170 predicted gene is annotated as a hypothetical protein, and it has an ARID/BRIGHT (pfam 01388) AT-rich interactive domain (Kegg K11653 orthology annotation) with a DNA-binding motif. The predicted gene has 13 exons (mapped to Smp_scaffold_000191) and encodes a 2565 amino acid protein. Fig. 3 (part A) shows the 3′-end (exons 10–13) of the original Smp_156170 prediction. Fig. 3 (part B) shows the Smp_156170 corrected exons; in the new gene structure the third-to-last exon (exon 11) was coalesced with the second-to-last exon (exon 12). The 4 ESTs that were used to correct this gene gap, wrongly assigned as an intron, were at NCBI from a previous sequencing project (Verjovski-Almeida et al., 2003). A total of 192 RNA-seq ESTs were used to correct the 3′-end of this prediction (some of these RNA-seq ESTs are shown in Fig. 3 part C). The corrected sequence has a total length of 3584 nt and a new STOP codon was identified in relation to the original prediction, adding 54 amino acids to the C-terminal end of the protein. As a consequence, exons 11 and 12 were coalesced and exon 13 from the prediction was eliminated.

We searched for RNA-seq ESTs mapped to within the 5 kb flanking region of predicted genes as a strategy to identify possible new predicted 3′-end candidates; a total of 92,509 candidate 3′-end RNA-seq ESTs were found: they are a subset of the 211,863 sequences that are mapped to the same genome strand as the predicted gene. One example of an RNA-seq EST that has identified a new exon within 1830 nt downstream of the last predicted exon is presented in Fig. 4. This RNA-seq EST has been predicted by the Augustus gene predictor to be part of exons 13 and 14 of the ORF coding region of the Smp_145320 corrected sequence; the new prediction eliminates the previously predicted exon 13 (Fig. 4A and B). It should be noted that Smp_145320 has 2115 nt, and there was only one public NCBI EST, covering 77 nt. It has been annotated as a putative sulfate transporter. The new 3′-end exons have been confirmed by 9 RNA-seq ESTs (Fig. 4C).
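Selecting candidate 3′-end ESTs requires a strand-aware notion of "downstream", which the following sketch makes explicit. It is an illustration only: the tuple layout, the half-open coordinates and the choice to measure distance from the EST edge nearest the gene are assumptions, not details given in the text.

```python
def is_three_prime_candidate(est, gene, flank=5_000):
    """True if an EST lies on the gene's strand, downstream of its 3' end,
    and within `flank` nt of it.

    est, gene: (strand, start, end) with half-open scaffold coordinates.
    """
    est_strand, est_start, est_end = est
    gene_strand, gene_start, gene_end = gene
    if est_strand != gene_strand:
        return False
    if gene_strand == "+":
        # 3' end is gene_end; downstream means higher coordinates.
        return gene_end <= est_start < gene_end + flank
    # Minus strand: 3' end is gene_start; downstream means lower coordinates.
    return gene_start - flank < est_end <= gene_start
```

The same-strand test is only meaningful because the cDNA library is directional and strand-specific.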

3.7. RNA-seq has confirmed new transcribed loci in the genome

In order to detect new putative transcribed loci in intergenic regions of the genome, devoid of previous gene predictions, we performed a per-scaffold assembly of RNA-seq ESTs with Newbler 2.5p1 (Kumar and Blaxter, 2010). Assembled contigs are longer than single ESTs and are expected to better represent transcripts. We retrieved all RNA-seq ESTs mapped (with the Blat aligner) uniquely to a given scaffold (among the 1,422,955 aligned RNA-seq ESTs) and assembled them, thus generating contigs and singlets per scaffold. For the present analysis of new intergenic transcribed loci, we used only the assembled contigs (not the singlets). A total of 12,025 RNA-seq EST contigs were obtained, which comprised 981,289 ESTs. They were mapped back to the genome scaffolds using Blat; of these, 989 RNA-seq contigs mapped to intergenic genomic regions, where no gene had been predicted, thus pointing to possible new S. mansoni genes. The remaining 11,036 contigs mapped to genomic loci with predicted genes.

Fig. 4. Smp_145320 transcript prediction corrected by RNA-seq at its 3′-end. The figure shows the relative positions of several elements aligned to the sequence of a portion of Smp_scaff000103, represented by a thin line at the top of the figure; labeled marks represent the base-pair coordinates of the genomic sequence in the scaffold. (A) Original Smp_145320 prediction at its 3′-end (exons 11–13, solid black boxes). (B) Smp_145320 prediction corrected by RNA-seq ESTs. The corrected prediction of Smp_145320 has two exons (13 and 14; white boxes) starting at 1830 nt downstream of the previously predicted last exon. The corrected prediction added 118 new amino acids to the C-terminal end of the protein. (C) RNA-seq ESTs (gray boxes) that were used for extending the original prediction.

These 989 contigs that mapped to intergenic genomic regions were assembled from 64,849 RNA-seq ESTs, representing an average of 65.5 ESTs/contig. In order to provide annotation, these contigs were aligned with BlastX to the GenBank NR database (with low-complexity filtering and e-value threshold of 1e-5).

A total of 839 RNA-seq contigs from intergenic regions had no hits in the public databases, suggesting that these genes expressed in adult S. mansoni have no known homolog in other species; this would explain why they were not predicted in silico by the gene modeler tools of the genome sequencing project.

Another set of 20 RNA-seq contigs, which had mapped to intergenic regions of the S. mansoni genome, were found to be similar (BlastX e-value threshold of 1e-5) to S. mansoni predicted genes that had been annotated at other genomic loci; they probably represent gene paralogs. In addition, 107 RNA-seq contigs from intergenic regions were similar to genes already detected in other Schistosoma species, mainly S. japonicum (Hu et al., 2003; Liu et al., 2006). Finally, 23 RNA-seq contigs from intergenic regions were similar to genes from other organisms.
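The annotation step above depends on separating contigs with a BlastX hit below the e-value threshold from those with none. The sketch below assumes the search was saved in the standard 12-column BLAST tabular layout (e-value in the 11th column); the text does not state which output format was used, and the function names are hypothetical.

```python
def best_hits(tabular_lines, max_evalue=1e-5):
    """Keep the lowest e-value hit per query from BLAST tabular lines.

    Assumes tab-separated columns: query, subject, identity, length, mismatches,
    gap opens, q.start, q.end, s.start, s.end, e-value, bit score.
    """
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue > max_evalue:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best


def contigs_without_hits(contig_ids, hits):
    """Contigs that would be annotated as having no hits found."""
    return sorted(set(contig_ids) - set(hits))


# The four annotation classes reported in the text cover all intergenic contigs.
assert 839 + 20 + 107 + 23 == 989
```

Assigning each retained hit to S. mansoni, other Schistosoma species or other organisms would then only require inspecting the subject description.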

Table 3 presents annotation for the top 20 RNA-seq contigs from intergenic regions with the highest EST counts and with similarity to other GenBank genes (i.e., annotation different from "No hits found"). These most highly expressed genes represented by RNA-seq contigs from intergenic regions are mostly similar to genes from S. japonicum and had not been predicted by the S. mansoni genome project. The distribution of the number of ESTs per contig is presented in Fig. 5. It can be seen that over 50% of these 989 new transcribed gene contigs (Supplementary Table 3) seem to be highly expressed in S. mansoni adult males, with 20 to more than 60 ESTs per contig (Fig. 5).

Looking for possible new genes transcribed from genomic regions that represent introns of previously predicted genes, we performed a per-scaffold assembly and annotation of RNA-seq ESTs that mapped to intronic genomic regions. A total of 1196 contigs mapped to intronic regions were obtained, which were assembled from 99,681 RNA-seq ESTs. Supplementary Table 4 shows the complete list with annotations. A BlastX search against the GenBank protein database revealed that 133 RNA-seq intronic contigs had similarity to S. mansoni genes previously annotated at other genomic loci; however, 91 of them (68%) are transcripts from S. mansoni transposons and only 42 contigs (32%) possibly represent new S. mansoni paralogs of previously predicted genes (Supplementary Table 4). An additional 188 RNA-seq intronic contigs had similarity to genes of other Schistosoma species (mainly S. japonicum) and of other organisms, and again 71 of them (38%) encode proteins similar to transposons of these other organisms. Finally, 875 RNA-seq intronic contigs had no similarities in GenBank and could represent new protein-coding genes or ncRNAs uniquely expressed in S. mansoni.

Contigs generated by the Newbler assembly performed for this analysis, together with those resulting from two additional Newbler runs (one with only RNA-seq ESTs that had partial hits to the genome, and another with RNA-seq ESTs that had no hits to the current genome scaffolds), were deposited at the NCBI TSA division (Transcriptome Shotgun Assembly Sequence Database) with accession numbers JI387100–JI400294 and can be searched with standard Blast tools available at NCBI. This deposit consists of 13,195 contigs ≥200 nt generated from 1,029,503 RNA-seq ESTs. Deposited sequences have the following nomenclature: SmBR-C999999, where C stands for Contigs (singlets cannot be deposited in TSA).

Table 3. List of top 20 most highly expressed contigs that mapped to intergenic genomic regions, where no genes had been previously predicted. Only contigs with sequence similarity to genes from other species were listed here.

Contig  Size (bp)  # RNA-seq ESTs  Annotation
Contig00245-Smp_scaff004342  205  5363  gi|12249164|ref|NP_066214.2| NADH dehydrogenase subunit 4 [Schistosoma mansoni]
Isotig00005-Smp_scaff000007  662  445  gi|5305389|gb|AAD41627.1|AF072328_1 dynein light chain 2 [Schistosoma japonicum] dynein light chain 4 [Schistosoma japonicum]
Isotig00004-Smp_scaff000076  388  303  gi|226469750|emb|CAX76705.1| hypothetical protein [Schistosoma japonicum]
Isotig00003-Smp_scaff000076  386  302  gi|226469750|emb|CAX76705.1| hypothetical protein [Schistosoma japonicum]
Isotig00006-Smp_scaff000414  620  294  gi|226482516|emb|CAX73857.1| Granulins precursor (Proepithelin) [Schistosoma japonicum]
Isotig00005-Smp_scaff000414  622  259  gi|226482516|emb|CAX73857.1| Granulins precursor (Proepithelin) [Schistosoma japonicum]
Isotig00004-Smp_scaff000414  744  244  gi|226482516|emb|CAX73857.1| Granulins precursor (Proepithelin) [Schistosoma japonicum]
Isotig00002-Smp_scaff000076  435  193  gi|226469752|emb|CAX76706.1| hypotheticial protein [Schistosoma japonicum]
Isotig00001-Smp_scaff000076  433  192  gi|226469752|emb|CAX76706.1| hypotheticial protein [Schistosoma japonicum]
Isotig00019-Smp_scaff000001  838  175  gi|76162597|gb|AAX30535.2| SJCHGC04609 protein [Schistosoma japonicum]
Isotig00021-Smp_scaff000001  801  152  gi|76162597|gb|AAX30535.2| SJCHGC04609 protein [Schistosoma japonicum]
Isotig00005-Smp_scaff000543  381  151  gi|171473974|gb|ACB47095.1| SJCHGC01950 protein [Schistosoma japonicum] Dynein light chain LC6, flagellar outer arm [Schistosoma japonicum]
Isotig00016-Smp_scaff000574  979  145  gi|29841116|gb|AAP06129.1| SJCHGC05452 protein [Schistosoma japonicum] hypothetical protein [Schistosoma japonicum]
Isotig00020-Smp_scaff000001  802  138  gi|76162597|gb|AAX30535.2| SJCHGC04609 protein [Schistosoma japonicum]
Isotig00036-Smp_scaff000116  372  125  gi|226488971|emb|CAX74835.1| hypotheticial protein [Schistosoma japonicum]
Isotig00020-Smp_scaff000406  600  122  gi|60692530|gb|AAX30632.1| SJCHGC06135 protein [Schistosoma japonicum] Small nuclear ribonucleoprotein E
Isotig00017-Smp_scaff000189  478  119  gi|226486688|emb|CAX74421.1| hypothetical protein [Schistosoma japonicum]
Isotig00004-Smp_scaff000116  1433  117  gi|156553841|ref|XP_001600241.1| PREDICTED: similar to conserved hypothetical protein [Nasonia vitripennis]
Isotig00022-Smp_scaff000001  765  115  gi|76162597|gb|AAX30535.2| SJCHGC04609 protein [Schistosoma japonicum]
Isotig00022-Smp_scaff000332  394  109  gi|257206088|emb|CAX82695.1| hypotheticial protein [Schistosoma japonicum]

Fig. 5. Novel evidence of transcriptional activity in intergenic regions of the S. mansoni genome. Distribution shows the number of RNA-seq EST contigs that map to S. mansoni intergenic genomic regions devoid of previous gene predictions for which new evidence of transcriptional activity has been obtained by RNA-seq. Each column shows the number of RNA-seq intergenic contigs having the number of RNA-seq ESTs per contig that is indicated on the x-axis.

3.8. RNA-seq has identified expressed genes that were not represented in the S. mansoni genome assembly

The best approach to find candidate genes absent from the S. mansoni genome scaffolds is to identify EST transcripts that do not map to the genome, and at the same time have sequence similarity to other Schistosoma species (using BlastN/BlastX searches). EST sequences that are found by BlastN to be highly similar to nucleotide sequences from organisms other than Schistosoma are possible contaminants and need to be manually curated.

We assembled the 120,501 RNA-seq ESTs that did not map to the S. mansoni genome and retained only the 308 contigs (assembled from 23,315 ESTs) that were ≥200 nt long and were not contaminants, i.e., were not similar to genes from bacteria, hamster, mouse or human. Comparing these contig sequences with BlastX to the GenBank protein database, we found a total of 44 RNA-seq EST contigs that encode proteins similar to proteins previously described in S. japonicum (Liu et al., 2006; Zhou et al., 2009); these are shown in Table 4. These contigs represent candidate genes expressed in S. mansoni adult males that are not part of the current genome assembly.

3.9. Expression of mitochondrial genes

With RNA-seq we observed an enriched expression of mitochondrial genes. Using the CLC Genomics Workbench tool (www.clcbio.com) to align the RNA-seq EST sequences, we identified a total of 290,527 RNA-seq ESTs (22.7%) that aligned with mitochondrial DNA (mtDNA). It is interesting to note that these reads entirely covered the previously published sequence and provided new information that allowed us to extend and circularize the S. mansoni mtDNA molecule. Supplementary document 2 provides the full-length circular mitochondrial sequence. The new extended region is marked in yellow. For the publicly available NCBI dataset, there were 2349 aligned EST sequences (1.1%), which reflects the fact that dbEST is a curated set excluding mitochondrial gene expression.

3.10. RNA-seq pointed out putative new micro-exon genes

Micro-exon genes (MEGs) are unique genomic structures that were described exclusively in schistosomes and display a coding region composed of a number of very small exons (≤36 bp) in tandem, which are usually symmetrical (i.e., the number of bases is divisible by 3). A number of these genes have been recently described in the S. mansoni genome (Berriman et al., 2009; DeMarco et al., 2010), using the previously existing NCBI transcript information as a starting point for construction of gene models. Considering the small size of such exons, their detection by de novo gene prediction is virtually impossible. Therefore, the description of novel MEGs is dependent on the accumulation of transcript data.

Table 4. S. mansoni RNA-seq EST contigs that do not align with the parasite genome and encode proteins with similarity to S. japonicum proteins.

RNA-seq contig number  Annotation accession  Annotation of S. mansoni RNA-seq EST contigs that do not align with the parasite genome
Isotig00157  gi|257216408|emb|CAX82409.1|  Aspartate–ammonia ligase [Schistosoma japonicum]
Isotig00243  gi|226466806|emb|CAX69538.1|  Casein kinase I isoform alpha [Schistosoma japonicum]
Isotig00147  gi|226482293|emb|CAX73747.1|  Cleft lip and palate transmembrane protein 1 homolog [Schistosoma japonicum]
Isotig00218  gi|226468516|emb|CAX69935.1|  Coatomer subunit beta [Schistosoma japonicum]
Isotig00349  gi|226482516|emb|CAX73857.1|  Granulins precursor (Proepithelin) [Schistosoma japonicum]
Isotig00211  gi|226490248|emb|CAX69366.1|  Hypothetical protein [Schistosoma japonicum]
Isotig00240  gi|226480602|emb|CAX73398.1|  Hypothetical protein [Schistosoma japonicum]
Isotig00272  gi|226485487|emb|CAX75163.1|  Hypothetical protein [Schistosoma japonicum]
Isotig00166  gi|226468590|emb|CAX69972.1|  Hypothetical protein [Schistosoma japonicum]
Isotig00337  gi|29841271|gb|AAP06303.1|  Similar to XM_026994 hypothetical protein FLJ10743 in Homo sapiens [Schistosoma japonicum]
Isotig00151  gi|257205632|emb|CAX82467.1|  Long-chain-fatty-acid – CoA ligase 5 [Schistosoma japonicum]
Isotig00168  gi|226479000|emb|CAX72995.1|  Poly (A) polymerase beta [Schistosoma japonicum]
Isotig00288  gi|226482458|emb|CAX73828.1|  Polypeptide GalNAc transferase 6 [Schistosoma japonicum]
Isotig00301  gi|226484492|emb|CAX74155.1|  Protein FAM82B [Schistosoma japonicum]
Isotig00143  gi|257215734|emb|CAX83019.1|  Protein VHS3 (Viable in a HAL3 SIT4 background protein 3) [Schistosoma japonicum]
Isotig00242  gi|226482626|emb|CAX73912.1|  Putative E3 ubiquitin-protein ligase HERC5 [Schistosoma japonicum]
Isotig00181  gi|226480652|emb|CAX73423.1|  Putative X-prolyl aminopeptidase [Schistosoma japonicum]
Isotig00148  gi|226480816|emb|CAX73505.1|  RNA-binding region RNP-1 (RNA recognition motif), domain-containing protein [Schistosoma japonicum]
Isotig00141  gi|56759284|gb|AAW27782.1|  SJCHGC00653 protein [Schistosoma japonicum]
Isotig00230  gi|56758120|gb|AAW27200.1|  SJCHGC01149 protein [Schistosoma japonicum]
Isotig00311  gi|56752557|gb|AAW24492.1|  SJCHGC01345 protein [Schistosoma japonicum]
Isotig00365  gi|56756186|gb|AAW26268.1|  SJCHGC01787 protein [Schistosoma japonicum]
Isotig00262  gi|76154478|gb|AAX25955.2|  SJCHGC03280 protein [Schistosoma japonicum]
Isotig00317  gi|56753856|gb|AAW25125.1|  SJCHGC03676 protein [Schistosoma japonicum]
Isotig00231  gi|76155670|gb|AAX26957.2|  SJCHGC04286 protein [Schistosoma japonicum]
Isotig00325  gi|76154276|gb|AAX25764.2|  SJCHGC04856 protein [Schistosoma japonicum]
Isotig00105  gi|56756771|gb|AAW26557.1|  SJCHGC05086 protein [Schistosoma japonicum]
Isotig00106  gi|56756771|gb|AAW26557.1|  SJCHGC05086 protein [Schistosoma japonicum]
Isotig00322  gi|76153610|gb|AAX25227.2|  SJCHGC05745 protein [Schistosoma japonicum]
Isotig00314  gi|76153927|gb|AAX25489.2|  SJCHGC06001 protein [Schistosoma japonicum]
Isotig00225  gi|56753732|gb|AAW25063.1|  SJCHGC06435 protein [Schistosoma japonicum]
Isotig00320  gi|56756471|gb|AAW26408.1|  SJCHGC06591 protein [Schistosoma japonicum] prefoldin subunit 4 [Schistosoma japonicum]
Isotig00203  gi|56756869|gb|AAW26606.1|  SJCHGC06728 protein [Schistosoma japonicum] Augmenter of liver regeneration [Schistosoma japonicum]
Isotig00182  gi|76153122|gb|AAX24783.2|  SJCHGC06854 protein [Schistosoma japonicum]
Isotig00232  gi|56757455|gb|AAW26895.1|  SJCHGC07338 protein [Schistosoma japonicum] Mitochondrial 28S ribosomal protein S6 [Schistosoma japonicum]
Isotig00264  gi|56758960|gb|AAW27620.1|  SJCHGC07524 protein [Schistosoma japonicum]
Isotig00177  gi|76157678|gb|AAX28536.2|  SJCHGC07825 protein [Schistosoma japonicum]
Isotig00189  gi|76153256|gb|AAX24902.2|  SJCHGC08152 protein [Schistosoma japonicum]
Isotig00274  gi|56756370|gb|AAW26358.1|  SJCHGC08167 protein [Schistosoma japonicum]
Isotig00194  gi|76154677|gb|AAX26115.2|  SJCHGC08851 protein [Schistosoma japonicum]
Isotig00299  gi|226482326|emb|CAX73762.1|  Ubiquinol-cytochrome-c reductase complex core protein 2, mitochondrial precursor [Schistosoma japonicum]
Isotig00309  gi|226482326|emb|CAX73762.1|  Ubiquinol-cytochrome-c reductase complex core protein 2, mitochondrial precursor [Schistosoma japonicum]
Isotig00180  gi|60600730|gb|AAX26822.1|  Unknown [Schistosoma japonicum]
Isotig00113  gi|257215734|emb|CAX83019.1|  Protein VHS3 (Viable in a HAL3 SIT4 background protein 3) [Schistosoma japonicum]

We performed a detailed analysis of the number and size of exons resulting from Blat alignment of the RNA-seq ESTs to the S. mansoni genome, searching for ESTs that were mapped with three or more exons in tandem, with exon sizes smaller than 37 bp, which could represent potential new MEGs. We were able to detect RNA-seq ESTs for 18 out of the 31 previously described MEGs with a complete annotation in the genome. The fact that not all MEG families were detected in the male transcriptome was expected, since several of them had been identified by mapping ESTs from stages other than adult worms. In fact, RT-PCRs performed in a previous work confirmed that some MEGs are not expressed in adults (DeMarco et al., 2010).
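The search criterion described above (three or more tandem exons smaller than 37 bp) can be sketched as a scan over the exon sizes of each aligned EST. The code is illustrative only: it assumes exon sizes are already available in transcript order (for example from the block sizes of a Blat alignment), and all names and example sizes are hypothetical.

```python
MAX_MICRO_EXON = 36   # nt; "smaller than 37 bp" in the text
MIN_TANDEM = 3        # "three or more exons in tandem"


def longest_micro_exon_run(exon_sizes, max_size=MAX_MICRO_EXON):
    """Length of the longest run of consecutive exons no larger than max_size."""
    longest = current = 0
    for size in exon_sizes:
        current = current + 1 if size <= max_size else 0
        longest = max(longest, current)
    return longest


def is_meg_candidate(exon_sizes, min_tandem=MIN_TANDEM):
    """True if the EST shows enough tandem micro-exons to flag a potential MEG."""
    return longest_micro_exon_run(exon_sizes) >= min_tandem


def symmetric_fraction(exon_sizes, max_size=MAX_MICRO_EXON):
    """Fraction of micro-exons whose length is a multiple of 3."""
    micro = [s for s in exon_sizes if s <= max_size]
    if not micro:
        return 0.0
    return sum(1 for s in micro if s % 3 == 0) / len(micro)
```

For example, an EST with made-up exon sizes `[120, 15, 21, 27, 18, 200]` has four tandem micro-exons, all symmetrical, and would be flagged, whereas `[120, 30, 150, 24, 300]` would not. Flagged ESTs would still need manual inspection before being called a MEG.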

In addition, we identified 11 novel MEGs, each containing 4 or more micro-exons in tandem in their gene structures. The list of RNA-seq reads that provided evidence for these 11 novel MEGs is in Supplementary Table 5. Two are new members of the MEG-8 family and we named them MEG-8.3 (see below) and MEG-8.4. Another is a new member of the MEG-14 family and was named MEG-14.2; a new member of the MEG-1 family was named MEG-1.2. Finally, seven new families were identified and named MEG-19 to MEG-25. Size analysis of the 78 micro-exons contained in these 11 new MEGs showed a predominance of exon sizes that were multiples of 3 (Supplementary Fig. 1), indicating that the previously described tendency for most micro-exons to be symmetric (Berriman et al., 2009; DeMarco et al., 2010) persists in the novel micro-exons identified here.

One example of a novel MEG identified here was derived from a set of 15 reads that aligned between bases 75,736 and 86,863 on the Smp_scaff000431 genomic scaffold, which was previously devoid of any gene annotation. Eleven of the mapped reads allowed the description of a gene displaying 10 exons; with the exception of the two flanking exons, all of the other eight exons are smaller than 36 bp and symmetrical (Fig. 6), which makes this gene a typical MEG. In addition, the protein encoded by this transcript displays a secretion signal detected by the SignalP program and is similar to the MEG-8.1 protein, which led us to name this new gene MEG-8.3. Three other reads indicated the production of an alternatively spliced transcript, with the use of an alternative splicing site that expanded a 15 nt exon to a size of 56 nt, and another read suggests a transcript that is initiated at a different 5′-end exon (Fig. 6).

Fig. 6. Gene structure of MEG-8.3 micro-exon gene. Deduced structure for micro-exon gene (MEG) 8.3 is displayed in the figure, based on mapping RNA-seq ESTs to the genome. White bar above represents the portion of the genomic scaffold in which the transcripts were mapped (scaffold coordinates are shown). Thick lines represent coding region for exons, medium lines UTR regions and thin lines the introns. Exons are shown to scale, but for illustrative purposes intron lengths are not proportional to size. Numbers above exons indicate exon sizes (in nucleotides).

4. Discussion

With next-generation large-scale sequencing, millions of ESTs can be generated in a fast and affordable manner, an approach that is currently known as RNA-seq; this approach is unique in providing confirmation and/or correction of in silico gene predictions for which there was no previous evidence of transcription, and in establishing new predictions. Here, we have provided examples of all these events.

It is noteworthy that out of the 2836 predicted genes without previous EST evidence of expression in S. mansoni, a total of 990 (35%) were confirmed here for the first time as expressed in adult male parasites. Among the top 20 most highly expressed genes in this category, 70% are annotated as hypothetical (Table 2). Inspection of the number of RNA-seq reads per gene among these top 20 genes shows that they have a much lower expression (37–115 reads/gene) than the top 20 most abundant genes in male parasites (4581–31,570 reads/gene, see Supplementary Table 1). It is apparent that RNA-seq is a sensitive method that is able to identify more genes in a given stage than previous approaches. When studying transcripts that had previously been detected as specific to other life cycle stages, it is important to consider that they may now have been identified in males as well, albeit at lower levels of expression. In the same way, genes that have been detected here, and are so far only described in male parasites, may well be transcribed at lower levels in other stages that have not yet been sampled by RNA-seq, and thus do not necessarily represent male-specific genes.

It is interesting to note that 58 of the top 100 most highly expressed genes in S. mansoni male parasites are annotated as hypothetical (Supplementary Table 1). Together they comprise 197,538 RNA-seq reads, which represent 21% of the RNA-seq reads that map to predicted genes (931,547 RNA-seq ESTs). We argue that the highly expressed hypothetical genes with unknown function detected in both groups, i.e., genes with or without previous EST evidence of expression (see Table 2 and Supplementary Tables 1 and 2), must play an important role in S. mansoni biology, and constitute good candidates for further functional characterization by RNAi partial knockdown.
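The two proportions quoted in this Discussion (990 of 2836 predicted genes, and 197,538 of 931,547 mapped reads) can be re-derived from the counts given in the text. The short Python sketch below only repeats that arithmetic; the helper name is hypothetical.

```python
# Re-derive the percentages quoted in the Discussion from the reported counts.

def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Predicted genes without previous EST evidence that were confirmed here.
confirmed_pct = percent(990, 2836)

# Reads from the 58 hypothetical genes among the top 100, relative to all
# RNA-seq reads that map to predicted genes.
hypothetical_read_pct = percent(197_538, 931_547)

print(f"confirmed predictions: {confirmed_pct:.1f}%")
print(f"reads in top hypothetical genes: {hypothetical_read_pct:.1f}%")
```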

Regarding correction of in silico predictions, the examples shown here in Figs. 3 and 4 are typical of a frequent situation in which the last exon has an incorrect in silico prediction. This error frequently precludes the use of predicted sequence information for designing a primer at the predicted 3′-end for amplification by PCR. It should be noted that all of our sequences have been assembled and deposited at the TSA division of GenBank, and they become part of the nt/nr database that is searchable with the BLAST tool. We suggest that the full-length genomic sequence of a locus that harbors a predicted gene of interest, possibly extended by additional genomic sequence within 5 kb of this gene locus, be used as a query against GenBank nt. This retrieves all available evidence of transcription for that locus obtained here from RNA-seq, which can then be used to confirm and/or adjust the structure of the predicted gene.
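As a minimal sketch of the first step of this suggestion, the Python snippet below computes the coordinates of a query window that extends a locus by up to 5 kb on each side while staying within the scaffold. The function name and the scaffold length are assumptions made for illustration; the locus coordinates are those of the MEG-8.3 example given earlier, and the subsequent BLAST search itself is not shown.

```python
# Compute a query window around a gene locus, extended by a flank on each
# side and clamped to the scaffold boundaries (1-based, inclusive).
# The function name and the scaffold length used below are hypothetical.

def flanked_window(start: int, end: int, scaffold_len: int,
                   flank: int = 5000) -> tuple[int, int]:
    """Return (window_start, window_end) for the locus plus flanking bases."""
    if not (1 <= start <= end <= scaffold_len):
        raise ValueError("locus must lie within the scaffold")
    if flank < 0:
        raise ValueError("flank must be non-negative")
    return max(1, start - flank), min(scaffold_len, end + flank)


# Locus taken from the MEG-8.3 example; the scaffold length is an assumption.
assumed_scaffold_len = 100_000
window = flanked_window(75_736, 86_863, assumed_scaffold_len)
print(window)
```

The resulting interval would then be used to extract the genomic sequence that serves as the query against GenBank nt.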

Please cite this article in press as: Almeida, G.T., et al. Exploring the Schistosoma mansoni adult male transcriptome using RNA-seq. Exp. Parasitol. (2011), doi:10.1016/j.exppara.2011.06.010

Finally, regarding novel gene predictions, we highlight the identification of 11 novel micro-exon genes (MEGs), adding to the list of 31 previously identified MEGs (Berriman et al., 2009; DeMarco et al., 2010). Micro-exons had been described in humans, Drosophila melanogaster, Caenorhabditis elegans and Arabidopsis thaliana (Volfovsky et al., 2003); however, in all these species only a single micro-exon was detected per gene. Micro-exon genes comprising a number of micro-exons organized in tandem in the same transcript have so far been identified only in S. mansoni (Berriman et al., 2009; DeMarco et al., 2010). MEG proteins have been confirmed by mass spectrometry analysis of S. mansoni secretions from migrating schistosomula and mature eggs (DeMarco et al., 2010), and they may represent a molecular system for creating protein variation through alternative splicing of the short symmetric exons organized in tandem. Expression of MEGs might be part of the immune evasion mechanisms of schistosomes, and identification of the entire set of MEGs in S. mansoni might be an important step towards effectively intervening in the host–parasite relationship.

Acknowledgments

This work was supported in part by a grant from Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) to S.V.A., a FAPESP young researcher grant to R.D.M., a grant from Financiadora de Estudos e Projetos (FINEP) to R.D.M. and S.V.A., and a grant from the FP-7 European Community, SEtTReND Grant No. 241865, to S.V.A. and R.D.M. G.T.A., M.S.A. and F.C.F.B. received fellowships from FAPESP. S.V.A. received an established investigator fellowship from Conselho Nacional de Desenvolvimento Científico e Tecnológico, Brazil.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.exppara.2011.06.010.

References

Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J., 1990. Basic local alignment search tool. Journal of Molecular Biology 215, 403–410.

Berriman, M., Haas, B.J., LoVerde, P.T., Wilson, R.A., Dillon, G.P., Cerqueira, G.C., Mashiyama, S.T., Al-Lazikani, B., Andrade, L.F., Ashton, P.D., Aslett, M.A., Bartholomeu, D.C., Blandin, G., Caffrey, C.R., Coghlan, A., Coulson, R., Day, T.A., Delcher, A., DeMarco, R., Djikeng, A., Eyre, T., Gamble, J.A., Ghedin, E., Gu, Y., Hertz-Fowler, C., Hirai, H., Hirai, Y., Houston, R., Ivens, A., Johnston, D.A., Lacerda, D., Macedo, C.D., McVeigh, P., Ning, Z., Oliveira, G., Overington, J.P., Parkhill, J., Pertea, M., Pierce, R.J., Protasio, A.V., Quail, M.A., Rajandream, M.A., Rogers, J., Sajid, M., Salzberg, S.L., Stanke, M., Tivey, A.R., White, O., Williams, D.L., Wortman, J., Wu, W., Zamanian, M., Zerlotini, A., Fraser-Liggett, C.M., Barrell, B.G., El-Sayed, N.M., 2009. The genome of the blood fluke Schistosoma mansoni. Nature 460, 352–358.

DeMarco, R., Mathieson, W., Manuel, S.J., Dillon, G.P., Curwen, R.S., Ashton, P.D., Ivens, A.C., Berriman, M., Verjovski-Almeida, S., Wilson, R.A., 2010. Protein variation in blood-dwelling schistosome worms generated by differential splicing of micro-exon gene transcripts. Genome Research 20, 1112–1121.


Fenwick, A., Webster, J.P., Bosque-Oliva, E., Blair, L., Fleming, F.M., Zhang, Y., Garba, A., Stothard, J.R., Gabrielli, A.F., Clements, A.C., Kabatereine, N.B., Toure, S., Dembele, R., Nyandindi, U., Mwansa, J., Koukounari, A., 2009. The schistosomiasis control initiative (SCI): rationale, development and implementation from 2002–2008. Parasitology 136, 1719–1730.

Fitzpatrick, J.M., Johnston, D.A., Williams, G.W., Williams, D.J., Freeman, T.C., Dunne, D.W., Hoffmann, K.F., 2005. An oligonucleotide microarray for transcriptome analysis of Schistosoma mansoni and its application/use to investigate gender-associated gene expression. Molecular and Biochemical Parasitology 141, 1–13.

Franco, G.R., Adams, M.D., Soares, M.B., Simpson, A.J., Venter, J.C., Pena, S.D., 1995. Identification of new Schistosoma mansoni genes by the EST strategy using a directional cDNA library. Gene 152, 141–147.

Franco, G.R., Rabelo, E.M., Azevedo, V., Pena, H.B., Ortega, J.M., Santos, T.M., Meira, W.S., Rodrigues, N.A., Dias, C.M., Harrop, R., Wilson, A., Saber, M., Abdel-Hamid, H., Faria, M.S., Margutti, M.E., Parra, J.C., Pena, S.D., 1997. Evaluation of cDNA libraries from different developmental stages of Schistosoma mansoni for production of expressed sequence tags (ESTs). DNA Research 4, 231–240.

Gubler, U., 1987. Second-strand cDNA synthesis: mRNA fragments as primers. Methods in Enzymology 152, 330–335.

Haas, B.J., Salzberg, S.L., Zhu, W., Pertea, M., Allen, J.E., Orvis, J., White, O., Buell, C.R., Wortman, J.R., 2008. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biology 9, R7.

Hillyer, G.V., 1974. Buoyant density and thermal denaturation profiles of schistosome DNA. The Journal of Parasitology 60, 725–727.

Hu, W., Yan, Q., Shen, D.K., Liu, F., Zhu, Z.D., Song, H.D., Xu, X.R., Wang, Z.J., Rong, Y.P., Zeng, L.C., Wu, J., Zhang, X., Wang, J.J., Xu, X.N., Wang, S.Y., Fu, G., Zhang, X.L., Wang, Z.Q., Brindley, P.J., McManus, D.P., Xue, C.L., Feng, Z., Chen, Z., Han, Z.G., 2003. Evolutionary and biomedical implications of a Schistosoma japonicum complementary DNA resource. Nature Genetics 35, 139–147.

Jolly, E.R., Chin, C.S., Miller, S., Bahgat, M.M., Lim, K.C., DeRisi, J., McKerrow, J.H., 2007. Gene expression patterns during adaptation of a helminth parasite to different environmental niches. Genome Biology 8, R65.

Kent, W.J., 2002. BLAT – the BLAST-like alignment tool. Genome Research 12, 656–664.

Kumar, S., Blaxter, M.L., 2010. Comparing de novo assemblers for 454 transcriptome data. BMC Genomics 11, 571.

Le Paslier, M.C., Pierce, R.J., Merlin, F., Hirai, H., Wu, W., Williams, D.L., Johnston, D., LoVerde, P.T., Le Paslier, D., 2000. Construction and characterization of a Schistosoma mansoni bacterial artificial chromosome library. Genomics 65, 87–94.

Liu, F., Lu, J., Hu, W., Wang, S.Y., Cui, S.J., Chi, M., Yan, Q., Wang, X.R., Song, H.D., Xu, X.N., Wang, J.J., Zhang, X.L., Zhang, X., Wang, Z.Q., Xue, C.L., Brindley, P.J., McManus, D.P., Yang, P.Y., Feng, Z., Chen, Z., Han, Z.G., 2006. New perspectives on host–parasite interplay by comparative transcriptomic and proteomic analyses of Schistosoma japonicum. PLoS Pathogens 2, e29.

Maher, C.A., Kumar-Sinha, C., Cao, X., Kalyana-Sundaram, S., Han, B., Jing, X., Sam, L., Barrette, T., Palanisamy, N., Chinnaiyan, A.M., 2009. Transcriptome sequencing to detect gene fusions in cancer. Nature 458, 97–101.

Margulies, M., Egholm, M., Altman, W.E., Attiya, S., Bader, J.S., Bemben, L.A., Berka, J., Braverman, M.S., Chen, Y.J., Chen, Z., Dewell, S.B., Du, L., Fierro, J.M., Gomes, X.V., Godwin, B.C., He, W., Helgesen, S., Ho, C.H., Irzyk, G.P., Jando, S.C., Alenquer, M.L., Jarvie, T.P., Jirage, K.B., Kim, J.B., Knight, J.R., Lanza, J.R., Leamon, J.H., Lefkowitz, S.M., Lei, M., Li, J., Lohman, K.L., Lu, H., Makhijani, V.B., McDade, K.E., McKenna, M.P., Myers, E.W., Nickerson, E., Nobile, J.R., Plant, R., Puc, B.P., Ronan, M.T., Roth, G.T., Sarkis, G.J., Simons, J.F., Simpson, J.W., Srinivasan, M., Tartaro, K.R., Tomasz, A., Vogt, K.A., Volkmer, G.A., Wang, S.H., Wang, Y., Weiner, M.P., Yu, P., Begley, R.F., Rothberg, J.M., 2005. Genome sequencing in microfabricated high-density picolitre reactors. Nature 437, 376–380.

Merrick, J.M., Osman, A., Tsai, J., Quackenbush, J., LoVerde, P.T., Lee, N.H., 2003. The Schistosoma mansoni gene index: gene discovery and biology by reconstruction and analysis of expressed gene sequences. The Journal of Parasitology 89, 261–269.

Neto, E.D., Correa, R.G., Verjovski-Almeida, S., Briones, M.R.S., Nagai, M.A., da Silva, W., Zago, M.A., Bordin, S., Costa, F.F., Goldman, G.H., Carvalho, A.F., Matsukuma, A., Baia, G.S., Simpson, D.H., Brunstein, A., de Oliveira, P.S.L., Bucher, P., Jongeneel, C.V., O’Hare, M.J., Soares, F., Brentani, R.R., Reis, L.F.L., de Souza, S.J., Simpson, A.J.G., 2000. Shotgun sequencing of the human transcriptome with ORF expressed sequence tags. Proceedings of the National Academy of Sciences of the United States of America 97, 3491–3496.

Pellegrino, J., Brener, Z., 1956. Method for isolating schistosome granulomas from mouse liver. The Journal of Parasitology 42, 564.

Pennisi, E., 2011. Will computers crash genomics? Science 331, 666–668.

Shaffer, C., 2007. Next-generation sequencing outpaces expectations. Nature Biotechnology 25, 149.

Simpson, A.J., Sher, A., McCutchan, T.F., 1982. The genome of Schistosoma mansoni: isolation of DNA, its size, bases and repetitive sequences. Molecular and Biochemical Parasitology 6, 125–137.

Spiegelman, S., Burny, A., Das, M.R., Keydar, J., Schlom, J., Travnicek, M., Watson, K., 1970. DNA-directed DNA polymerase activity in oncogenic RNA viruses. Nature 227, 1029–1031.

Verjovski-Almeida, S., DeMarco, R., Martins, E.A., Guimaraes, P.E., Ojopi, E.P., Paquola, A.C., Piazza, J.P., Nishiyama Jr., M.Y., Kitajima, J.P., Adamson, R.E., Ashton, P.D., Bonaldo, M.F., Coulson, P.S., Dillon, G.P., Farias, L.P., Gregorio, S.P., Ho, P.L., Leite, R.A., Malaquias, L.C., Marques, R.C., Miyasato, P.A., Nascimento, A.L., Ohlweiler, F.P., Reis, E.M., Ribeiro, M.A., Sa, R.G., Stukart, G.C., Soares, M.B., Gargioni, C., Kawano, T., Rodrigues, V., Madeira, A.M., Wilson, R.A., Menck, C.F., Setubal, J.C., Leite, L.C., Dias-Neto, E., 2003. Transcriptome analysis of the acoelomate human parasite Schistosoma mansoni. Nature Genetics 35, 148–157.

Verjovski-Almeida, S., Leite, L.C.C., Dias-Neto, E., Menck, C.F.M., Wilson, R.A., 2004. Schistosome transcriptome: insights and perspectives for functional genomics. Trends in Parasitology 20, 304–308.

Volfovsky, N., Haas, B.J., Salzberg, S.L., 2003. Computational discovery of internal micro-exons. Genome Research 13, 1216–1221.

WHO, 2002. Prevention and control of schistosomiasis and soil-transmitted helminthiasis: report of a WHO Expert Committee. World Health Organization Technical Report Series 912, i–vi, 1–57, back cover.

Zhou, Y., Zheng, H.J., Chen, Y.Y., Zhang, L., Wang, K., Guo, J., Huang, Z., Zhang, B., Huang, W., Jin, K., Dou, T.H., Hasegawa, M., Wang, L., Zhang, Y., Zhou, J., Tao, L., Cao, Z.W., Li, Y.X., Vinar, T., Brejova, B., Brown, D., Li, M., Miller, D.J., Blair, D., Zhong, Y., Chen, Z., Hu, W., Wang, Z.Q., Zhang, Q.H., Song, H.D., Chen, S.J., Xu, X.N., Xu, B., Ju, C., Huang, Y.C., Brindley, P.J., McManus, D.P., Feng, Z., Han, Z.G., Lu, G., Ren, S.X., Wang, Y.Z., Gu, W.Y., Kang, H., Chen, J., Chen, X.Y., Chen, S.T., Wang, L.J., Yan, J., Wang, B.Y., Lv, X.Y., Jin, L., Wang, B.F., Pu, S.Y., Zhang, X.L., Zhang, W., Hu, Q.P., Zhu, G.F., Wang, J., Yu, J., Yang, H.M., Ning, Z.M., Beriman, M., Wei, C.L., Ruan, Y.J., Zhao, G.P., Wang, S.Y., Liu, F., 2009. The Schistosoma japonicum genome reveals features of host–parasite interplay. Nature 460, 345–351.
