A distinct class of small RNAs arises from pre-miRNA–proximal regions in a simple chordate

Weiyang Shi¹,², David Hendrix¹,², Mike Levine¹ & Benjamin Haley¹

MicroRNAs (miRNAs) have been implicated in various cellular processes. They are thought to function primarily as inhibitors of gene activity by attenuating translation or promoting mRNA degradation. A typical miRNA gene produces a predominant ~21-nucleotide (nt) RNA (the miRNA) along with a less abundant miRNA* product. We sought to identify miRNAs from the simple chordate Ciona intestinalis through comprehensive sequencing of small RNA libraries created from different developmental stages. Unexpectedly, half of the identified miRNA loci encode up to four distinct, stable small RNAs. The additional RNAs, miRNA-offset RNAs (moRs), are generated from sequences immediately adjacent to the predicted ~60-nt pre-miRNA. moRs seem to be produced by RNase III–like processing, are ~20 nt long and, like miRNAs, are observed at specific developmental stages. We present evidence suggesting that the biogenesis of moRs results from an intrinsic property of the miRNA processing machinery in C. intestinalis.

miRNA genes have been observed across the Eukarya¹⁻⁵. A typical miRNA arises from the processing of a larger primary transcript (pri-miRNA) that is synthesized by RNA polymerase II, as seen for protein-coding genes⁶. The pri-miRNA transcript forms one or multiple fixed hairpin structures that are liberated by the RNase III enzyme Drosha⁷. The resulting ~70-nt hairpins (pre-miRNAs) are further processed by a separate RNase III enzyme, Dicer, which produces stable, mature miRNAs of 20–22 nt in length⁸⁻¹⁰.

Serial processing of pre-miRNAs is usually asymmetric, resulting in the production of a single, predominant miRNA arising from either the 5′ or 3′ arm of the pre-miRNA hairpin. In some cases, the opposite arm produces what is known as a miRNA* sequence that can reach appreciable steady-state levels but is less abundant than the miRNA¹¹. The resulting miRNA and miRNA* can regulate distinct target mRNAs in a coordinated fashion¹².

It has been proposed that conserved miRNA gene families provide a distinctive evolutionary signature and that the miRNA repertoire expands along with animal complexity¹³. To better understand the evolutionary history of miRNA genes among the chordate lineages, we performed a high-resolution study of small RNAs from the ascidian Ciona intestinalis, which belongs to the sister group of the vertebrates¹⁴. In contrast to other well-studied model organisms, C. intestinalis possesses a uniquely simplified repertoire of small RNA cofactors, consisting of single copies of Drosha, Pasha, Dicer, TRBP/PACT and Argonaute, and just two PIWI homologs¹¹,¹⁴,¹⁵.

Here we report that numerous miRNA loci in C. intestinalis produce one or two discrete and stable ~20-nt small RNA species from sequences immediately adjacent to the predicted pre-miRNA hairpins, in addition to conventional miRNA and miRNA* products. The biogenesis of these distinct RNAs is not explained by current models of miRNA processing. We present evidence that moRs are derived from an unanticipated activity of the C. intestinalis miRNA-biogenesis pathway.

RESULTS

Distinct small RNAs encoded by miRNA loci

We prepared small RNA (~16–26-nt) libraries from C. intestinalis at various developmental stages, including unfertilized eggs, early embryos, late embryos and adults. High-throughput sequencing of the resulting cDNAs was performed with an Illumina 1G Genome Analyzer. Combining earlier studies with a recently described miRNA-discovery algorithm, we defined 80 miRNA loci in the C. intestinalis genome¹⁶⁻¹⁸. Detailed information regarding the encoded miRNAs and their potential target mRNAs is provided in Supplementary Tables 1–4 online and at the following website: http://flybuzz.berkeley.edu/cgi-bin/CionaMicroRNAs.cgi.

Half of these genes encode a single major product (the miRNA), along with a less abundant miRNA* sequence, as is typically seen in other organisms¹⁹,²⁰. For example, the C. intestinalis (Ci) miR-125 gene (ortholog of the prototypic lin-4 miRNA in Caenorhabditis elegans) encodes a predominant miRNA that is stably expressed at all developmental stages examined²¹ (Supplementary Fig. 1 online). Ci-miR-125 is most highly expressed in adults, and at the adult stage a single clone of miR-125* is also detected.

Received 16 September 2008; accepted 21 November 2008; published online 18 January 2009; doi:10.1038/nsmb.1536

¹Department of Molecular Cell Biology, Division of Genetics, Genomics, and Development, Center for Integrative Genomics, University of California, Berkeley, California 94720-3200, USA. ²These authors contributed equally to the work. Correspondence should be addressed to B.H. ([email protected]) or M.L. ([email protected]).

NATURE STRUCTURAL & MOLECULAR BIOLOGY VOLUME 16 NUMBER 2 FEBRUARY 2009 183

ARTICLES

© 2009 Nature America, Inc. All rights reserved.

Unexpectedly, the remaining half of C. intestinalis miRNA loci encode previously uncharacterized small RNAs, in addition to conventional miRNA and miRNA* products. This new class of RNAs arises from sequences located adjacent to the predicted pre-miRNA stem-loop, and we hereafter refer to them as ‘moRs’, for miRNA-offset RNAs. Only small RNAs with 5′ monophosphates and free 3′ hydroxyl groups can be cloned by the method used in this study (see Methods), although they could contain modifications on the 2′ oxygen, as seen for Piwi-interacting RNAs (piRNAs) and some miRNA species²². Most moR sequences are 19–20 nt in length, whereas C. intestinalis miRNAs range in size between 19 nt and 22 nt (Supplementary Fig. 2a online). Overall, moRs are considerably less abundant than miRNAs, but just ~50% less abundant than miRNA* sequences (1,552 total reads and 3,353 total reads, respectively) (Supplementary Table 4b). In general, moRs show greater 5′ heterogeneity than miRNA or miRNA* sequences (Supplementary Fig. 2b). However, several abundantly expressed moRs, such as 5′ moRs 124-1 and 219, contain a rigid 5′-terminal nucleotide identity and show developmental regulation, suggesting that particular moRs may be under selective pressure, as has been suggested for the 5′ ends of miRNAs²³.

It is possible that the C. intestinalis miRNA loci encoding moRs contain unique structural features, as compared to those that do not²⁴. Global comparisons of base-pairing probabilities across the extended pre-miRNA loci in C. intestinalis revealed only modest structural differences between the two classes of miRNA loci (Supplementary Fig. 3 online). Overall, C. intestinalis miRNA loci maintain a similar base-pairing probability trace as those seen in Drosophila melanogaster, suggesting that C. intestinalis miRNA genes lack an intrinsic, species-specific structure. Similarly, there is no obvious difference in the size of the loop sequences in pre-miRs that produce moRs and those that do not (~13 nt and ~15 nt, respectively; Supplementary Fig. 4a online). In addition, we analyzed sequence motifs for all small RNAs cloned in this study. Whereas C. intestinalis miRNAs retained the expected 5′-uracil bias, no obvious motifs were apparent in the moRs²⁵ (Supplementary Fig. 4b). Thus, it is currently unclear why they arise from particular miRNA loci.
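The 5′-nucleotide bias discussed above can be examined with a simple tally of first bases. The following Python sketch is illustrative only: the function name `five_prime_composition` and the toy sequences are assumptions rather than part of the study, and the tally is not weighted by read count, which the original analysis may or may not have done.

```python
from collections import Counter


def five_prime_composition(sequences):
    """Return the fraction of sequences that start with each nucleotide.

    DNA-style input is tolerated by mapping T to U.
    """
    firsts = Counter(seq[0].upper().replace("T", "U") for seq in sequences if seq)
    total = sum(firsts.values())
    if total == 0:
        return {}
    return {nt: n / total for nt, n in firsts.items()}


# Hypothetical toy sequences, not data from the study.
toy_small_rnas = ["UGAGGUAGUAGG", "UAAGGCACGCGG", "UCCCUGAGACCC", "AGCUGGUUGAAG"]
composition = five_prime_composition(toy_small_rnas)
print(composition)  # {'U': 0.75, 'A': 0.25}
```

A strong skew toward U in such a tally would correspond to the 5′-uracil bias reported for miRNAs, whereas a flat distribution would match the absence of an obvious motif in moRs.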

[Figure 1 graphic: C. intestinalis miR-219 locus. (a) Read histograms (y axis: reads, log scale with ticks at 0, 10³ and 10⁶) for unfertilized egg, early embryo, late embryo and adult, drawn above the locus structure in parenthetical format, ((((((((((.....)))))..((((.(.(((.(((((.((((((.(.((((........)))).).)))))).)))))))).)......((((((....))))))..)))). with tracks labeled 5′ moR-219, miR-219 and miR-219*. (b) Predicted secondary structure annotated with miR-219 (715 reads), miR-219* (60 reads) and 5′-moR-219 (232 reads).]

[Figure 2 graphic: C. intestinalis miR-124-1/2 locus. (a) Read histograms (y axis: reads, log scale with ticks at 0, 10³ and 10⁶) for unfertilized egg, early embryo, late embryo and adult, with tracks labeled 5′-moR-124-1, miR-124-1*, miR-124-1, 3′-moR-124-1, 5′-moR-124-2, miR-124-2*, miR-124-2 and 3′-moR-124-2. (b) Predicted structures annotated with miR-124-1 (8,211 reads), miR-124-1* (373 reads), 5′-moR-124-1 (331 reads), 3′-moR-124-1 (2 reads), miR-124-2 (8,202 reads), miR-124-2* (497 reads), 5′-moR-124-2 (56 reads) and 3′-moR-124-2 (8 reads). (c) Canonical class III RNase III product (~19-bp core, ~2-nt 3′ overhang) shown with the Ci-miR-124-1 and Ci-moR-124-1 duplexes.]

Figure 1 Developmental expression of small RNAs encoded by the C. intestinalis miR-219 locus. (a) Graphical depiction of small RNAs that map to the miR-219 locus at four developmental time points, indicated to the right. The histograms represent overlapping Illumina sequencing reads (numbered above stack) centered at each position (miRNA, blue; miRNA*, burgundy; 5′-moR, yellow). The y axis is plotted on a log scale. The secondary structure of the locus is presented in parenthetical format. (b) Locations of miRNA, miRNA* and moR sequences on the predicted secondary structure surrounding the pre–miR-219 hairpin. mFold was used to predict pre-miRNA secondary structure here and in the following figures⁴⁵,⁴⁶.

Figure 2 Coincident expression of 5′ and 3′ moR sequences from the C. intestinalis miR-124 locus. (a) Sequencing reads at each position of the miR-124 cluster are shown (miRNA, blue; miRNA*, burgundy; 5′-moR, yellow; 3′-moR, green). (b) miRNA and moR sequences aligned with sequence surrounding the predicted pre–miR-124-1 and pre–miR-124-2 stem-loop structures. A red ‘C’ in the pre–miR-124-1 structure indicates a shared base between multiple 5′-moR and miR-124-1* clones. (c) Standard class III RNase III product is shown (above), depicting an ~19-nt core of matched RNA bases, along with an ~2-nt 3′ overhang. Aligned sequences are shown in the context of the predicted secondary structure of the pri-miRNA for miR-124-1 (top) and miR-124-1* (bottom), as well as 5′-moR-124-1 (bottom) and 3′-moR-124-1 (top). A shared base between loci is marked as a red ‘C’.
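One plausible way to build per-position read histograms like those described for Figures 1 and 2 is to add each read's count to every position the read overlaps and then log-scale the totals. The sketch below works under that assumption. The function names, the `(start, length, count)` read format and the toy reads are hypothetical, and the exact procedure used for the published figures (for example, how reads are centered) is not specified in the text.

```python
import math


def coverage(reads, locus_length):
    """Sum read counts over every position that each read overlaps.

    reads: iterable of (start, length, count) with a 0-based start on the locus.
    """
    depth = [0] * locus_length
    for start, length, count in reads:
        for pos in range(max(start, 0), min(start + length, locus_length)):
            depth[pos] += count
    return depth


def log_scale(depth):
    """Return log10(depth + 1) so that uncovered positions plot at zero."""
    return [math.log10(d + 1) for d in depth]


# Hypothetical toy reads on a 6-nt locus, not data from the study.
toy_reads = [(0, 3, 9), (2, 3, 1)]
depth = coverage(toy_reads, 6)
scaled = log_scale(depth)
print(depth)  # [9, 9, 10, 1, 1, 0]
```

Adding 1 before taking the logarithm is a common convention for keeping zero-coverage positions on the plot; it is a choice made for this sketch, not one stated by the authors.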


Defining characteristics of moRs

The C. intestinalis miR-219 gene (Ci-miR-219) encodes a predicted 57-nt pre-miRNA hairpin that is processed to produce miR-219 and miR-219*. In addition, a 5′ moR product (5′-moR-219) arises from sequences located immediately adjacent to miR-219 (Fig. 1a,b). The predominant miR-219 and miR-219* sequences are each ~21 nt in length, whereas the 5′-moR-219 sequence is 20 nt (Fig. 1b).

Like most miRNAs, nearly all 5′-moR-219 clones maintain an invariant 5′ end (223 of 232 total reads)²³. Each of the three small RNAs observed at the miR-219 locus showed developmental regulation (Fig. 1a). Only miR-219 was detected in unfertilized eggs and adults (Fig. 1a), whereas both miR-219* and 5′-moR-219 were seen during embryogenesis.
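The 5′-end uniformity quoted above (223 of 232 reads) amounts to the share of reads that begin at the most common start position. A minimal Python sketch of that calculation follows. The function name and the start coordinates are hypothetical, and the way the remaining 9 reads are split across other positions is invented for illustration; only the 223 and 232 totals come from the text.

```python
def dominant_five_prime_fraction(start_counts):
    """Return (most common 5' start, fraction of reads starting there).

    start_counts maps a 5' start coordinate to its number of reads.
    """
    total = sum(start_counts.values())
    if total == 0:
        raise ValueError("no reads supplied")
    position, reads = max(start_counts.items(), key=lambda item: item[1])
    return position, reads / total


# 223 of 232 reads share one 5' end (from the text). The coordinates and
# the split of the other 9 reads are hypothetical.
moR_219_starts = {100: 223, 99: 5, 101: 4}
position, fraction = dominant_five_prime_fraction(moR_219_starts)
print(position, round(fraction, 3))  # 100 0.961
```

By this measure about 96% of the 5′-moR-219 reads share a single 5′ terminus.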

In some cases, two distinct moRs are produced from a single miRNA gene, in addition to miRNA and miRNA* sequences (Fig. 2). The Ci-miR-124 locus encodes a pri-miRNA containing two tandem, but slightly different, ~58-nt pre-miRNAs (Fig. 2b). The resulting miRNAs, miR-124-1 and miR-124-2, are identical, and the sequence shows peak expression, as evidenced by increased read counts (see miR-133 example below), in advanced-stage embryos (Fig. 2a). Both pre-miRNAs produce 5′ and 3′ moRs during embryogenesis (Fig. 2a). We observed the 3′ moR from the pre–miR-124-2 hairpin in both early embryos and late embryos, but the 3′ moR from the pre–miR-124-1 hairpin was detected only in early embryos. Moreover, 5′-moR-124 RNAs are considerably more abundant than the 3′-moR-124 RNAs, a result that is typical of the moRs and reminiscent of the processing of miRNA and miRNA* sequences, as well as processing of pri-miRNA 5′ and 3′ arms by Drosha²⁶⁻²⁸.

Notably, alignment of coincident 5′ and 3′ moR sequences from numerous miRNA loci suggests that they arise from RNase III processing (~21-nt duplexed RNAs with ~2-nt 3′ overhangs)²⁹ (Fig. 2c and Supplementary Fig. 5 online).

Despite the high prevalence of moRs associated with miRNA loci in C. intestinalis, we found that, overall, moR sequences are poorly conserved as compared to miRNAs between C. intestinalis and a related ascidian species, Ciona savignyi, and moRs are even less conserved than miRNA* sequences (Supplementary Fig. 4c). However, it has been noted that well-conserved small RNAs are expressed at higher levels than those lacking conservation¹⁹,³⁰. This is true for most miRNAs when comparing C. intestinalis to C. savignyi (Supplementary Fig. 4c). Similarly, abundant moRs are also better conserved than those found at low copy number. Nonetheless, the general lack of conservation raises the possibility that moRs may represent unstable processing intermediates during the biogenesis of miRNAs. Such intermediates might be produced through a generic RNA-degradation mechanism that leaves behind spurious and variably sized small RNAs. However, as with miRNAs, the high copy number and near uniformity of clones at each locus suggest that moRs are produced mainly as ~20-nt RNAs. To further address this point, we used northern assays to directly examine the expression and size distribution of miRNA and moRs in C. intestinalis embryos (see below).
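The RNase III signature invoked above (paired small RNAs with ~2-nt 3′ overhangs) can be tested computationally when the read coordinates and a predicted structure in parenthetical (dot-bracket) format are available. The following is a minimal sketch, not the authors' method. All function names, the idealized hairpin and the read coordinates are hypothetical, and the sketch declines to answer when a read's 5′ end falls on an unpaired base.

```python
def pair_table(dot_bracket):
    """Map every paired index to its partner for a dot-bracket string."""
    stack, pairs = [], {}
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def three_prime_overhangs(pairs, five_arm, three_arm):
    """Return (3' overhang of the 5'-arm read, 3' overhang of the 3'-arm read).

    Each arm is (start, end), 0-based and inclusive, in 5'->3' coordinates.
    Returns None if either read's 5' end is unpaired (not handled here).
    """
    s5, e5 = five_arm
    s3, e3 = three_arm
    if s5 not in pairs or s3 not in pairs:
        return None
    # Each 3' end is measured against the partner of the other read's 5' end.
    return e5 - pairs[s3], e3 - pairs[s5]


def looks_rnase_iii_like(overhangs, expected=2):
    """True when both strands carry the expected 3' overhang."""
    return overhangs is not None and all(o == expected for o in overhangs)


# Idealized hairpin (25 bp stem, 4-nt loop) and hypothetical 21-nt reads.
hairpin = "(" * 25 + "." * 4 + ")" * 25
pairs = pair_table(hairpin)
overhangs = three_prime_overhangs(pairs, (2, 22), (33, 53))
print(overhangs, looks_rnase_iii_like(overhangs))  # (2, 2) True
```

Real pre-miRNA stems contain bulges and mismatches, so a practical version would need to walk to the nearest paired base instead of returning None.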

Direct detection of moRs as discrete small RNAs

Vertebrate miR-133 genes are often part of a bicistronic pri-miRNA that also contains miR-1, and the two miRNAs work together to promote mesodermal fates³¹. A similar genomic linkage is seen in C. intestinalis, and previous studies have shown that the primary transcript containing miR-1 and miR-133 is selectively expressed in developing tail muscles during C. intestinalis embryogenesis³². The C. intestinalis miR-133 locus encodes separate miRNA, miRNA* and 5′ moR products (Fig. 3). miR-133 reads steadily increase during embryogenesis and reach peak levels in adults (Fig. 3a). We found that the 5′-moR-133 RNA is most abundant in late embryos and is present at an equal or higher read count than miR-133 and miR-133* at all embryonic stages examined.

[Figure 3 graphic: C. intestinalis miR-133 locus. (a) Read histograms (y axis: reads, log scale with ticks at 0, 10³ and 10⁶) for unfertilized egg, early embryo, late embryo and adult, drawn above the locus structure in parenthetical format, .....(((((..((((((((.((((.((((((.(((.((.((((((((....((.......))..)))))))).)))))..)))))).)))).)))).)))).....(((..... with tracks labeled 5′ moR, miR-133*, loop and miR-133. (b) Predicted structure annotated with 5′-moR-133 (1,002 reads), miR-133* (198 reads), loop (20 reads) and miR-133 (2,167 reads). (c) Northern blots for miR-133, 5′-moR-133 and U6 RNA; lanes M, Un, EE, LE and Ad; size markers 30, 25 and 17. (d) Northern blots for miR-133, 5′-moR-133 and U6 RNA; lanes M, WT and Bra; size markers 30, 25 and 17.]

Figure 3 Direct detection of the 5′-moR-133 species. (a) Overlapping sequencing reads at each position along the miR-133 locus (miRNA, blue; miRNA*, burgundy; loop, gray; 5′-moR, yellow). (b) Alignment of sequenced reads on the predicted structure surrounding pre–miR-133. (c) Total RNA (~30 µg per lane) was used for northern blots showing the ~21-nt miR-133 (above) and 5′-moR-133 (middle) species throughout C. intestinalis development (M, size markers; Un, unfertilized eggs; EE, early embryos; LE, late embryos; Ad, adult animals). A northern blot for U6 RNA was used as a loading control (below). (d) As in c, comparing tailbud-stage C. intestinalis embryos that are unelectroporated (wild type, WT) or electroporated with a Ci-Brachyury enhancer:minimal Ci-miR-133 transgene (Bra). The Ci-Brachyury enhancer drives expression in the developing notochord³³.

The levels of miR-133 and 5′-moR-133 detected in northern assays are in agreement with the sequencing frequencies obtained from the cDNA libraries (Fig. 3c). There is a progressive increase in the steady-state levels of miR-133 in unfertilized eggs, early embryos, late-stage embryos and adults (Fig. 3c, above). Similarly, the predicted 5′-moR-133 RNA was detected as a stable product (appearing as a doublet of ~19–20-nt species in adults), with peak levels seen in late embryos. There was no indication of a smear or ‘ladder’ of higher- or lower-molecular-weight products, as would be expected if moRs represented incompletely degraded hairpin sequences or cleaved pri-miRNA transcripts. Moreover, ectopic expression of Ci-miR-133 directed by a Ci-Brachyury enhancer in the developing C. intestinalis notochord (the primitive chordate backbone) resulted in increased accumulation of both 5′-moR-133 and miR-133, indicating that expression of a discrete moR is correlated with that of the host miRNA transcript³³ (Fig. 3d).

Drosophila pri-miRNAs produce moRs in the Ciona tadpole

The preceding analysis suggests that moRs arise from an intrinsic property of the C. intestinalis small RNA–biogenesis machinery (see Discussion). To test this possibility, the miR-309 miRNA cluster (also known as ‘8-miR’) from D. melanogaster was selectively expressed in C. intestinalis³⁴,³⁵ (Fig. 4). We reasoned that the pri–miR-309 transcript would be more likely to produce detectable moRs when expressed in the C. intestinalis tadpole because it seems to produce such products, albeit rarely, in D. melanogaster (Fig. 4a).

We separately placed the entire miR-309 cluster under the control of three different tissue-specific enhancers from C. intestinalis that direct expression in the notochord, epidermis and mesenchyme, respectively³³,³⁶. All three transgenes were coelectroporated into fertilized eggs, and the embryos were allowed to develop to the tailbud stage (after neurogenesis). Total RNA was extracted from these embryos and subjected to high-throughput sequencing or used for northern assays.

Drosophila melanogaster moRs are produced at high steady-state levels in C. intestinalis, and here we focused on the miR-3 and miR-5 genes within the miR-309 cluster. We detected only four 3′-moR-3 RNA reads in the D. melanogaster embryo, whereas in C. intestinalis we observed nearly 2,000 copies (Fig. 4a,b). There is also a marked increase in the levels of the 5′-moR-5 RNA produced in C. intestinalis as compared with those in D. melanogaster. Nearly all copies of this moR RNA contain homogenous 5′ and 3′ termini (1,616 of 1,629 cloned copies are identical; Fig. 4b). In contrast, miR-3 was cloned at high frequency in D. melanogaster and C. intestinalis. Using northern assays, we identified similar levels of miR-3 in C. intestinalis and D. melanogaster embryos, a result that is consistent with the similar number of reads detected by sequence analysis. However, using a specific 5′-moR-3 hybridization probe, we detected a discrete band, without any obvious intermediate products, only in C. intestinalis embryos ectopically expressing the miR-309 cluster (Fig. 4c).

[Figure 4 graphic: D. melanogaster miR-309 cluster (miR-309, miR-3, miR-286, miR-4, miR-5, miR-6-1, miR-6-2 and miR-6-3). (a) Read histograms (y axis: reads, log scale with ticks at 0, 10³ and 10⁶) for D. melanogaster and C. intestinalis. (b) Most abundant reads at the miR-3 locus (5′-moR-3, miR-3*, miR-3 and 3′-moR-3) and the miR-5 locus (5′-moR-5, miR-5 and miR-5*) in D. melanogaster and C. intestinalis; the individual clone counts cannot be reliably assigned from the extracted text. (c) Northern blots for miR-3, 5′-moR-3 and U6; lanes M, Ci, Ci + miR-309 and Dm; size markers 30, 25 and 17.]

Figure 4 Ectopic expression of Drosophila pri-miRNAs can induce moR production in C. intestinalis embryos. (a) Small RNAs were cloned from 2–4-hour-old D. melanogaster Toll10b mutant embryos (above), which contain only mesodermal cell types, or tailbud-stage C. intestinalis embryos expressing the entire D. melanogaster pri–miR-309 cluster (below), and were subjected to Illumina sequencing. The resulting sequencing reads are shown at each position along the D. melanogaster miR-309 locus (miRNA, blue; miRNA*, burgundy; 5′-moR, yellow; intervening loop, gray). (b) The most abundant reads overlapping the respective regions of the miR-3 (above) or miR-5 (below) loci are shown. The number of clones matching the exact sequence depicted is shown in comparison to the overall number of clones overlapping that segment (in parentheses). (c) Northern blots showing miR-3 (above) and 5′-moR-3 (middle) in C. intestinalis and D. melanogaster embryos. For each well, ~50 µg total RNA was analyzed from tailbud-stage C. intestinalis embryos that were unelectroporated (Ci), similarly staged C. intestinalis embryos electroporated with D. melanogaster miR-309 expression plasmids (Ci + miR-309), or 2–4-hour-old Toll10b embryos. Below is shown a northern blot in which a cross-reactive probe for U6 RNA was used as a loading control.

There is no obvious correlation between the efficiency of moR biogenesis and the size of the loop sequence in the pre-miRNAs or conservation of other features. For example, the pre-miRNAs encoding miR-3 and miR-5 contain loops of 13 nt and 18 nt, respectively, but nonetheless produce similar yields of moRs. These experiments clearly demonstrate that the stable expression of moRs is an intrinsic feature of the C. intestinalis small RNA–processing machinery.

DISCUSSION

We have presented a high-resolution analysis of small RNAs during the development of the simple chordate, C. intestinalis. In the course of documenting 80 C. intestinalis miRNA genes, a distinct species of small RNAs was found to arise from sequences immediately 5′ and 3′ of the expected miRNA and miRNA* products. We have termed these small RNAs moRs (miRNA-offset RNAs).

moRs arise from ~50% of the detected miRNA loci in C. intestinalis. However, there is no obvious sequence or structural difference between those miRNA loci that produce moRs and those that do not. This observation raises the possibility that moRs might reflect an intrinsic property of the small RNA–biogenesis machinery in C. intestinalis (see below). It is currently unclear why this machinery fails to produce moRs from half of the C. intestinalis miRNA genes and why there is differential accumulation of individual moRs during C. intestinalis development.

Putative moR products are seen in D. melanogaster and mouse embryonic stem cells, although they are extremely rare¹⁹,³⁷. It was suggested that they might arise as by-products from exonuclease digestion of pri-miRNAs. According to this view, the pre-miRNA stem-loop would be excised from the pri-miRNA by Drosha, followed by decapping and 5′-3′ degradation, leaving behind fortuitously cloned ~21-mers near the base of the pre-miRNA (summarized in Fig. 5). We have presented evidence suggesting that this mechanism probably does not apply to the biogenesis of C. intestinalis moRs. These products are far more abundant in C. intestinalis as compared with D. melanogaster and mouse. Moreover, the most abundant moRs contain homogenous 5′ and 3′ termini, and northern assays did not detect intermediate cleavage products (a smear or ladder), as would be expected from such processive degradation (Figs. 3c,d and 4c).

In C. intestinalis, distinct 5′ and 3′ moRs arise from sequences located between the bicistronic Ci-miR-124-1/2 pre-miRNAs and from an ectopically expressed D. melanogaster pri-miRNA cluster. It is difficult to reconcile the proposed exonucleolytic degradation model with the occurrence of such moRs, because this intervening region should be equally accessible to 5′-3′ and 3′-5′ exonucleases³⁸,³⁹. Once again, such processing would be expected to produce a range of small RNAs rather than the discrete products that are actually observed.

Altogether, the simplest explanation for the biogenesis of moRs is that they arise during Drosha processing of the pri-miRNA transcript. Drosha is a class II RNase III enzyme containing two tandem RNase III domains²⁸,⁴⁰. Following intramolecular dimerization of these domains, the enzyme cleaves the pri-miRNA substrate at a single site (two total phosphodiester bonds), releasing a 5′ and a 3′ product in addition to the pre-miRNA. Analysis of coincident 5′ and 3′ moRs from numerous miRNA loci (such as those arising near miR-124-1) suggests that they may be paired in a manner similar to products generated through a bona fide RNase III–like mechanism. That is, the duplexed RNAs contain ~2-nt 3′ overhangs, as seen for Dicer products²⁹.

For a lone C. intestinalis Drosha molecule to produce moRs, the single processing center must cut in a processive fashion at two sites along the pri-miRNA substrate, which is inconsistent with the prevailing model for Drosha activity²⁸. Interactions among Drosha molecules could reconcile this apparent discrepancy. Such a mechanism is suggested by the recent demonstration of multimerized human Drosha complexes²⁸. Notably, mouse embryonic stem cells lacking Dicer show enriched levels of moR-like sequences, which are lost upon disruption of Drosha activity³⁷.

[Figure 5 graphic: schematic of a capped (7mGpppG) and polyadenylated (AAA...) pri-miRNA in three panels, with crosses marking cleavage sites: (a) Drosha cleavage; (b) Drosha cleavage; (c) Drosha multimer cleavage.]

Figure 5 A speculative model for the biogenesis of moRs. (a) Previous analysis of D. melanogaster and mouse small RNAs suggested that pre-miRNA–proximal sequences (analogous to moRs) were by-products of exonucleolytic degradation following excision of the pre-miRNA hairpin by Drosha (Drosha is represented in blue and yellow crosses indicate where Drosha cuts). (b) moR production may result via excision of an ~20-nt, imperfectly paired duplex RNA at the immediate base of the pre-miRNA stem-loop, following two concurrent or sequential cuts by a single Drosha molecule. (c) Alternatively, a multimeric complex containing at least two Drosha molecules could associate with a substrate pri-miRNA. Here each Drosha molecule would cleave the pri-miRNA at a distinct position, liberating the pre-miRNA, as well as the ~20-nt moR duplex.


It is possible that Drosha produces ‘double cuts’ in most or all organisms, not just C. intestinalis. However, the resulting moR RNAs may be subject to rapid degradation by an unknown pathway. Ciona intestinalis might have a modified version of this degradation pathway to produce high steady-state levels of moRs. Future studies will explore the mechanistic details of moR biogenesis and function in C. intestinalis development.

METHODS

Small RNA cloning and detection. We collected adult C. intestinalis animals from Half Moon Bay, California, and maintained them in an artificial seawater tank. We carried out fertilization, dechorionation and electroporations as previously described33. Total RNA was extracted from unfertilized eggs, cleavage-stage embryos, tadpole-stage embryos and adults using the mirVana miRNA Isolation Kit (Ambion). Small RNA cloning was carried out as previously described41. Briefly, from ~30 µg of total RNA, only 17–25-nt RNAs were size selected via 15% denaturing PAGE. The 3′ ‘modban-1’ adaptor (IDT) was ligated to the RNAs from this fraction with RNA ligase (Ambion) in ATP-free reaction buffer41, and appropriately ligated RNAs were size selected via 15% denaturing PAGE. The modified RNAs were subsequently ligated to a 5′ linker (Solexa linker) in the presence of RNA ligase and in reaction buffer with ATP. The resulting RNA library was reverse transcribed to a cDNA library with SuperScript II (Invitrogen). cDNA was amplified using Illumina sequencing–specific primers, and the resulting libraries were sequenced on an Illumina 1G Genome Analyzer. In parallel, small RNAs were extracted using TRIZOL (Invitrogen), cloned and sequenced, as above, from staged, 2–4-hour-old D. melanogaster Toll10b embryos34. Northern blotting assays were performed as described previously42.

We cloned the D. melanogaster miR-309 cluster by amplifying the locus from yw genomic DNA using pfuUltra High Fidelity polymerase (Stratagene) and the TOPO TA cloning system (Invitrogen).

Ci-Brachyury, Ci-FoxF and Ci-Twist enhancers were used to drive transgene expression in the C. intestinalis notochord, epidermis and mesenchyme, respectively33,36.

Primers used for amplification of the Ci-Twist enhancer were Ci-Twist-F (forward), 5′-ACCACAGCTTCTATTATATA-3′, and Ci-Twist-R (reverse), 5′-CATCGTGTGTTGATTGATTT-3′.

Probe sequences for the Ci-miR-133 northern assay were Ci-miR-133 (5′-CAGCTGGTTGAAGGGGACCAAA-3′), Ci-5′-moR-133 (5′-GACCGACACCCGCAATGTTT-3′) and Ci-U6 (5′-GTCATCCTTGCGCAGGGGCCATGCTAATCTTCTCTGTATCGTTCC-3′).

The C. intestinalis miR-133 amplification primers were Ci-miR-133-F (forward), 5′-CGTTTTATACGGTTATATACAGG-3′, and Ci-miR-133-R (reverse), 5′-TATTTCCGACTACTGAGCG-3′.

The Drosophila miR-309 cluster amplification primers were Dme-8miR-F (forward), 5′-TGCAGACAAATGACGAATTGA-3′, and Dme-8miR-R (reverse), 5′-CCGACCCTTTCAGGTAACAA-3′.

The probe sequences for the Drosophila miR-3 northern assay were Dme-miR-3, 5′-TGAGACACACTTTGCCCAGTGAT-3′, and Dme-5′-moR-3, 5′-CAGGATCGGGACCTTAGGTG-3′.

Data analysis. The standard Illumina pipeline (GAPipeline-0.3.0) was used to extract sequenced reads. Nucleotide positions 1 to 26 were aligned to the C. intestinalis (JGI version 1.0) or D. melanogaster (version 4.3) genomes using ELAND, and the alignments were used to calculate position-specific error rates18,43. Supplementary Figure 6a online shows the average error rate, defined as the estimated probability of a base call being incorrect, as a function of nucleotide position for each of the four lanes (libraries) studied. The error rate model for the Illumina pipeline was calibrated on the basis of reads that aligned uniquely to the genome, and then applied to all reads. The average error rate (averaged over all reads) rises sharply beyond the twenty-first base, consistent with an assumption that the reads should be dominated by miRNA sequences of roughly 21 nt, as subsequent unaligned bases of the 3′ adaptor would be scored as low quality.

Reads were trimmed so as to optimize the total nucleotide quality in a dynamic programming approach that produced trimmed reads such that the maximum acceptable error rate over the trimmed sequence is less than 10% ($Q_{\mathrm{PHRED}} = 10$), the total quality of the read is optimized globally over all start and stop positions, and the resulting length is greater than or equal to 17 nt44.
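The correspondence between the 10% error threshold and a Phred quality of 10 follows from the standard Phred definition, Q = −10 log10(e). The short sketch below only illustrates that conversion; the function names are hypothetical and are not part of the Illumina pipeline or of the authors' software.

```python
import math


def phred_to_error(q):
    """Convert a Phred quality score to an error probability (standard definition)."""
    return 10 ** (-q / 10)


def error_to_phred(e):
    """Convert an error probability (0 < e <= 1) to a Phred quality score."""
    return -10 * math.log10(e)


if __name__ == "__main__":
    # A Phred score of 10 corresponds to the 10% error threshold quoted above.
    print(phred_to_error(10))
    print(error_to_phred(0.1))
```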

The trimming procedure can be described formally as follows. An optimal trimming can be achieved by defining a penalty P associated with making an incorrect base call at a given nucleotide n. Using the position-specific error probability, $e_n$, one can define an expected score for a given nucleotide as $s_n = 1 \times (1 - e_n) - P \times e_n = 1 - (P + 1) \times e_n$. The total expected score for a trimming of the nucleotide sequence to start at position i and end at position j is then given by:

$$S(i, j) = \sum_{n=i}^{j} s_n = \sum_{n=i}^{j} \left[ 1 - (P + 1)\, e_n \right].$$

One is then free to choose the penalty such that the expected score is zero when the error rate is the maximum tolerated, so $P = \frac{1}{e_{\max}} - 1$, and any error rate greater than $e_{\max}$ will produce a negative contribution to the score. A dynamic programming search then globally optimizes $S(i, j)$ over all start and stop positions44. Further details of the data analysis rationale and methodology are available in the Supplementary Methods online. A meta-analysis of the distribution for all processed reads across a miRNA locus is presented in Supplementary Figure 7 online.
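As an illustration of the trimming criterion described above, the following is a minimal sketch, not the authors' implementation. It assumes a list of per-position error probabilities for one read, uses the score 1 − (P + 1)e_n with P = 1/e_max − 1, and finds the best-scoring window of at least 17 nt with a prefix-sum scan. The function names, the zero-based half-open coordinates and the tie-breaking rule (earliest window wins) are assumptions made for this example.

```python
from itertools import accumulate


def penalty(e_max):
    """Penalty P chosen so that a base with error rate e_max scores exactly zero."""
    return 1.0 / e_max - 1.0


def expected_scores(errors, e_max):
    """Per-nucleotide expected score s_n = 1 - (P + 1) * e_n."""
    p = penalty(e_max)
    return [1.0 - (p + 1.0) * e for e in errors]


def best_trim(errors, e_max=0.1, min_len=17):
    """Return (start, end, score) of the best window, with end exclusive.

    The window maximizes the summed expected score among all windows of
    length >= min_len. Returns None if the read is shorter than min_len.
    """
    n = len(errors)
    if n < min_len:
        return None
    scores = expected_scores(errors, e_max)
    # prefix[k] holds the sum of scores[:k], so a window sum is prefix[end] - prefix[start].
    prefix = [0.0] + list(accumulate(scores))
    best = None
    min_idx = 0  # start position with the smallest prefix sum seen so far
    for end in range(min_len, n + 1):
        candidate = end - min_len  # latest start that still satisfies min_len
        if prefix[candidate] < prefix[min_idx]:
            min_idx = candidate
        total = prefix[end] - prefix[min_idx]
        if best is None or total > best[2]:
            best = (min_idx, end, total)
    return best


if __name__ == "__main__":
    # Hypothetical read: 21 high-quality bases followed by 5 low-quality adaptor bases.
    read_errors = [0.01] * 21 + [0.5] * 5
    print(best_trim(read_errors))
```

With e_max = 0.1 the penalty is P = 9, so every base with an error probability above 10% lowers the window score. In this sketch a caller would still need to decide what to do with reads whose best window has a negative score; the source does not specify that step.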

Accession codes. Gene Expression Omnibus: Small RNA sequencing data have been deposited with accession code GSE13625.

Note: Supplementary information is available on the Nature Structural & Molecular Biology website.

ACKNOWLEDGMENTS

We thank L. Tonkin of the Vincent J. Coates Genomics Sequencing Laboratory for assistance with high-throughput sequencing and general expertise, H. Melichar for critical reading of the manuscript and members of the Levine laboratory for discussions. B.H. is supported by an American Cancer Society Postdoctoral Fellowship. This work was funded by a grant from the US National Institutes of Health (34431) to M.L.

AUTHOR CONTRIBUTIONS

W.S. and B.H. performed all experiments on C. intestinalis and D. melanogaster, respectively; D.H. performed bioinformatic analyses; M.L. and B.H. supervised the study and wrote the first draft of the manuscript; all authors discussed the results and commented on the manuscript.

Published online at http://www.nature.com/nsmb/

Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/

1. Ambros, V. The functions of animal microRNAs. Nature 431, 350–355 (2004).

2. Zamore, P.D. & Haley, B. Ribo-gnome: the big world of small RNAs. Science 309, 1519–1524 (2005).

3. Bartel, D.P. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116, 281–297 (2004).

4. Lau, N.C., Lim, L.P., Weinstein, E.G. & Bartel, D.P. An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294, 858–862 (2001).

5. Pasquinelli, A.E. et al. Conservation of the sequence and temporal expression of let-7 heterochronic regulatory RNA. Nature 408, 86–89 (2000).

6. Kim, V.N. MicroRNA biogenesis: coordinated cropping and dicing. Nat. Rev. Mol. Cell Biol. 6, 376–385 (2005).

7. Lee, Y. et al. The nuclear RNase III Drosha initiates microRNA processing. Nature 425, 415–419 (2003).

8. Bernstein, E., Caudy, A.A., Hammond, S.M. & Hannon, G.J. Role for a bidentate ribonuclease in the initiation step of RNA interference. Nature 409, 363–366 (2001).

9. Grishok, A. et al. Genes and mechanisms related to RNA interference regulate expression of the small temporal RNAs that control C. elegans developmental timing. Cell 106, 23–34 (2001).

10. Hutvagner, G. et al. A cellular function for the RNA-interference enzyme Dicer in the maturation of the let-7 small temporal RNA. Science 293, 834–838 (2001).

11. Tomari, Y. & Zamore, P.D. Perspective: machines for RNAi. Genes Dev. 19, 517–529 (2005).

12. Okamura, K. et al. The regulatory activity of microRNA* species has substantial influence on microRNA and 3′ UTR evolution. Nat. Struct. Mol. Biol. 15, 354–363 (2008).

13. Heimberg, A.M., Sempere, L.F., Moy, V.N., Donoghue, P.C. & Peterson, K.J. MicroRNAs and the advent of vertebrate morphological complexity. Proc. Natl. Acad. Sci. USA 105, 2946–2950 (2008).


14. Dehal, P. et al. The draft genome of Ciona intestinalis: insights into chordate and vertebrate origins. Science 298, 2157–2167 (2002).

15. Murphy, D., Dancis, B. & Brown, J.R. The evolution of core proteins involved in microRNA biogenesis. BMC Evol. Biol. 8, 92 (2008).

16. Friedlander, M.R. et al. Discovering microRNAs from deep sequencing data using miRDeep. Nat. Biotechnol. 26, 407–415 (2008).

17. Fu, X., Adamski, M. & Thompson, E.M. Altered miRNA repertoire in the simplified chordate, Oikopleura dioica. Mol. Biol. Evol. 25, 1067–1080 (2008).

18. Prochnik, S.E., Rokhsar, D.S. & Aboobaker, A.A. Evidence for a microRNA expansion in the bilaterian ancestor. Dev. Genes Evol. 217, 73–77 (2007).

19. Ruby, J.G. et al. Evolution, biogenesis, expression, and target predictions of a substantially expanded set of Drosophila microRNAs. Genome Res. 17, 1850–1864 (2007).

20. Stark, A. et al. Systematic discovery and characterization of fly microRNAs using 12 Drosophila genomes. Genome Res. 17, 1865–1879 (2007).

21. Slack, F. & Ruvkun, G. Temporal pattern formation by heterochronic genes. Annu. Rev. Genet. 31, 611–634 (1997).

22. Grimson, A. et al. Early origins and evolution of microRNAs and Piwi-interacting RNAs in animals. Nature 455, 1193–1197 (2008).

23. Seitz, H., Ghildiyal, M. & Zamore, P.D. Argonaute loading improves the 5′ precision of both microRNAs and their miRNA* strands in flies. Curr. Biol. 18, 147–151 (2008).

24. Han, J. et al. Molecular basis for the recognition of primary microRNAs by the Drosha-DGCR8 complex. Cell 125, 887–901 (2006).

25. Du, T. & Zamore, P.D. microPrimer: the biogenesis and function of microRNA. Development 132, 4645–4652 (2005).

26. Khvorova, A., Reynolds, A. & Jayasena, S.D. Functional siRNAs and miRNAs exhibit strand bias. Cell 115, 209–216 (2003).

27. Schwarz, D.S. et al. Asymmetry in the assembly of the RNAi enzyme complex. Cell 115, 199–208 (2003).

28. Han, J. et al. The Drosha-DGCR8 complex in primary microRNA processing. Genes Dev. 18, 3016–3027 (2004).

29. MacRae, I.J. & Doudna, J.A. Ribonuclease revisited: structural insights into ribonuclease III family enzymes. Curr. Opin. Struct. Biol. 17, 138–145 (2007).

30. Axtell, M.J. Evolution of microRNAs and their targets: are all microRNAs biologically relevant? Biochim. Biophys. Acta 1779, 725–734 (2008).

31. Chen, J.F. et al. The role of microRNA-1 and microRNA-133 in skeletal muscle proliferation and differentiation. Nat. Genet. 38, 228–233 (2006).

32. Davidson, B., Shi, W., Beh, J., Christiaen, L. & Levine, M. FGF signaling delineates the cardiac progenitor field in the simple chordate, Ciona intestinalis. Genes Dev. 20, 2728–2738 (2006).

33. Corbo, J.C., Levine, M. & Zeller, R.W. Characterization of a notochord-specific enhancer from the Brachyury promoter region of the ascidian, Ciona intestinalis. Development 124, 589–602 (1997).

34. Biemar, F. et al. Comprehensive identification of Drosophila dorsal-ventral patterning genes using a whole-genome tiling array. Proc. Natl. Acad. Sci. USA 103, 12763–12768 (2006).

35. Bushati, N., Stark, A., Brennecke, J. & Cohen, S.M. Temporal reciprocity of miRNAs and their targets during the maternal-to-zygotic transition in Drosophila. Curr. Biol. 18, 501–506 (2008).

36. Beh, J., Shi, W., Levine, M., Davidson, B. & Christiaen, L. FoxF is essential for FGF-induced migration of heart progenitor cells in the ascidian Ciona intestinalis. Development 134, 3297–3305 (2007).

37. Babiarz, J.E., Ruby, J.G., Wang, Y., Bartel, D.P. & Blelloch, R. Mouse ES cells express endogenous shRNAs, siRNAs, and other Microprocessor-independent, Dicer-dependent small RNAs. Genes Dev. 22, 2773–2785 (2008).

38. Wang, Z. & Kiledjian, M. Functional link between the mammalian exosome and mRNA decapping. Cell 107, 751–762 (2001).

39. Wilusz, C.J., Wormington, M. & Peltz, S.W. The cap-to-tail guide to mRNA turnover. Nat. Rev. Mol. Cell Biol. 2, 237–246 (2001).

40. Zhang, H., Kolb, F.A., Jaskiewicz, L., Westhof, E. & Filipowicz, W. Single processing center models for human Dicer and bacterial RNase III. Cell 118, 57–68 (2004).

41. Brennecke, J. et al. Discrete small RNA-generating loci as master regulators of transposon activity in Drosophila. Cell 128, 1089–1103 (2007).

42. Haley, B., Hendrix, D., Trang, V. & Levine, M. A simplified miRNA-based gene silencing method for Drosophila melanogaster. Dev. Biol. 321, 482–490 (2008).

43. Norden-Krichmar, T.M., Holtz, J., Pasquinelli, A.E. & Gaasterland, T. Computational prediction and experimental validation of Ciona intestinalis microRNA genes. BMC Genomics 8, 445 (2007).

44. Chapman, J. Whole Genome Shotgun Assembly in Theory and Practice. PhD Thesis, Univ. California, Berkeley, 50–51 (2004).

45. Zuker, M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31, 3406–3415 (2003).

46. Mathews, D.H., Sabina, J., Zuker, M. & Turner, D.H. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol. 288, 911–940 (1999).


Supplemental Material

A distinct class of small RNAs arises from pre-miRNA-proximal regions in a simple chordate

Weiyang Shi1,2, David Hendrix1,2, Mike Levine1*, and Benjamin Haley1*

1Dept. Mol Cell Biol.

Division of Genetics, Genomics, and Development

Center for Integrative Genomics

University of California, Berkeley, CA 94720-3200

2These authors contributed equally to the work

*Authors for correspondence

[email protected]

[email protected]

Nature Structural & Molecular Biology: doi:10.1038/nsmb.1536

[Supplementary Figure S1 graphic: a, read stacks across the C. intestinalis miR-125 locus for the Unfertilized egg, Early embryo, Late embryo and Adult libraries (y-axis: Reads); b, predicted secondary structure with miR-125 (13,336 reads) and miR-125* (1 read) marked. Counts printed in the graphic: 165, 143, 54, 12,974 and 1.]

Locus sequence and predicted structure (dot-bracket notation) shown in panel a:

AUGUGGAAUUUACCAGUUGGCUGGCGGUGUCCCUGAGACCCUAAAACGUGAGACGGUUUCACGUAUUCGGCUCUCAGUACAUUUACACGACCGACAAUUCGACAGAAUUC

.(((.(((((.....((((((((..(((((..((((((.((.((.((((((((...)))))))).)).)).)))))).)))))..)).).)))))))))).)))..))..

Supplementary Figure S1. Expression of small RNAs from the Ciona miR-125 locus throughout development. a, Each stack represents reads centered upon a particular position. miR-125 is pictured in blue, miR-125* is in burgundy. b, Corresponding position of sequenced reads on the predicted secondary structure surrounding pre-miR-125, color-coded as in a.


[Supplementary Figure S2 graphic: a, fraction of reads versus length (nt), 19–23 nt, for miR, miR*, moR, moR* and loop products; b, 5´ heterogeneity at locus versus number of reads at locus (log scale), with labeled points 5´-moR-133, 5´-moR-miRDeep_14, moR-219, 5´-moR-124-1, miR-miRDeep_2, miR-126, miR-let-7a-2 and miR-let-7a-1.]

Supplementary Figure S2. Characteristics of reads overlapping miRNA loci. a, Read length distributions across miRNA-defined loci. This figure shows the distribution of the lengths of reads that align to a given type of miRNA locus. b, Scatter plot of 5´ heterogeneity vs number of reads.


[Supplementary Figure S3 graphic: three panels plotting probability of mismatch (0–1) versus nucleotide position (−50 to 100) for C. intestinalis moR-positive loci, C. intestinalis moR-negative loci and D. melanogaster miRNA loci.]

Supplementary Figure S3. Structure-specific qualities of Ciona and Drosophila miRNA loci. Probability of a mismatched base as a function of position. This figure demonstrates the rate at which bases are unpaired in the pri/pre-miRNA hairpin structure as a function of nucleotide position relative to the 5´ position of the 5´ miRNA/miRNA* species.


[Supplementary Figure S4 graphic: a, box plot of loop length (nt) for Ci moR-Positive, Ci moR-Negative, D.mel moR-Positive and D.mel moR-Negative loci; b, sequence LOGOs for miRNA (74 loci), miRNA* (63 loci), moR (42 loci) and moR* (15 loci); c, product abundance (total reads, log scale) versus fraction of conserved nucleotides between C. intestinalis and C. savignyi for miR, miR*, moR and moR*, with 5´-moR-133, 5´-moR-219 and 5´-moR-124-1 labeled.]

Supplementary Figure S4. General features of miRNA and moR loci. a, Global analysis of pre-miRNA loop length between loci with or without associated moRs in Ciona and Drosophila. This figure depicts the loop lengths, as defined by the total number of bases between the 5´ and 3´ major products (measured from the last base of the most abundant 5´ and first base of the most abundant 3´ products). The middle bar shows the median value, and boxes enclose about 25% of the data points above and below the median value. Data sets consist of 38 (Ciona moR-positive), 25 (Ciona moR-negative), 36 (Drosophila moR-positive), and 132 (Drosophila moR-negative) miRNA loci. No significant trends exist for moR-positive miRNAs, but they tend to have shorter loop lengths, on average. b, Sequence-specific characteristics of small RNAs derived from miRNA loci. Presented are sequence LOGOs created with weblogo 3 (http://code.google.com/p/weblogo/) for the different types of products observed. These LOGOs demonstrate the aggregate nucleotide composition (and information content) as a function of position when all examples of miRNA, miRNA*, moR, and moR* are aligned by the optimal 5´ splice site. As noted elsewhere, miRNA sequences have an AU-bias at the 5´ nucleotide (REF. 11). moR sequences and miRNA* sequences do not show any clear nucleotide composition biases. However, moR* sequences show a strong AU-bias at position 19 (14 of 15 examples have an A or U), but it is unclear if this is significant given the limited number of examples. c, Conservation at each nucleotide across the length of a given small RNA product between Ciona intestinalis and Ciona savignyi. Conservation is defined here as the fraction of nucleotides unchanged from C. intestinalis to C. savignyi across the mature miRNA, miRNA*, moR, or moR* sequences. Alignments for each locus were extracted from the full genome alignment between the two species.


Cin-miR-92c

canonical class-III RNase III product

Cin-miR-124-2

5´—UAAGGCACGCGGUGAAUGCC—3´

3´—UCGUUCCAGGUGUUAUUUAUGA—5´

Cin-miR-124-1

5´—UAAGGCACGCGGUGAAUGCC—3´

3´—UCGUUCCAGGUGUCAUUUGUGC—5´

5´—UCAAAAGUCGCCGCUCGCU—3´

3´—CUGAGUAUUCGCGGGAAAUU—5´

Cin-miR-200b

5´—UAAUACUGCCUGGUAAUGGUG—3´

3´—GGCUAUGAUGCACCAACCACU—5´

Cin-moR-200b

5´—GGGUGGAUACGUACGAGCAGC—3´

3´—AAGCCAACCGUGCUUUGAAA—5´

Cin-miR-153

5´—UUGCAUAGUAACAAAAGUGAUC—3´

3´—UCAACGUAUUAUCUUUUUACUG—5´

Cin-moR-153

5´—ACGUUCGCAUGAGCUGUCAC—3´

3´—UGUGCAUGUGUCUUAAGAGU—5´

5´—CGAAUUUUUGCUGCUUUCAUCU—3´

3´—UGGCUUGUAUCGAUACUGCA—5´

5´—UAUUGCACUCGUCCCGGUCU—3´

3´—GCGUUACGUGAAGCGGUCCGGA—5´

5´—NNNNNNNNNNNNNNNNNNNNN—3´

3´—NNNNNNNNNNNNNNNNNNNNN—5´

~19-bp core

~2-nt 3´ overhang

5´—CGAUAUAUGAUGUCGAUGUGGC—3´

3´—GGUUAUGUACCACAGUGACAA—5´

5´—UCUGUUUUGAAAUUUUCAUC—3´

3´—CGGGACCUAACUUUUGCAGU—5´

5´—CACUGGGUGCCGCUCUCUAC—3´

3´—CGAUGCUCCACGGGCCCACC—5´

5´—CAACCAGAU-CAGAAAGUCG—3´

3´—UGUUGGUAUACGUCUUUCUGCU—5´

Cin-miR-miRDeep_51

Cin-miR-miRDeep_39

Cin-moR-miRDeep_51

Cin-moR-miRDeep_39

Cin-moR-124-1

Cin-moR-124-2

Supplementary Figure S5. Alignment of miRNA/miRNA* and 5´ moR/3´ moR duplexes in the context of the predicted structure surrounding each pri/pre-miRNA. Ciona miR-92c is shown to represent the structural properties of a pri/pre-miRNA locus that does not produce 5´ or 3´ moRs. The remaining examples represent miRNA loci that produce multiple copies of coincident 5´ and 3´ moRs. In each case, the miRNA is blue, miRNA star is burgundy, 5´ moR is yellow, and 3´ moR is green.


[Supplementary Figure S6 graphic: a, average error rate versus nucleotide position (1–23); b, fraction of reads in library versus read length (nt); both panels show the Egg, Early Embryo, Late Embryo and Adult libraries.]

Supplementary Figure S6. General summary of Illumina sequencing data from Ciona small RNA libraries. a, Average error rate as a function of nucleotide position in each library. The average estimated probability of an incorrect base-call is plotted as a function of nucleotide position. b, Distribution of read length in each library after trimming. Reads were trimmed with a custom algorithm such that the total error for the trimmed read is less than 10%, and the total read quality is globally optimized.

[Supplementary Figure S7 graphic: a, fraction of reads with 5´ position (log scale) versus distance from loop (nt), −65 to 40; b, fraction of products with 5´ position versus distance from loop (nt), −65 to 40; both panels show the Homology and miRDeep sets.]

Supplementary Figure S7. Distribution of small RNA read/product positions. a, Distribution of the 5´ end of reads that are mapped to miRNA loci. The positions on each side of the hairpin have been mapped relative to the last base pair of the loop, so as to correct for different loop lengths across examples. Positions with negative values are 5´ of the loop, and positions with positive values are 3´ of the loop. b, Distribution of the 5´ ends of products that are found overlapping miRNA loci. Each product is a cluster of overlapping reads, and the most abundant position for the product is used. Positions are mapped as in a.

Tables

Table S1: Summary of Illumina read data

Lane                        1          2              3             4
Tissue                      Egg        Early Embryo   Late Embryo   Adult
Total Reads                 2589312    2989422        3053010       2511445
Total Trimmed > 17bp        2269712    2717070        2697931       2122586
Aligned Reads (E < 1)       1398432    1637592        1508792       1139758
Aligned Reads (E < 0.01)    878616     1028941        800228        652371


Table S2: Genome location and expression information of known Ciona intestinalis miRNAs

a, Previously predicted miRNAs based on homology with other miRNAs mapped to Ciona Genome (JGI Version 1.0)

Name Chrom Start Stop Strand
cin-mir-133 Scaffold_844 21285 21445 +
cin-mir-1 Scaffold_844 21045 21205 +
cin-mir-183 Scaffold_291 71695 71854 -
cin-mir-33a Scaffold_1128 10928 11087 -
cin-mir-33b Scaffold_164 201448 201608 -
cin-mir-141 Scaffold_1272 8659 8815 +
cin-mir-200 Scaffold_1272 8295 8451 +
cin-mir-101 Scaffold_426 19046 19206 +
cin-let-7d Scaffold_95 49157 49314 -
cin-let-7e Scaffold_95 49157 49314 -
cin-mir-1473 Scaffold_95 49611 49771 -
cin-mir-125 Scaffold_95 49017 49173 -
cin-mir-219 Scaffold_71 79957 80118 -
cin-mir-31 Scaffold_260 39363 39521 -
cin-mir-520d Scaffold_26 281108 281264 +
cin-mir-34 Scaffold_38 156994 157152 -
cin-mir-452 Scaffold_336 64753 64911 -
cin-let-7a-1 Scaffold_138 96151 96311 +
cin-let-7a-2 Scaffold_138 97064 97226 +
cin-let-7b Scaffold_138 96669 96830 +
cin-let-7c Scaffold_138 96868 97024 +
cin-mir-155 Scaffold_138 190603 190760 -
cin-mir-92b Scaffold_168 27006 27171 +
cin-mir-78 Scaffold_175 51513 51671 +
cin-mir-184 Scaffold_135 138251 138413 -
cin-mir-92a Scaffold_20 505443 505602 +
cin-mir-92c Scaffold_20 506232 506395 +
cin-mir-92-4 Scaffold_20 505779 505943 +
cin-mir-135-2 Scaffold_292 33692 33854 +
cin-mir-135-1 Scaffold_292 34135 34294 +
cin-mir-153 Scaffold_27 231958 232118 +
cin-mir-1497 Scaffold_309 72905 73068 +
cin-mir-281 Scaffold_211 21117 21274 +
cin-mir-124-1 Scaffold_310 18814 18973 -
cin-mir-124-2 Scaffold_310 18674 18833 -
cin-mir-126 Scaffold_196 41456 41616 -
cin-mir-302a Scaffold_118 34242 34409 -
cin-mir-181 Scaffold_128 189340 189498 -
cin-mir-7 Scaffold_116 233725 233882 +


b, Expression level of primary miRNA products for homology-based miRNAs.

Columns: miR ID, Location, Total, Egg, Early Embryo, Late Embryo, Adult

cin-mir-133 Scaffold_844:21375..21397:+ 2167 2 55 132 1978

cin-mir-133* Scaffold_844:21311..21336:+ 1002 5 104 698 195

cin-mir-1 Scaffold_844:21149..21170:+ 18153 27 959 2029 15138

cin-mir-1* Scaffold_844:21056..21075:+ 49 0 23 25 1

cin-mir-183 Scaffold_291:71777..71800:- 396 13 10 125 248

cin-mir-183* Scaffold_291:71750..71769:- 9 0 0 8 1

cin-mir-33a Scaffold_1128:11013..11035:- 86 32 27 3 24

cin-mir-33a* Scaffold_1128:11036..11054:- 1 0 1 0 0

cin-mir-33b Scaffold_164:201557..201578:- 1 1 0 0 0

cin-let-7d Scaffold_95:49242..49266:- 40482 559 476 417 39030

cin-let-7d* Scaffold_95:49206..49225:- 11 0 0 1 10

cin-let-7e Scaffold_95:49242..49266:- 40482 559 476 417 39030

cin-let-7e* Scaffold_95:49206..49225:- 11 0 0 1 10

cin-mir-1473 Scaffold_95:49662..49684:- 2250 39 35 73 2103

cin-mir-1473* Scaffold_95:49700..49721:- 31 0 1 1 29

cin-mir-125 Scaffold_95:49098..49120:- 13335 165 143 54 12973

cin-mir-125* Scaffold_95:49070..49089:- 1 0 0 0 1

cin-mir-101 Scaffold_426:19163..19181:+ 0.0078 0 0 0 0.01

cin-mir-141 Scaffold_1272:8740..8764:+ 9890 2936 2577 807 3570

cin-mir-141* Scaffold_1272:8710..8730:+ 215 17 82 78 38

cin-mir-200 Scaffold_1272:8378..8403:+ 4322 1593 1395 485 849

cin-mir-200* Scaffold_1272:8346..8366:+ 35 2 13 10 10

cin-mir-219 Scaffold_71:80043..80065:- 714 9 81 619 5

cin-mir-219* Scaffold_71:80065..80088:- 232 0 39 193 0

cin-mir-31 Scaffold_260:39450..39474:- 8329 24 43 148 8114

cin-mir-520d Scaffold_26:281108..281264:+ No Reads

cin-mir-34 Scaffold_38:157098..157123:- 5625 12 81 437 5095

cin-mir-34* Scaffold_38:157031..157052:- 2 0 1 0 1

cin-mir-452 Scaffold_336:64753..64911:- No Reads

cin-let-7a-1 Scaffold_138:96192..96216:+ 14744 1336.5 1157.5 779 11471

cin-let-7a-1* Scaffold_138:96275..96295:+ 23 15 3 1 4

cin-let-7a-2 Scaffold_138:97111..97134:+ 15212 1335.5 1166.5 784 11926

cin-let-7a-2* Scaffold_138:97161..97185:+ 15 1 1 1 12

cin-let-7b Scaffold_138:96719..96742:+ 58100 5813 4739 2833 44715

cin-let-7b* Scaffold_138:96700..96719:+ 18 2 1 3 12

cin-let-7c Scaffold_138:96912..96935:+ 20004 1323 1394 581 16706

cin-let-7c* Scaffold_138:96892..96911:+ 2 0 0 0 2

cin-mir-155 Scaffold_138:190603..190760:- No Reads

cin-mir-92b Scaffold_168:27097..27118:+ 2119 460 556 354 749

cin-mir-92b* Scaffold_168:27039..27060:+ 40 7 28 5 0

cin-mir-78 Scaffold_175:51513..51671:+ No Reads

cin-mir-184 Scaffold_135:138293..138314:- 139 1 0 7 131

cin-mir-92a Scaffold_20:505527..505548:+ 555 0 83 210 262

cin-mir-92a* Scaffold_20:505498..505516:+ 10 0 4 5 1

cin-mir-92c Scaffold_20:506320..506342:+ 2360 2 375 758 1225

cin-mir-92c* Scaffold_20:506287..506308:+ 36 0 19 12 5

cin-mir-92-4 Scaffold_20:505869..505889:+ 1471 0 173 571 727

cin-mir-92-4* Scaffold_20:505833..505855:+ 317 0 161 135 21

cin-mir-153 Scaffold_27:232047..232070:+ 9683 3157 2801 537 3188

cin-mir-153* Scaffold_27:231989..232009:+ 30 2 3 23 2

cin-mir-135-2 Scaffold_292:33743..33766:+ 117 2 14 3 98

cin-mir-135-1 Scaffold_292:34187..34208:+ 104 3 6 1 94

cin-mir-135-1* Scaffold_292:34221..34243:+ 1 0 0 1 0

cin-mir-126 Scaffold_196:41504..41527:- 3146 0 5 22 3119

cin-mir-126* Scaffold_196:41544..41565:- 4 0 0 1 3

cin-mir-124-1 Scaffold_310:18863..18886:- 8207 32.5 1760.5 6381.5 32.5

cin-mir-124-1* Scaffold_310:18899..18921:- 373 0 124 249 0

cin-mir-124-2 Scaffold_310:18723..18746:- 8200 32.5 1759.5 6375.5 32.5

cin-mir-124-2* Scaffold_310:18759..18781:- 500 0 177 323 0

cin-mir-281 Scaffold_211:21203..21225:+ 812 9 8 3 792

cin-mir-281* Scaffold_211:21171..21189:+ 3 0 0 0 3

cin-mir-1497 Scaffold_309:72998..73022:+ 489747 125429 120217 195474 48627

cin-mir-1497* Scaffold_309:72936..72956:+ 34 1 15 18 0

cin-mir-181 Scaffold_128:189343..189364:- 0.0107 0.01 0 0 0

cin-mir-302a Scaffold_118:34242..34409:- No Reads

cin-mir-7 Scaffold_116:233779..233801:+ 229 22 23 70 114

cin-mir-7* Scaffold_116:233810..233832:+ 5 0 2 3 0
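The Location column in the table above follows the pattern scaffold:start..end:strand. As a convenience for readers who want to work with these rows programmatically, here is a minimal parsing sketch. The `Location` type and `parse_location` function are hypothetical names introduced for this example, and the length calculation assumes that both coordinates are inclusive, which the source does not state.

```python
from typing import NamedTuple


class Location(NamedTuple):
    scaffold: str
    start: int
    end: int
    strand: str

    @property
    def length(self):
        # Assumes inclusive start and end coordinates (an assumption, not stated in the table).
        return self.end - self.start + 1


def parse_location(text):
    """Parse a string such as 'Scaffold_844:21375..21397:+' into a Location."""
    scaffold, span, strand = text.strip().rsplit(":", 2)
    start_text, end_text = span.split("..")
    if strand not in ("+", "-"):
        raise ValueError(f"unexpected strand: {strand!r}")
    return Location(scaffold, int(start_text), int(end_text), strand)


if __name__ == "__main__":
    loc = parse_location("Scaffold_844:21375..21397:+")
    print(loc, loc.length)
```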


Table S3: Summary of miRDeep predictions of novel miRNAs in Ciona intestinalis

a, General summary of miRDeep input and output sequence data

Total Reads: 9807299
Total Unique Reads: 1529918
Total Unique Reads Aligned to Genome (E < 1): 938889
Total Aligned with 100% identity and 5 or fewer hits to the genome: 337443
Candidate precursor loci extracted: 106584
Predicted miRs: 65
High quality, Predicted miRs not in the homology set: 41


b, All distinct miRDeep predictions mapped to Ciona intestinalis genome (JGI Version 1.0)

Name Chrom Start Stop Strand
miRDeep-2 Scaffold_105 163753 163916 -
miRDeep-3 Scaffold_1079 7347 7505 +
miRDeep-5 Scaffold_111 146730 146888 +
miRDeep-7 Scaffold_1176 10070 10237 -
miRDeep-8 Scaffold_1337 4621 4782 +
miRDeep-14 Scaffold_168 27307 27465 +
miRDeep-15 Scaffold_17 433063 433225 +
miRDeep-16 Scaffold_189 160759 160918 +
miRDeep-17 Scaffold_19 92535 92692 -
miRDeep-21 Scaffold_215 87008 87168 +
miRDeep-22 Scaffold_22 153526 153687 -
miRDeep-23 Scaffold_222 34762 34920 -
miRDeep-25 Scaffold_255 85761 85925 +
miRDeep-27 Scaffold_291 70707 70867 -
miRDeep-28 Scaffold_291 70231 70387 -
miRDeep-29 Scaffold_3 99854 100015 +
miRDeep-30 Scaffold_31 66645 66807 -
miRDeep-31 Scaffold_312 69448 69606 -
miRDeep-32 Scaffold_33 145229 145391 +
miRDeep-33 Scaffold_342 54590 54746 -
miRDeep-38 Scaffold_366 11427 11587 +
miRDeep-39 Scaffold_380 26562 26720 -
miRDeep-40 Scaffold_402 49413 49571 -
miRDeep-41 Scaffold_416 59608 59766 -
miRDeep-42 Scaffold_430 7590 7751 +
miRDeep-45 Scaffold_50 122578 122734 +
miRDeep-46 Scaffold_50 123455 123616 +
miRDeep-47 Scaffold_52 196448 196611 -
miRDeep-48 Scaffold_52 196091 196252 -
miRDeep-50 Scaffold_62 217152 217314 +
miRDeep-51 Scaffold_63 49104 49265 -
miRDeep-52 Scaffold_656 16990 17149 +
miRDeep-53 Scaffold_71 311775 311935 +
miRDeep-54 Scaffold_71 311914 312072 +
miRDeep-55 Scaffold_71 314561 314720 +
miRDeep-56 Scaffold_71 315224 315382 +
miRDeep-58 Scaffold_8 593432 593597 -
miRDeep-59 Scaffold_81 252569 252729 -
miRDeep-60 Scaffold_88 413 573 -
miRDeep-61 Scaffold_90 48777 48935 -
miRDeep-65 Scaffold_952 1081 1238 +

c, Expression of primary products of miRDeep predictions.

Columns: miR ID, Location, Total, Egg, Early Embryo, Late Embryo, Adult

miRDeep-65 Scaffold_952:1170..1193:+ 8 0 4 3 1

miRDeep-65* Scaffold_952:1125..1148:+ 2.5 1 1.5 0 0

miRDeep-28 Scaffold_291:70320..70342:- 223 6 4 113 100

miRDeep-28* Scaffold_291:70281..70299:- 1 0 0 1 0

miRDeep-27 Scaffold_291:70791..70813:- 93 4 6 52 31

miRDeep-27* Scaffold_291:70759..70780:- 8 0 2 6 0

miRDeep-41 Scaffold_416:59690..59711:- 24 0 0 0 24

miRDeep-41* Scaffold_416:59661..59682:- 2 0 0 1 1

miRDeep-48 Scaffold_52:196176..196197:- 1634 314 358 186 776

miRDeep-48* Scaffold_52:196198..196216:- 2 1 1 0 0

miRDeep-47 Scaffold_52:196535..196555:- 795 45 61 99 590

miRDeep-47* Scaffold_52:196499..196522:- 39 5 7 13 14

miRDeep-40 Scaffold_402:49499..49522:- 61 30 23 6.5 1.5

miRDeep-40* Scaffold_402:49460..49481:- 0.0667 0.07 0 0 0

miRDeep-17 Scaffold_19:92584..92606:- 9 3 3 3 0

miRDeep-17* Scaffold_19:92619..92640:- 5 3 2 0 0

miRDeep-56 Scaffold_71:315276..315302:+ 24322.25 9505 11066.92 3734.33 16

miRDeep-56* Scaffold_71:315259..315278:+ 16 0 15 1 0

miRDeep-54 Scaffold_71:311967..311990:+ 31852.75 13171 14652.33 4015.42 14

miRDeep-54* Scaffold_71:311999..312019:+ 33 1 32 0 0

miRDeep-53 Scaffold_71:311827..311854:+ 30251.5 11874 13416 4945.5 16

miRDeep-53* Scaffold_71:311861..311884:+ 440 56 370 14 0

miRDeep-55 Scaffold_71:314615..314637:+ 19423 6874 9002 3535 12

miRDeep-55* Scaffold_71:314646..314666:+ 41 1 40 0 0

miRDeep-60 Scaffold_88:458..486:- 74.18 37.75 21.73 14.7 0

miRDeep-3 Scaffold_1079:7433..7455:+ 35 20 9 6 0

miRDeep-3* Scaffold_1079:7401..7420:+ 10 7 3 0 0

miRDeep-16 Scaffold_189:160808..160831:+ 31 22 5 4 0

miRDeep-16* Scaffold_189:160851..160869:+ 1 1 0 0 0

miRDeep-21 Scaffold_215:87094..87115:+ 292 74 80 55 83

miRDeep-21* Scaffold_215:87062..87083:+ 17 0 5 6 6

miRDeep-15 Scaffold_17:433116..433137:+ 299 1 0 0 298

miRDeep-15* Scaffold_17:433152..433173:+ 20 0 0 2 18

miRDeep-59 Scaffold_81:252653..252674:- 9074 149 86 91 8748

miRDeep-59* Scaffold_81:252622..252643:- 1 0 0 0 1

miRDeep-33 Scaffold_342:54671..54693:- 64 42 19 2 1

miRDeep-33* Scaffold_342:54641..54662:- 1 0 1 0 0

miRDeep-8 Scaffold_1337:4673..4697:+ 26968 3832 3607 5916 13613

miRDeep-8* Scaffold_1337:4708..4731:+ 84 0 22.5 59.5 2

miRDeep-22 Scaffold_22:153610..153633:- 50486.5 16885 13704 15510 4387.5

miRDeep-22* Scaffold_22:153578..153600:- 54 0 26 23 5

miRDeep-14 Scaffold_168:27393..27417:+ 4611 1102 1268 925 1316

miRDeep-14* Scaffold_168:27361..27380:+ 424 0 200 208 16

miRDeep-52 Scaffold_656:17031..17052:+ 13 0 11 2 0

miRDeep-52* Scaffold_656:17083..17101:+ 0.0667 0 0 0.07 0

miRDeep-23 Scaffold_222:34844..34866:- 432 162 158 44 68

miRDeep-23* Scaffold_222:34865..34885:- 12 1 3 8 0

miRDeep-32 Scaffold_33:145284..145305:+ 37 13 14 2 8

miRDeep-32* Scaffold_33:145265..145283:+ 0.25 0 0.25 0 0

miRDeep-5 Scaffold_111:146818..146840:+ 3751 0 12 1 3738

miRDeep-5* Scaffold_111:146779..146797:+ 11 0 0 0 11

miRDeep-42 Scaffold_430:7676..7699:+ 14031 13.5 1049.5 8624.5 4343.5

miRDeep-42* Scaffold_430:7645..7666:+ 216 0 85 129 2

miRDeep-58 Scaffold_8:593521..593543:- 34596 20 97 152 34327

miRDeep-58* Scaffold_8:593486..593506:- 7 0 0 0 7

miRDeep-51 Scaffold_63:49188..49212:- 17120 1602 1339.5 770 13408.5

miRDeep-51* Scaffold_63:49157..49178:- 50 8 11 1 30

miRDeep-29 Scaffold_3:99904..99929:+ 89 22 16 20 31

miRDeep-29* Scaffold_3:99942..99962:+ 14 1 8 3 2

miRDeep-46 Scaffold_50:123507..123531:+ 4032 15 12 16 3989

miRDeep-46* Scaffold_50:123534..123561:+ 21 0 0 0 21

miRDeep-45 Scaffold_50:122631..122653:+ 1251 2 7 4 1238

miRDeep-45* Scaffold_50:122661..122679:+ 3 0 0 0 3

miRDeep-7 Scaffold_1176:10160..10183:- 18562 4219 3781 8549 2013

miRDeep-7* Scaffold_1176:10123..10144:- 18 2 11 5 0

miRDeep-38 Scaffold_366:11524..11547:+ 9 1 5.17 2.83 0

miRDeep-38* Scaffold_366:11470..11490:+ 1 0 0.67 0.33 0

miRDeep-31 Scaffold_312:69500..69521:- 223 50 100 48 25

miRDeep-31* Scaffold_312:69534..69555:- 3 0 3 0 0

miRDeep-30 Scaffold_31:66731..66753:- 88978.5 29893 27121.5 15230 16734

miRDeep-30* Scaffold_31:66753..66772:- 106 16 31 57 2

miRDeep-61 Scaffold_90:48827..48850:- 3245 84 60 45 3056

miRDeep-61* Scaffold_90:48860..48882:- 6 0 0 4 2

miRDeep-50 Scaffold_62:217201..217222:+ 12 8 3 0 1

miRDeep-50* Scaffold_62:217243..217263:+ 2 1 0 0 1

miRDeep-25 Scaffold_255:85851..85871:+ 154 3 14 87 50

miRDeep-25* Scaffold_255:85816..85837:+ 79 1 18 44 16

miRDeep-2 Scaffold_105:163838..163864:- 8388 503 445 249 7191

miRDeep-2* Scaffold_105:163808..163827:- 14 1 2 3 8

miRDeep-39 Scaffold_380:26643..26665:- 835 357 306 51 121

miRDeep-39* Scaffold_380:26614..26636:- 310 8 76 196 30

Table S4: Summary of miRNA and moR products. a, This table shows the number of miRNAs that have a given product type from a particular dataset.

Library         Total miRs  5' minor  5' major  loop  3' major  3' minor
Ciona_Homology  39          21        31        7     28        9
Ciona_miRDeep   41          19        40        4     38        8
Total           80          40        71        11    66        17

b, Read count for miRNA and moR products. As with Tables S2b and S3c, the counts have been uniformly scaled to account for reads that hit multiple locations in the genome.

Type    Egg        Early Embryo  Late Embryo  Adult
moR     101.5      406.25        912.5        129
miRNA   203736.91  206136.67     261823.83    334922.51
Loop    0          16.5          11           12
miRNA*  68.07      1342.17       1577.41      360
moR*    3          12            17           10

DATA ANALYSIS AND METHODOLOGY

Supplementary Fig. S2 shows the read-length distribution after trimming. The trimmed reads averaged 21 nt in length. Reads longer than 17 nt were kept and aligned to the Ciona genome using gapless BLAST (-g F) with an E-value cut-off of 1 (-e 1.0). The resulting BLAST hits were then filtered to retain those with an E-value of less than 0.01. These results are summarized in Supplementary Table S1.
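The two filtering steps described above (read length, then E-value) can be sketched as follows. This is a minimal illustration rather than the scripts used in the study; the function names, the tuple layout of a hit, and the example reads and loci are all hypothetical.

```python
# Illustrative sketch only: names and example data are hypothetical.

MIN_LENGTH_EXCLUSIVE = 17     # reads must be longer than 17 nt
MAX_EVALUE_EXCLUSIVE = 0.01   # retained hits must have an E-value below 0.01


def keep_read(sequence):
    """Return True if a trimmed read is long enough to be aligned."""
    return len(sequence) > MIN_LENGTH_EXCLUSIVE


def filter_hits(hits):
    """Keep hits below the final E-value threshold.

    `hits` is an iterable of (read_id, locus, evalue) tuples, assumed to
    come from an alignment already run with the looser cut-off of 1.
    """
    return [hit for hit in hits if hit[2] < MAX_EVALUE_EXCLUSIVE]


# Made-up trimmed reads: 21 nt, 17 nt and 18 nt long.
reads = {"r1": "A" * 21, "r2": "A" * 17, "r3": "A" * 18}
kept = {read_id for read_id, seq in reads.items() if keep_read(seq)}

# Made-up alignment hits for the reads that passed the length filter.
hits = [("r1", "Scaffold_1:100", 0.002), ("r3", "Scaffold_2:50", 0.5)]
confident = filter_hits(hits)
```

In this toy input the 17-nt read is dropped by the length filter, and only the hit with E-value 0.002 survives the second filter.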

MicroRNA Analysis: Homology based predictions

Previously observed miRNA loci were compiled from the literature (Supplementary Table S2a; refs. 1-3). We observed small RNA expression from 34 of the 39 known miRNA loci. The total number of reads that map to each of these loci is listed in Supplementary Table S2b. Here, the miRNA sequence is defined as the ~21-nt site with the most mapped reads, while the miRNA* is the less abundant sequence in the context of the predicted pre-miRNA. The counts in Supplementary Table S3 have been uniformly adjusted so that a read that hits N loci in the genome contributes a count of 1/N to each locus.
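The 1/N adjustment can be illustrated with a short sketch. The function name, the input layout (a mapping from read to the loci it hits) and the example reads are assumptions made for illustration, not part of the original analysis.

```python
from collections import defaultdict


def fractional_counts(read_hits):
    """Distribute each read evenly over the loci it maps to.

    `read_hits` maps a read identifier to the list of loci it hits.
    A read hitting N distinct loci adds 1/N to each of them.
    """
    counts = defaultdict(float)
    for loci in read_hits.values():
        distinct = set(loci)
        if not distinct:
            continue  # unmapped reads contribute nothing
        weight = 1.0 / len(distinct)
        for locus in distinct:
            counts[locus] += weight
    return dict(counts)


# Made-up example: one unique read, one read with 2 hits, one with 4 hits.
read_hits = {
    "r1": ["locusA"],
    "r2": ["locusA", "locusB"],
    "r3": ["locusA", "locusB", "locusC", "locusD"],
}
counts = fractional_counts(read_hits)
```

Because every read contributes a total weight of 1, the adjusted counts summed over all loci equal the number of mapped reads, which is why non-integer values can appear for individual loci.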

MicroRNA Analysis: Structure based predictions

To extract novel miRNA loci from our data set, we used a recently described program known as miRDeep (ref. 4). miRDeep is a custom Perl script that combines BLAST and RNAfold to predict miRNAs from high-throughput sequencing data. Briefly, sequenced reads from all libraries were combined and aligned to the genome with BLAST, requiring an E-value of 1 or less, 100% identity, and five or fewer genomic hits. After analyzing each locus for the potential to fold into a pre-miRNA-like hairpin, 65 regions were considered as possible miRNAs. This analysis is summarized in Supplementary Table S3. From this list, 14 predictions were excluded due to low quality (typically owing to high heterogeneity at the miRNA site or spurious extra reads) and 10 were removed due to overlap with miRNAs from the homology-based set, leaving 41 predicted miRNA loci. These final predictions are shown in Supplementary Table S3b. Supplementary Table S3c displays read counts at each novel miRNA locus (miRNA and miRNA*) from all sampled developmental stages and, as with Supplementary Table S2b, these counts have been uniformly adjusted.
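The read-selection criteria and the bookkeeping of predictions described above can be sketched as below. This does not reproduce miRDeep itself; the record layout and function name are hypothetical, and the sketch assumes that "five or fewer genomic hits" refers to perfect-identity hits.

```python
MAX_GENOMIC_HITS = 5  # reads with more perfect hits than this are discarded


def select_reads(alignments):
    """Keep reads with 100% identity hits at five or fewer genomic loci.

    `alignments` is a list of dicts with keys "read", "locus" and
    "identity" (percent). The layout is an assumption for illustration.
    """
    perfect_hits = {}
    for aln in alignments:
        if aln["identity"] == 100.0:
            perfect_hits.setdefault(aln["read"], []).append(aln["locus"])
    return {
        read: loci
        for read, loci in perfect_hits.items()
        if len(loci) <= MAX_GENOMIC_HITS
    }


# Made-up alignments: r1 has 2 perfect hits, r2 has 6, r3 has one imperfect hit.
alignments = (
    [{"read": "r1", "locus": f"locus{i}", "identity": 100.0} for i in range(2)]
    + [{"read": "r2", "locus": f"locus{i}", "identity": 100.0} for i in range(6)]
    + [{"read": "r3", "locus": "locus0", "identity": 95.0}]
)
selected = select_reads(alignments)

# Bookkeeping of the predictions reported in the text.
candidates, low_quality, overlapping = 65, 14, 10
final_loci = candidates - low_quality - overlapping
```

Only r1 passes in this toy input, and the arithmetic recovers the 41 loci listed in Supplementary Table S3b.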

Distribution of Reads across miRNA loci

After the miRNA datasets were collected, all reads that map to miRNA loci were

examined.

Supplementary Fig. S7a shows the distribution of the 5' position of all reads when mapped to the genome. The coordinates are expressed relative to the last base pair of the hairpin before the loop, so as to control for different loop lengths. The expectation from such a distribution would be to see only two distinct regions of enrichment (miRNA and miRNA*). Instead, this representation shows at least three distinct regions of enrichment. The two largest regions correspond to the standard miRNA and miRNA* products: the 5' ends of reads in the first region peak from -25 to -21, and those in the second peak at around +2 to +6. In addition, read enrichment from positions -50 to -40 suggests an unexpected product. Moreover, this additional region of enrichment is strongly peaked, suggesting that tight constraints control its position. If these products were merely sequence fragments, one would expect a more uniform distribution of positions.
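The coordinate transformation behind such a plot can be sketched as follows. The sign convention (negative values lie 5' of the anchor along the transcript), the strand handling, and the example coordinates are assumptions for illustration; the anchor is taken to be the genomic coordinate of the last paired base before the loop.

```python
from collections import Counter


def relative_five_prime(read_start, read_end, strand, anchor):
    """5' end of a read relative to the anchor, in transcript orientation.

    `read_start` <= `read_end` are inclusive genomic coordinates. On the
    minus strand the 5' end of the read is its highest coordinate.
    """
    if strand == "+":
        return read_start - anchor
    return anchor - read_end


def five_prime_histogram(reads, anchor, strand):
    """Sum read counts at each relative 5' position.

    `reads` is a list of (start, end, count) tuples for one hairpin.
    """
    hist = Counter()
    for start, end, count in reads:
        hist[relative_five_prime(start, end, strand, anchor)] += count
    return hist


# Made-up plus-strand hairpin with its anchor at genomic position 1000.
reads = [(977, 998, 50), (1004, 1025, 5), (955, 974, 8)]
hist = five_prime_histogram(reads, anchor=1000, strand="+")
```

Pooling such histograms over all loci gives a profile of the kind described for Supplementary Fig. S7a, because every hairpin shares the same origin regardless of loop length.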

Supplementary Fig. S7b shows the distribution of the 5' position of all distinct

products, where "products" are defined as overlapping clusters of reads that map

to the same locus. This shows the distribution of the most abundant positions at

each locus. As with Supplementary Fig. S7a, the expectation is to see only two

regions of enrichment, but this figure demonstrates an additional region of

enrichment.
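One way to group reads into "products" as defined above is to merge overlapping read intervals and then report the most abundant 5' position in each cluster. The sketch below assumes that any overlap joins two reads into the same product and that reads lie on the plus strand (so the 5' end is the start coordinate); the function name and example reads are hypothetical.

```python
def cluster_products(reads):
    """Merge overlapping reads at one locus into products.

    `reads` is a list of (start, end, count) tuples with inclusive
    coordinates. Returns one dict per product with its span, total
    count and most abundant start position.
    """
    products = []
    for start, end, count in sorted(reads):
        if products and start <= products[-1]["end"]:
            current = products[-1]
            current["end"] = max(current["end"], end)
            current["total"] += count
            current["starts"][start] = current["starts"].get(start, 0) + count
        else:
            products.append(
                {"start": start, "end": end, "total": count,
                 "starts": {start: count}}
            )
    for product in products:
        product["modal_start"] = max(product["starts"], key=product["starts"].get)
    return products


# Made-up reads forming two non-overlapping clusters.
reads = [(10, 30, 5), (11, 31, 40), (12, 30, 2), (50, 71, 9), (51, 71, 1)]
products = cluster_products(reads)
```

Here the first three reads merge into one product whose most abundant 5' position is 11, and the last two form a second product starting at 50.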

Distribution and stability of miRNA and moR products

Supplementary Table S4a summarizes the type of products (miRNA, moR, and

loop) observed at all miRNA loci with combined datasets (Ciona_Homology and

Ciona_miRDeep). Half of the miRNAs observed in our combined dataset had

associated moRs (40/80).

Supplementary Table S4b demonstrates the absolute abundance of the various

products (miRNA, moR, and loop) in all of the libraries sequenced, with both

miRNA datasets combined, as above.

moR products were observed in all analyzed libraries. The observation of moR products in the unfertilized egg is of particular note, as the egg is a maternally loaded, uniform cell type with little or no active transcription. It would therefore be expected that only stabilized or maternally deposited small RNAs would be cloned from this library. Incidentally, the intervening loop sequence of 11 of the 80 miRNA/miRNA* pairs was cloned in this study. However, none of the loop regions were cloned from the unfertilized egg, suggesting that this species is unstable and likely a by-product of the miRNA pathway in Ciona.

An ~20-nt RNA species proximal to the 5' end of pre-miR-133 was observed in all surveyed stages of development. Further, the 5'-moR-133 species was present at levels at or exceeding those of miR-133* in all examined stages, peaking at a ratio of 39:1 in the adult animal. Curiously, this RNA was also cloned at levels higher than miR-133 in all embryonic stages. The 5'-moR-133 exhibits near uniformity, as 99% of clones (988/1002) center on the sequence 5'-AAACAUUGCGGGUGUCGGUC-3', with shifting bases marked as underlined.

moR stability in the unfertilized egg is most notable for 5'-moR-miRDeep_14. The miRDeep prediction #14 (miRDeep_14) locus produced 41 copies of a 5' moR in the egg, all of which maintain a precise 5' terminus, reminiscent of canonical miRNAs (ref. 5). For comparison, ~1,100 copies of miRDeep_14 were observed at this stage, while no miRDeep_14* was present. Moreover, the 5'-moR-miRDeep_14 frequency was higher than that of many deeply conserved and coexpressed miRNAs found in this library, such as miRs-31, 34, 92-3, 281 and 183. However, there is continuing debate in the field over what level of expression for a given small RNA constitutes "biological significance" (ref. 6).

moRs and miRs compared

Supplementary Fig. S2a shows the distribution of read lengths for reads overlapping a given type of locus. On the whole, moR products tend to be shorter than miRNA products, with moR products centering around 19-20 nt and miRNA products centering on 20-21 nt. The exception to this tendency is Cin-miR-1497, which we found to be the most abundant miRNA in Ciona intestinalis. Cin-miR-1497 is predominantly cloned as a 19-nt species, and the extremely high copy number of this miRNA dominates the miRNA size distribution in Supplementary Fig. S2a. Omitting Cin-miR-1497 reads from the size distribution analysis shows that most Ciona miRNA loci produce ~21-22-nt species.

It is becoming evident that the stability of a miRNA 5' terminus is required for its downstream function. As such, miRNAs tend to have homogeneous 5' termini, while the 3' termini can vary widely (ref. 5). To predict those moR sequences with potential function, we plotted the 5' heterogeneity of all small RNA loci and products as a function of read abundance (Supplementary Fig. S2b).
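The text does not state how 5' heterogeneity was quantified. As one plausible measure, assumed here purely for illustration, heterogeneity can be taken as the fraction of reads in a product whose 5' end differs from the most common 5' end. The function name and the example counts are hypothetical.

```python
def five_prime_heterogeneity(start_counts):
    """Return (abundance, heterogeneity) for one product.

    `start_counts` maps each observed 5' position to its read count.
    Heterogeneity is ASSUMED to be the fraction of reads that do not
    begin at the most common 5' position; the source gives no formula.
    """
    abundance = sum(start_counts.values())
    if abundance == 0:
        raise ValueError("product has no reads")
    modal = max(start_counts.values())
    return abundance, 1.0 - modal / abundance


# Made-up products: one with a precise 5' end, one with a ragged 5' end.
precise = five_prime_heterogeneity({100: 95, 101: 5})
ragged = five_prime_heterogeneity({200: 40, 201: 35, 203: 25})
```

Plotting the second value against the first for every product gives a scatter of the kind described for Supplementary Fig. S2b, where products with low heterogeneity at a given abundance resemble canonical miRNAs.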

As displayed, some miRNA* and moR products fall within the range of values seen for the miRNA products, indicating that these products could be subject to similar functional constraints (ref. 14). So as not to exclude the possibility that moRs function as miRNAs (they are similar in size and end structure, and are sometimes found at expression levels similar to those of miRNA and miRNA*), we have included them in the target prediction scheme and output described below.

MicroRNA Analysis: miRNA and moR target predictions for Ciona

A total of 2,956 3' UTR sequences were collected from JGI version 1.0 (http://genome.jgi-psf.org/ciona4/ciona4.download.ftp.html). We then ran the target site prediction software TargetScan Release 4.1 (http://www.targetscan.org/) on the UTR sequences, using positions 2-8 of each product type as the seed sequence (refs. 7-9). Target prediction results were collected for all products (the most predominant sequence at a given position) and are available at http://flybuzz.berkeley.edu/cgi-bin/CionaMicroRNAs.cgi. We observed numerous examples where miRNAs and moRs, or miRNAs and miRNA*s, may target the same UTRs.
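The seed definition used above (positions 2-8 of the small RNA) can be illustrated with a simplified seed-match scan. This is not TargetScan, which applies further criteria not reproduced here; the function names are hypothetical and the UTR sequence is invented. The small RNA in the example is the 5'-moR-133 sequence quoted earlier in this supplement.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed(small_rna):
    """Positions 2-8 (1-based) of the small RNA."""
    return small_rna[1:8]


def seed_match_site(small_rna):
    """Reverse complement of the seed, i.e. the site sought in the UTR."""
    return seed(small_rna).translate(COMPLEMENT)[::-1]


def find_sites(utr, small_rna):
    """Return 0-based start positions of perfect seed matches in a UTR."""
    utr = utr.upper().replace("T", "U")
    site = seed_match_site(small_rna.upper().replace("T", "U"))
    positions = []
    index = utr.find(site)
    while index != -1:
        positions.append(index)
        index = utr.find(site, index + 1)
    return positions


mor_133 = "AAACAUUGCGGGUGUCGGUC"           # sequence quoted in the text
example_utr = "GGGCAAUGUUAAACCCAAUGUUGG"   # invented UTR fragment
sites = find_sites(example_utr, mor_133)
```

For this sequence the seed is AACAUUG and the corresponding UTR site is CAAUGUU, which occurs twice in the invented fragment.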

On moR biogenesis

While we favor a hypothesis in which Drosha activity produces the moRs, moR production could instead be accomplished by a "double-dicing" mechanism. Here, a long pre-miRNA could potentially be processed by successive dicing events on the extended pre-miRNA after its release from the nucleus. This mechanism is unlikely because of the inappropriate termini observed upon alignment of coincident 5' and 3' moR sequences adjacent to Ciona pre-miRNAs, in the context of the predicted structure of an extended pre-miRNA. As seen with miRs-124-1/2, the terminus of such an extended pre-miRNA would lie within a highly unstructured (unpaired) region and would be unlikely to form the stable ~2-nt 3' overhang required for the PAZ domain of Dicer to bind its substrate (ref. 10).

On moR Conservation

The full genome alignment of Ciona intestinalis (JGI Version 1.0) to Ciona savignyi was downloaded (http://pipeline.lbl.gov/downloads.shtml), and regions of the alignment that overlap the miRNA loci under consideration were extracted. Supplementary Fig. S4c shows the conservation of the predominant products collected [miRNA, miRNA*, moR and moR* (defined as the less abundant sequence when both a 5' and a 3' moR are present)], expressed as the fraction of nucleotides conserved with C. savignyi. This shows that some of the most abundant moR products are also very highly conserved (>70% of nucleotides). Indeed, the most abundant moRs, moR-133, moR-219 and moR-124-1, all have at least 6 of 7 seed positions conserved (perfectly conserved in the case of moR-124-1).
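The conservation measure can be sketched as below for a pair of aligned sequences. The handling of gaps (reference gaps skipped, gaps in the other species counted as mismatches), the function names and the example sequences are assumptions for illustration, not details taken from the analysis.

```python
def fraction_conserved(ref_aln, other_aln):
    """Fraction of reference nucleotides identical in the other species.

    Both strings are rows of the same alignment (equal length, '-' for
    gaps). Columns where the reference has a gap are skipped.
    """
    if len(ref_aln) != len(other_aln):
        raise ValueError("aligned sequences must have equal length")
    ref_length = 0
    identical = 0
    for ref_base, other_base in zip(ref_aln.upper(), other_aln.upper()):
        if ref_base == "-":
            continue
        ref_length += 1
        if ref_base == other_base:
            identical += 1
    if ref_length == 0:
        raise ValueError("reference row contains only gaps")
    return identical / ref_length


def conserved_seed_positions(ref_product, other_product):
    """Number of identical bases at positions 2-8 of two ungapped products."""
    pairs = zip(ref_product[1:8].upper(), other_product[1:8].upper())
    return sum(1 for a, b in pairs if a == b)


# Invented 10-nt example differing at one position inside the seed.
overall = fraction_conserved("ACGUACGUAC", "ACGUACGAAC")
seed_hits = conserved_seed_positions("ACGUACGUAC", "ACGUACGAAC")
```

In this invented pair, 9 of 10 nucleotides and 6 of 7 seed positions are identical, which is the style of summary reported for the moRs above.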

Referenced Literature

1. Fu, X., Adamski, M. & Thompson, E. M. Altered miRNA repertoire in the simplified chordate, Oikopleura dioica. Mol Biol Evol 25, 1067-80 (2008).

2. Norden-Krichmar, T. M., Holtz, J., Pasquinelli, A. E. & Gaasterland, T. Computational prediction and experimental validation of Ciona intestinalis microRNA genes. BMC Genomics 8, 445 (2007).

3. Prochnik, S. E., Rokhsar, D. S. & Aboobaker, A. A. Evidence for a microRNA expansion in the bilaterian ancestor. Dev Genes Evol 217, 73-7 (2007).

4. Friedlander, M. R. et al. Discovering microRNAs from deep sequencing data using miRDeep. Nat Biotechnol 26, 407-15 (2008).

5. Seitz, H., Ghildiyal, M. & Zamore, P. D. Argonaute loading improves the 5' precision of both MicroRNAs and their miRNA* strands in flies. Curr Biol 18, 147-51 (2008).

6. Axtell, M. J. Evolution of microRNAs and their targets: Are all microRNAs biologically relevant? Biochim Biophys Acta (2008).

7. Grimson, A. et al. MicroRNA targeting specificity in mammals: determinants beyond seed pairing. Mol Cell 27, 91-105 (2007).

8. Lewis, B. P., Burge, C. B. & Bartel, D. P. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120, 15-20 (2005).

9. Lewis, B. P., Shih, I. H., Jones-Rhoades, M. W., Bartel, D. P. & Burge, C. B. Prediction of mammalian microRNA targets. Cell 115, 787-98 (2003).

10. Ma, J. B. et al. Structural basis for 5'-end-specific recognition of guide RNA by the A. fulgidus Piwi protein. Nature 434, 666-70 (2005).

11. Du, T. & Zamore, P. D. microPrimer: the biogenesis and function of microRNA. Development 132, 4645-52 (2005).
