Available online at www.sciencedirect.com
Journal of Genetics and Genomics 39 (2012) 351–360
JGG
ORIGINAL RESEARCH
Identification and Analyses of miRNA Genes in Allotetraploid Gossypium hirsutum Fiber Cells Based on the Sequenced Diploid G. raimondii Genome
Qin Li a, Xiang Jin a, Yu-Xian Zhu a,b,*
a The State Key Laboratory of Protein and Plant Gene Research, College of Life Sciences, Peking University, Beijing 100871, China
b National Center for Plant Gene Research (Beijing), Beijing 100101, China
Received 13 February 2012; revised 25 April 2012; accepted 25 April 2012
Available online 18 May 2012
ABSTRACT
The plant genome possesses a large number of microRNAs (miRNAs), mainly 21–24 nucleotides in length. They play a vital role in regulating target gene expression at various stages throughout the plant life cycle. Here we sequenced and analyzed ~10 million non-coding RNAs (ncRNAs) derived from fiber tissue of the allotetraploid cotton (Gossypium hirsutum) 7 days post-anthesis using ncRNA-seq technology. In terms of distinct reads, 24 nt ncRNA is by far the dominant species, followed by 21 nt and 23 nt ncRNAs. Using ab initio prediction, we identified and characterized a total of 562 candidate miRNA gene loci on the recently assembled D5 genome of the diploid cotton G. raimondii. Of the 562 predicted miRNAs, 22 were previously discovered in cotton species and 187 showed sequence conservation with miRNAs of other plant species. Nucleotide bias analysis showed that the 9th and 1st positions were significantly conserved among different types of miRNA genes. Among the 463 putative miRNA target genes, the most significant up/down-regulation occurred at 10–20 days post-anthesis, indicating that miRNAs play an important role during the elongation and secondary cell wall synthesis stages of cotton fiber development. The discovery of new miRNA genes will help to elucidate the mechanisms of miRNA generation and regulation in cotton.
KEYWORDS: Cotton; Genome; Micro RNA; Deep sequencing; Microarray
1. INTRODUCTION
Cotton belongs to the genus Gossypium, which contains 5 tetraploid (2n = 4x) and more than 45 diploid (2n = 2x) species (Bowers et al., 2003). Upland cotton (Gossypium hirsutum, AADD, 2n = 4x = 52), which accounts for over 90% of the world’s cotton lint production, is thought to have undergone an allopolyploidization event about 1–2 million years ago (MYA), involving both A and D genome species (Wendel and Albert, 1992). The progenitor of G. raimondii (D5D5, 2n = 2x = 26) is considered the contributor of the D sub-genome, while ancestors of G. arboreum (AA, 2n = 2x = 26) may have contributed the A sub-genome to G. hirsutum (Paterson et al., 2004; Chen et al., 2007).
Cotton fibers are single-cell trichomes differentiated from the ovule epidermis. Indexed by the number of days post-anthesis (DPA), fiber development consists of four distinct but overlapping stages: initiation (0–3 DPA), elongation (3–20 DPA), secondary cell wall deposition (15–45 DPA) and maturation (40–60 DPA) (Kim and Triplett, 2001; Ji et al., 2003). The unicellular structure of the cotton fiber cell renders it an ideal model for studying cell elongation and differentiation, and for deciphering the regulatory machinery involved in cellulose biosynthesis (Qin and Zhu, 2011). Based on microarray hybridization of 12,233 Uni-ESTs obtained from 102,000 expressed sequence tags (ESTs) of a cotton cDNA library, we previously identified 778 cDNAs that were preferentially expressed during the fast fiber-elongation period, with ethylene biosynthesis being the most significantly up-regulated pathway (Shi et al., 2006). Further studies showed that biosynthesis of very-long-chain fatty acids (VLCFAs) and production of reactive oxygen species and pectin polymers may also be required for fiber cell elongation (Qin et al., 2007; Mei et al., 2009; Pang et al., 2010).
* Corresponding author. Tel: +86 10 6275 1193.
E-mail address: [email protected] (Y.-X. Zhu).
1673-8527/$ - see front matter Copyright © 2012, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, and Genetics Society of China. Published by Elsevier Limited and Science Press. All rights reserved.
doi:10.1016/j.jgg.2012.04.008
MicroRNAs (miRNAs) are non-coding RNAs, mainly 21–24 nucleotides long (Bartel, 2004; Baulcombe, 2004; Chapman and Carrington, 2007). In general, miRNA gene loci in the genome are transcribed by RNA polymerase II into stem-loop precursors (pre-miRNAs), which are processed by DICER-LIKE proteins in plants (Ambros et al., 2003; Kurihara and Watanabe, 2004). Mature miRNAs, 21–24 nucleotides long, are then loaded into Argonaute complexes, where they guide cleavage and degradation of target mRNAs (Llave et al., 2002; Ambros et al., 2003; Li et al., 2010). Allen et al. (2005) revealed the miRNA-directed phasing of trans-acting siRNA biogenesis and gene regulation in plants. Chuck et al. (2007) reported that overexpression of a tandem miRNA caused the heterochronic maize mutant Corngrass1. The 22-nucleotide miR4376 was found to mediate the cleavage of an autoinhibited Ca2+-ATPase, ACA10, which plays a critical role in tomato reproductive growth (Wang et al., 2011). Also, a miRNA was found to act as a translational repressor of APETALA2 during Arabidopsis flower development (Aukerman and Sakai, 2003; Chen, 2004). Qiu et al. (2007) and Zhang et al. (2007) first identified miRNAs and their targets in cotton. Kwak et al. (2009) studied miRNA expression during cotton fiber development in wild type (WT) and the fuzz/lintless mutant (fl, in the WT background) using Solexa deep sequencing.
The fundamental feature of plant miRNAs is the precise excision of a miRNA/miRNA* duplex from the stem of a stem-loop precursor, which is predicted from genomic DNA or known ESTs (Meyers et al., 2008). Owing to the lack of a complete cotton genome sequence and the limited EST coverage of the non-coding transcriptome, only 34 stem-loop precursors of G. hirsutum were recorded in miRBase (Griffiths-Jones et al., 2008; www.mirbase.org), compared with 291 of Arabidopsis thaliana, 362 of Glycine max, 635 of Medicago truncatula and 234 of Populus trichocarpa (www.mirbase.org). Kwak et al. (2009) identified 22 conserved candidate miRNA families including 111 members. Using ncRNA-seq and an EST database, Pang et al. (2009) reported 25 cotton miRNA precursors, of which 4 were identified for the first time. Wang et al. (2011) revealed 7 cotton fiber initiation-related and 36 novel miRNAs. In this study, using the D5 sub-genome sequence as the template, we revealed the existence of 562 cotton miRNA gene loci with authentic stem-loop structures in the genome and expression evidence in the ribo-genome. We also analyzed the cotton transcriptome over different fiber development stages to find the expression patterns of several sets of potential target genes that may be regulated by miRNAs.
2. MATERIALS AND METHODS
2.1. Plant growth and sample preparation
Cotton plants (G. hirsutum cv. Xuzhou 142) were grown in fully automated walk-in growth rooms. Fiber tissue was harvested 7 days post-anthesis and frozen in liquid nitrogen until used for RNA extraction. Total RNA was extracted from cotton tissue using a modified protocol (Ji et al., 2003) that omitted polyvinylpyrrolidone from the extraction buffer, because it would cause RNA degradation.
2.2. Non-coding RNA deep sequencing
Small RNA molecules from 18 to 30 nucleotides in length were amplified and isolated from a 15% polyacrylamide gel. The purified sample was used directly for cluster generation and sequencing analysis using the Illumina Genome Analyzer (Illumina, Inc., USA) according to the manufacturer’s instructions. Raw sequencing data were deposited in NCBI GEO (GSE27697).
2.3. miRNA identification
We first performed genome-wide scanning of the D5 sub-genome to detect inverted repeat sequences as candidates. We then used the RNAfold program of the Vienna RNA Package to construct stem-loop structures of the putative pre-miRNAs. The maximal energy for a stem-loop structure was set to −18 kcal/mol, and at least 19 nucleotides had to form pairs with what was found in the genome. Fewer than 3 asymmetric bulges were allowed in the secondary structure for miRNA identification. Candidate pre-miRNAs were mapped with ncRNA-seq data using mireap (http://sourceforge.net/projects/mireap/) to estimate mature miRNA and miRNA* sequences. Finally, candidate miRNAs were aligned with miRNAs from other plant species in miRBase (Griffiths-Jones et al., 2008; www.mirbase.org) to assess sequence conservation and homology, allowing a maximum of 3 mismatched nucleotides.
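The structural thresholds above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: it assumes that each candidate has already been folded (for example by RNAfold) into a dot-bracket string with a minimum free energy, that the candidate is a single unbranched hairpin, and that an "asymmetric bulge" is any loop whose two arms carry different numbers of unpaired bases. All function names are hypothetical.

```python
def base_pairs(structure):
    """Return sorted (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for pos, char in enumerate(structure):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            pairs.append((stack.pop(), pos))
    return sorted(pairs)


def count_asymmetric_bulges(structure):
    """Count loops whose unpaired stretches differ between the two arms.

    Assumption: a single unbranched hairpin, so consecutive pairs are nested.
    """
    pairs = base_pairs(structure)
    count = 0
    for (i, j), (k, l) in zip(pairs, pairs[1:]):
        unpaired_5p = k - i - 1  # unpaired bases on the 5' arm
        unpaired_3p = j - l - 1  # unpaired bases on the 3' arm
        if unpaired_5p != unpaired_3p:
            count += 1
    return count


def passes_hairpin_filter(structure, mfe, max_mfe=-18.0, min_paired=19, max_bulges=2):
    """Apply the energy, pairing and bulge thresholds quoted in the text."""
    paired = structure.count("(")  # each "(" is one base pair
    return (mfe <= max_mfe
            and paired >= min_paired
            and count_asymmetric_bulges(structure) <= max_bulges)


def mismatches(candidate, known):
    """Mismatch count between equal-length sequences (None if lengths differ)."""
    if len(candidate) != len(known):
        return None
    return sum(a != b for a, b in zip(candidate, known))
```

A candidate would then be retained when `passes_hairpin_filter` returns True, and regarded as conserved when `mismatches` against a known plant miRNA is at most 3. Gapped alignment, which a real homology search may use, is deliberately left out of this sketch.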
2.4. siRNA identification
Small interfering RNAs (siRNAs) are double-stranded ncRNAs 22–24 nucleotides long. Each strand of an siRNA duplex extends 2 nt beyond the other at its 3′ end. We aligned raw sequencing reads with each other in pairs to find ncRNAs that satisfied these criteria.
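The pairing criterion can be sketched in Python as follows. This is an illustrative reading of the criteria, not the authors' implementation: it assumes reads are given in the RNA alphabet, that both strands of a duplex have the same length, and that the paired region must be perfectly complementary. The function names are hypothetical.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def revcomp(seq):
    """Reverse complement of an RNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sirna_duplexes(reads, min_len=22, max_len=24, overhang=2):
    """Return read pairs that form a duplex with 3' overhangs on both strands.

    With an overhang of 2 nt, the first (length - 2) bases of one strand pair
    with the first (length - 2) bases of the other strand in antiparallel
    orientation, leaving the last 2 bases of each strand unpaired.
    """
    distinct = {r for r in reads if min_len <= len(r) <= max_len}
    by_core = defaultdict(set)
    for read in distinct:
        by_core[read[:-overhang]].add(read)
    duplexes = set()
    for read in distinct:
        for partner in by_core.get(revcomp(read[:-overhang]), ()):
            duplexes.add(tuple(sorted((read, partner))))
    return sorted(duplexes)
```

Indexing reads by their paired core avoids comparing every read with every other read, which matters when the input contains millions of sequences.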
2.5. Identification of other non-coding RNAs
For transfer RNAs (tRNAs), ribosomal RNAs (rRNAs), small nuclear RNAs (snRNAs) and small nucleolar RNAs (snoRNAs), we searched the Rfam database (http://rfam.sanger.ac.uk) and the NCBI GenBank database using BLASTN (E-value < 0.01).
2.6. Genome mapping of non-coding RNAs
We used CASHX 2.3 software to map ncRNA-seq reads to the G. raimondii genome. We required perfect matches, allowing no mismatch in the read sequence. Mappings of a read to multiple genomic positions were treated as different matches. Assembled G. raimondii genome sequences were obtained from the cotton genome browser (http://cotton.cbi.pku.edu.cn/).
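The mapping rule (exact matches only, every genomic position counted separately) can be illustrated with a short Python sketch. It does not reproduce CASHX, which relies on a pre-built index for speed; the plain string search below, the decision to search both strands, and all function names are assumptions made for illustration.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(DNA_COMPLEMENT)[::-1]


def find_all(genome, query):
    """Return every start position of query in genome, overlaps included."""
    positions = []
    start = genome.find(query)
    while start != -1:
        positions.append(start)
        start = genome.find(query, start + 1)
    return positions


def map_reads(genome, reads):
    """Map each read with zero mismatches; each position is a separate hit."""
    hits = {}
    for read in reads:
        forward = [(pos, "+") for pos in find_all(genome, read)]
        reverse = [(pos, "-") for pos in find_all(genome, revcomp(read))]
        hits[read] = forward + reverse
    return hits


def percent_mapped(hits):
    """Percentage of reads with at least one perfect match."""
    mapped = sum(1 for positions in hits.values() if positions)
    return 100.0 * mapped / len(hits)
```

In this sketch a read that equals its own reverse complement is reported once per strand at the same position, so a production pipeline would need to deduplicate such hits.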
2.7. miRNA targets prediction
Fig. 1. Length distribution of non-coding RNAs obtained by ncRNA-seq using small RNA samples extracted from 7 DPA G. hirsutum fiber cells. All ncRNAs longer than 26 nucleotides (26–30 nt) were combined because they constitute only a very small percentage.
We used psRNATarget (http://plantgrn.noble.org/psRNATarget/) to predict miRNA targets in upland cotton. Mature miRNA sequences detected in ncRNA-seq data and supported by genomic structures were used as queries against cotton EST data with NCBI GenBank accession numbers DR452281–DR463972. The maximum energy allowed to be unpaired with the target site was set to −25 kcal/mol, with at least 20 nucleotides perfectly mapped in the genomic sequence.
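As a rough illustration of the complementarity requirement only, the following Python sketch scans an EST for windows in which at least 20 positions pair with the miRNA. It is a deliberately simplified stand-in and not the psRNATarget algorithm: it ignores G:U wobble pairs, gaps, position-dependent penalties and the target-site accessibility energy, and the function names are hypothetical.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def target_site(mirna):
    """DNA sequence that is perfectly complementary to an RNA miRNA."""
    return mirna.upper().replace("U", "T").translate(DNA_COMPLEMENT)[::-1]


def complementary_sites(mirna, est, min_paired=20):
    """Return (start, paired) for EST windows with enough paired positions."""
    site = target_site(mirna)
    hits = []
    for start in range(len(est) - len(site) + 1):
        window = est[start:start + len(site)]
        paired = sum(a == b for a, b in zip(site, window))
        if paired >= min_paired:
            hits.append((start, paired))
    return hits
```

For a 21 nt miRNA, the default threshold tolerates a single unpaired position in the window.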
2.8. Analyses of predicted miRNA targets in cDNA microarray
We used cDNA microarray data with NCBI GEO accession number GSE2901 to estimate the expression profiles of potential miRNA target genes. We used MeV 4.8 software (Saeed et al., 2006) with the Hierarchical Clustering method to generate gene clusters according to their expression profile changes during different stages of cotton fiber development.
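For readers unfamiliar with the method, hierarchical clustering of expression profiles can be sketched in a few lines of Python. This is not MeV and makes assumptions that the text does not specify: the distance is one minus the Pearson correlation, the linkage is average linkage, and merging stops at a requested number of clusters. The function names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def pearson_distance(x, y):
    """1 - Pearson correlation; profiles with similar shape get a small value."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    norm = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return 1.0 - cov / norm if norm else 1.0


def hierarchical_clusters(profiles, n_clusters):
    """Agglomerative clustering of {gene: [expression values]} profiles."""
    clusters = [[name] for name in profiles]

    def linkage(pair):
        # Average distance over all gene pairs drawn from the two clusters.
        first, second = pair
        return mean(pearson_distance(profiles[g], profiles[h])
                    for g in first for h in second)

    while len(clusters) > n_clusters:
        first, second = min(combinations(clusters, 2), key=linkage)
        clusters.remove(first)
        clusters.remove(second)
        clusters.append(first + second)
    return sorted(sorted(cluster) for cluster in clusters)
```

Called on a dictionary that maps array IDs to expression values at the six sampled time points, the function groups genes whose profiles rise or fall together, regardless of their absolute expression level.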
3. RESULTS
3.1. Length distribution of ncRNAs obtained from G. hirsutum fiber cells
To characterize ncRNAs in cotton fiber cells, we sequenced a library that was derived from total RNAs extracted from 7 DPA G. hirsutum cv. Xuzhou 142 fibers. We produced 11,582,792 raw sequence reads using the Illumina Genome Analyzer (Table S1). After removal of adaptor sequences, contaminants, polyA sequences and reads shorter than 18 nucleotides, we obtained 10,428,441 clean reads (90% of raw reads) of 18–30 nucleotides in length (Tables S1 and S2).
In cotton fiber cells, the most abundant ncRNAs are 24 nucleotides long, accounting for 48.4% of the total count, followed by 21 nt (14.7%), 23 nt (13.8%) and 22 nt (9.2%) species (Fig. 1 and Table S2). In A. thaliana, 24 nt ncRNAs mainly consist of small interfering RNAs (siRNAs) that are associated with repeats and transposons in the genome (Lu, 2005). The high proportion of 24 nt ncRNAs in fiber cells may suggest that siRNAs also accumulate during rapid cotton fiber elongation.
Table 1
Annotation and genome-wide mapping of non-coding RNAs
Type of molecules    Raw reads    Distinct reads    Total length (Mb)    % mapped in the genome
miRNA                568,777      647               12.0                 100
siRNA                477,336      88,300            10.6                 49.1
rRNA                 452,024      33,127            9.9                  46.8
tRNA                 402,235      11,533            0.7                  78.4
snRNA                1,964        1,107             0.04                 62.2
snoRNA               1,548        651               0.03                 61.8
3.2. Most ncRNAs can be mapped onto the D5 sub-genome
The annotated G. hirsutum ncRNA sequences, including miRNAs, siRNAs, rRNAs, tRNAs, snRNAs and snoRNAs, were aligned with the recently assembled 775.2-Mb G. raimondii genome sequence. For accuracy, we allowed no mismatch in the sequence. As a result, 78.4% of the 402,235 annotated tRNAs were mapped in the genome, followed by snRNAs (62.2%) and snoRNAs (61.8%) (Table 1), indicating high sequence conservation for these three ncRNA families between G. hirsutum and G. raimondii. Of the siRNAs and rRNAs, 49.1% and 46.8%, respectively, showed perfect matches in the D5 sub-genome (Table 1). Because all miRNAs were predicted with reference to the G. raimondii genome to locate the stem-loop structure, they were 100% mapped in the D5 sub-genome (Table 1). We also mapped 25 cotton miRNA precursors reported by Pang et al. (2009) to the D5 sub-genome sequences, and 21 matches were found in the genome with identity ≥99% (Table S3).
3.3. Identification of 562 candidate miRNA loci on D5 sub-genome and 187 homologous miRNA genes
The fundamental feature of plant miRNAs is the precise excision of a miRNA/miRNA* duplex from the stem of a stem-loop precursor, which is predicted primarily from genomic DNA sequences (Meyers et al., 2008). We scanned the assembled 775.2-Mb D5 sub-genome for inverted repeat sequences as candidates for potential miRNA loci. These candidates were then folded into their lowest-energy conformation to construct stem-loop structures of putative pre-miRNAs. We then mapped all ncRNA-seq reads (~10 million) onto the stems of the putative stem-loop structures to estimate mature miRNA and/or miRNA* sequences (Fig. 2 and Table S4). A total of 562 miRNA loci, each of which formed a classical stem-loop structure and was supported by ncRNA-seq at the transcription level, were identified in the D5 sub-genome. Among the 562 miRNA loci, 46 were identified with miRNA* sequences (Table S4), 187 showed sequence conservation with miRNAs of other plant species (Table S5) and 8 formed miRNA/miRNA* duplexes with an exact 2-nucleotide overhang (Fig. 2).
Fig. 2. Different stem-loop structures of eight pre-miRNAs with both mature miRNA and miRNA* strands detected in the ncRNA-seq data set. Red, mature miRNA sequence; green, miRNA* sequence. The number above or below each strand represents the number of reads obtained for the strand in ncRNA-seq.
3.4. Different miRNAs may have different nucleotide biases along the sequence
As shown in Fig. 3, a high proportion of G or C, especially G, at the 9th nucleotide position is observed in all kinds of miRNAs except the 21 nt miRNAs. In 22, 23 and 24 nt miRNAs, more than 80% of the nucleotides at the 9th position are G or C (Fig. 3). The nucleotide bias at the 9th position, which constitutes the “tail” of the “seed” sequence, may suggest that the nucleotide at this position is important for association of miRNA sequences with the Argonaute complex. Usually, the sequence motif comprising the 2nd to the 9th nucleotides is called the “seed” sequence, which is the core region for binding of the miRNA to the Argonaute complex (Ambros et al., 2003). Analyses also showed that the 1st nucleotide in 24 nt miRNAs tends to be G/C, whereas in all other miRNAs this position is more likely to be occupied by U (Fig. 3). Nucleotide position 12 in 21 nt miRNAs and position 13 in the other three miRNA classes are highly conserved with G. The last two nucleotide positions in 21 and 23 nt miRNA species are often occupied by G, whereas in 22 and 24 nt miRNA species these positions are conserved with U, or with C in the case of the last position in 24 nt miRNAs. We suggest that these nucleotide positions might also play important roles during miRNA recognition and binding to the Argonaute complex.
Fig. 3. Analyses of nucleotide bias at each position along 21 (A), 22 (B), 23 (C) and 24 nt (D) miRNAs and ncRNAs. Separate symbols denote A, U, G and C.
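The position-wise nucleotide bias discussed in this section amounts to tabulating base frequencies at each position separately for every length class. A minimal Python sketch of that tabulation is given below; it assumes distinct mature miRNA sequences in the RNA alphabet as input and is not the authors' code. The function name is hypothetical.

```python
from collections import Counter, defaultdict


def position_frequencies(sequences):
    """Return {length: [ {base: fraction} for each position ]}.

    Sequences are grouped by length first, so that position 9 of a 21 nt
    miRNA is never mixed with position 9 of a 24 nt miRNA.
    """
    by_length = defaultdict(list)
    for seq in sequences:
        by_length[len(seq)].append(seq.upper())
    result = {}
    for length, group in by_length.items():
        per_position = []
        for pos in range(length):
            counts = Counter(seq[pos] for seq in group)
            per_position.append({base: counts[base] / len(group) for base in "AUGC"})
        result[length] = per_position
    return result
```

With this layout, the G+C fraction at the 9th position of 22 nt miRNAs would be read as `freqs[22][8]["G"] + freqs[22][8]["C"]`, because list positions are counted from zero.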
3.5. Prediction and expression profiling of 463 miRNA target genes
Based on the 12,233 Uni-ESTs obtained from sequencing a cotton cDNA library, we predicted 463 miRNA target genes (Table S5). Analysis of the cDNA microarray revealed that miRNA target genes could be clustered into 3 subsets (Fig. 4A and Table 2). Fifty-two target genes showed gradual but mild down-regulation from 5 DPA until late in development, soon after fiber cell initiation at 0–3 DPA. These include heat shock proteins (HSP), plastocyanin-like domain-containing proteins and cyclin-dependent kinases (Fig. 4B and Table 2A). Ten target genes, including acyl-CoA oxidase, dynamin-related protein 4 (ADL4) and arabinogalactan-protein (AGP13), showed significant up-regulation in 5–10 DPA cotton fiber cells (Fig. 4C and Table 2B). Fourteen target genes, including BPG-independent PGAM, harpin-induced protein 1 (HIN1), vacuolar H+-pyrophosphatase and vacuolar amino acid efflux transporter, showed very strong up-regulation in 10–20 DPA cotton fiber cells (Fig. 4D and Table 2C), indicating that miRNAs might play an important role during the fast elongation and secondary cell wall synthesis stages of cotton fiber development.
Fig. 4. Expression profiles of 463 potential miRNA target genes detected in cDNA microarray at different cotton fiber developmental stages. A: expression profiles of all 463 miRNA target genes at different cotton fiber development stages. Total RNA samples were extracted from 0 or 3 DPA cotton ovules, and from 5, 10, 15 and 20 DPA cotton fiber cells. B: clustering of 52 target genes that showed gradual but mild down-regulation from 5 DPA until late in development. C: clustering of 10 target genes that showed significant up-regulation in 5–10 DPA cotton fiber cells. D: clustering of 14 target genes that showed very strong up-regulation in 10–20 DPA cotton fiber cells. MeV 4.8 software was used to analyze the cotton cDNA microarray data with the Hierarchical Clustering method.
4. DISCUSSION
In this study, we identified 562 candidate miRNA loci, including 187 homologous miRNA genes, on the D5 sub-genome using ~10 million high-throughput ncRNA-seq reads. We analyzed the nucleotide bias among all 4 different types of miRNAs. We also found 463 miRNA target genes and characterized their expression profiles during cotton fiber development using a cDNA microarray.
MiRNAs are non-coding RNAs, mainly 21–24 nucleotides long, and are naturally divided into 4 different families depending on their lengths (Bartel, 2004; Baulcombe, 2004; Chapman and Carrington, 2007). Since Ambros et al. (2003) published the original guidelines for miRNA annotation in plants and animals, a large number of miRNA genes have been identified by computational analysis and/or experimental approaches (Jones-Rhoades and Bartel, 2004; Lu, 2005; Rajagopalan et al., 2006). To date, over 290 miRNA loci have been annotated in the A. thaliana genome and over 4000 have been annotated within the plant kingdom (miRBase release 18, 2011, www.mirbase.org). However, only a few miRNAs have so far been identified in the genus Gossypium, with 34 in G. hirsutum and 40 in all other cotton species, mainly due to the unavailability of a complete cotton genome.
Different miRNAs showed various nucleotide biases at different positions along their sequences. It has been reported that the 5′ terminal nucleotide is critical for miRNA binding to the Argonaute complex in Arabidopsis (Mi et al., 2008). In cotton, 21–23 nt miRNAs are strongly 5′ U biased, while the 24 nt miRNAs are 5′ G biased. This suggests that different AGO proteins prefer to bind different miRNAs by associating with their 5′ terminal nucleotides. Unlike the first nucleotide, the 9th nucleotide was G biased among three of the four kinds of miRNAs (22, 23 and 24 nt) detected by ncRNA-seq in cotton fiber cells. In animals, it is reported that the “seed” region, usually bases 2–9 of the miRNA, is the region for miRNA–Argonaute association (Lewis et al., 2003). Future studies, such as immunoprecipitation experiments combined with deep sequencing technology, will reveal the relationship between AGO proteins and miRNAs in the formation of Argonaute complexes in cotton.
Table 2
miRNA target genes that were expressed preferentially at 0–3 (A), 5–10 (B) and 15–20 (C) DPA
Array ID P-value Annotation
A
CM104F05 0 F21D18.7 [Arabidopsis thaliana]
CM109D04 1.57E-06 Unknown [Arabidopsis thaliana]
CM121C09 7.57E-06 Cytokinin receptor CRE1a [Arabidopsis thaliana]
CM111C09 1.08E-05 Cyclin-dependent kinase [Populus tremula × Populus tremuloides]
CM118D06 1.49E-05 Transducin/WD-40 repeat protein family [Arabidopsis thaliana]
CM078E04 1.53E-05 Cellulose synthase-like protein D4 [Populus tremuloides]
CM043E04 1.62E-05 Leaf development protein Argonaute [Arabidopsis thaliana]
CM120D01 1.68E-05 RS16 protein, 40S subunit [Gossypium hirsutum]
CM087A09 1.92E-05 Unknown
CM028A11 1.98E-05 60S ribosomal protein L27 (RPL27C) [Arabidopsis thaliana]
CM078F06 2.00E-05 Unknown
CM107H12 2.19E-05 Hypothetical protein [Arabidopsis thaliana]
CM083F12 2.46E-05 Mini-chromosome maintenance protein MCM6 [Pisum sativum]
CM092H03 3.41E-05 Expressed protein [Arabidopsis thaliana]
CM023E09 4.00E-05 Unknown
CM052A11 4.11E-05 AT4g22540/F7K2_120 [Arabidopsis thaliana]
CM117H07 5.55E-05 DNA topoisomerase [ATP-hydrolyzing] [Arabidopsis thaliana]
CM095C09 6.00E-05 Putative BURP domain-containing protein [Arabidopsis thaliana]
CM120A10 6.08E-05 KH domain protein [Arabidopsis thaliana]
CM112D06 6.29E-05 Small nuclear ribonucleoprotein U1A [Arabidopsis thaliana]
CM123A08 6.70E-05 Similar to Calphotin CG4795-PA [Homo sapiens]
CM101C08 7.95E-05 Unknown
CM036A05 9.16E-05 Histone H3 [Arabidopsis thaliana]
CM082A08 9.63E-05 Unknown
CM038E01 9.90E-05 Putative RNA-binding protein [Arabidopsis thaliana]
CM080A10 1.04E-04 Expressed protein [Arabidopsis thaliana]
CM066D03 1.14E-04 RNA-binding like protein [Arabidopsis thaliana]
CM089G07 1.15E-04 AT5g62000/mtg10_20 [Arabidopsis thaliana]
CM003F09 1.20E-04 Heavy-metal-associated domain-containing protein
CM043C11 1.30E-04 Beta-galactosidase like protein [Arabidopsis thaliana]
CM090H08 1.41E-04 Putative protein [Arabidopsis thaliana]
CM118E02 1.44E-04 Vacuolar protein sorting protein [Arabidopsis thaliana]
CM077H06 1.78E-04 Invertase-like protein [Arabidopsis thaliana]
CM031G10 2.27E-04 Unknown
CM005A01 2.75E-04 RNA and export factor binding protein [Arabidopsis thaliana]
CM078A12 2.85E-04 ADP-glucose synthase
CM042F11 3.16E-04 Expressed protein [Arabidopsis thaliana]
CM065H06 3.42E-04 Unknown
CM105F12 3.59E-04 Leucine-rich repeat transmembrane protein kinase
CM117D01 3.92E-04 Unknown
CM026F03 4.08E-04 DEAD box RNA helicase, putative [Arabidopsis thaliana]
CM092G12 4.40E-04 ENSANGP00000004655 [Anopheles gambiae]
CM015F04 6.76E-04 Lon protease homolog 2 precursor [Arabidopsis thaliana]
CM107B03 7.67E-04 Unknown
CM105F08 9.24E-04 Unknown protein [Arabidopsis thaliana]
CM105H07 1.10E-03 (Psrp-6)-like [Arabidopsis thaliana]
CM099H09 1.12E-03 T13D8.31 [Arabidopsis thaliana]
CM086H08 1.17E-03 Endochitinase 1 precursor
CM103B12 1.74E-03 Expressed protein [Arabidopsis thaliana]
CM108D12 2.67E-03 Expressed protein [Arabidopsis thaliana]
CM057B01 2.76E-03 bHLH protein family [Arabidopsis thaliana]
CM102F01 3.49E-03 Expressed protein [Arabidopsis thaliana]
B
CM042B04 0 acyl-CoA oxidase [Arabidopsis thaliana]
CM052B01 1.20E-06 At1g76250 [Arabidopsis thaliana]
CM109A09 2.75E-05 Dynamin-related protein 4 (ADL4) [Arabidopsis thaliana]
CM058H01 1.20E-04 Unknown protein [Arabidopsis thaliana]
CM098F12 1.74E-04 Unknown protein [Arabidopsis thaliana]
CM046C04 6.48E-04 At5g23720 [Arabidopsis thaliana]
CM045C02 8.38E-04 Arabinogalactan-protein (AGP13) [Arabidopsis thaliana]
CM039G06 8.89E-04 Similarity to zinc metalloproteinase [Arabidopsis thaliana]
CM054D04 8.89E-04 At5g50010 [Arabidopsis thaliana]
CM064H12 1.10E-03 OSJNBb0039L24.13 [Oryza sativa (japonica cultivar-group)]
C
CM081F05 0 Unknown
CM039A01 7.61E-06 Cytochrome P450 [Pyrus communis]
CM031D08 8.90E-06 Ubiquitin family [Arabidopsis thaliana]
CM001H03 1.95E-05 Alcohol dehydrogenase (ADH) [Arabidopsis thaliana]
CM071D07 2.62E-05 Cytochrome P450 [Citrus sinensis]
CM113B01 5.12E-05 Unknown
CM002H09 6.28E-05 BPG-independent PGAM
CM050H05 7.24E-05 Harpin-induced protein 1 family (HIN1) [Arabidopsis thaliana]
CM026F06 1.01E-04 OSJNBb0035I14.15 [Oryza sativa (japonica cultivar-group)]
CM023H02 1.91E-04 Vacuolar H+-pyrophosphatase [Prunus persica]
CM066B04 1.92E-04 Expressed protein [Arabidopsis thaliana]
CM055C02 4.93E-04 Branched-chain alpha-keto acid decarboxylase E1 beta subunit
CM024B11 6.17E-04 K+ efflux antiporter [Arabidopsis thaliana]
CM023A03 1.40E-03 Nodulin MtN21 family protein [Arabidopsis thaliana]
P-value < 0.005.
With the D5 sub-genome in hand, we were able to find 562 cotton miRNA loci that formed authentic stem-loop structures. In M. truncatula, a diploid species with a haploid genome size of 550 Mb, 635 miRNA loci were annotated. In Oryza sativa, with a haploid genome size of 358 Mb, 581 miRNA loci were found. In the 125 Mb A. thaliana genome, 291 miRNA loci were identified (miRBase release 18, 2011, www.mirbase.org). Since miRNA families are believed to arise via gene duplication mechanisms common to the formation of large gene families (Elemento et al., 2002), the number of miRNA loci obtained from the 880 Mb D5 sub-genome is comparable with that of other plant species. Because similar whole genome duplication (WGD) events were suggested to have happened in different Gossypium species before the polyploidization of the A and D sub-genomes (Senchina et al., 2003; Blanc and Wolfe, 2004; Fawcett et al., 2009), we predict that there will be more miRNA loci on the larger A sub-genome (~1.7 Gb for haploid G. arboreum; Hendrix and Stewart, 2005) and that the allotetraploid G. hirsutum genome might contain more than 1000 miRNA loci.
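One way to place the locus counts quoted above on a common scale is to divide each count by the corresponding haploid genome size. The short Python sketch below does only that arithmetic, using the figures given in this paragraph; the per-megabase normalization is an illustrative choice and not an analysis performed in this study, and the names are hypothetical.

```python
# (species, annotated miRNA loci, haploid genome size in Mb), as quoted in the text
GENOMES = [
    ("M. truncatula", 635, 550),
    ("O. sativa", 581, 358),
    ("A. thaliana", 291, 125),
    ("G. raimondii (D5)", 562, 880),
]


def loci_per_mb(genomes):
    """Return {species: miRNA loci per Mb of haploid genome}."""
    return {name: loci / size_mb for name, loci, size_mb in genomes}


if __name__ == "__main__":
    for name, density in loci_per_mb(GENOMES).items():
        print(f"{name}: {loci_per_mb(GENOMES)[name]:.2f} miRNA loci per Mb")
```

Absolute locus counts and per-megabase densities answer different questions, so both are worth inspecting when comparing genomes of very different sizes.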
The expression pattern of miRNAs varies greatly among tissues and developmental stages in cotton (Kwak et al., 2009; Pang et al., 2009; Wang et al., 2011). Using a mixture of cotton fibers at different developmental stages, or even different tissues, would therefore be helpful for better profiling of cotton miRNAs.
A recent study predicted 223 targets of cotton miRNAs from expressed sequence tags derived mainly from cotton fibers and ovules in the early stages of fiber development (−3 DPA to 7 DPA) (Pang et al., 2009). The authors pointed out that many miRNAs accumulating at lower levels in fibers (7 DPA) and fiber-bearing ovules (3 DPA) than in immature ovules (−3 DPA) may activate the expression of target genes, such as NAM-like, ARF3/ARF4-like, Class III HD-Zip protein-like and TAS3-like, which are required for fiber cell elongation. Accordingly, we report 24 miRNA target genes that were strongly up-regulated in either 5–10 DPA or 10–20 DPA cotton fiber cells (Table 2). Some of the potential miRNA target genes were previously known to be required for fiber development. For example, the harpin-induced protein 1 (HIN1) was involved in multiple signaling pathways including salicylic acid, jasmonic acid and ethylene in cotton (Miao et al., 2010). The vacuolar H+-pyrophosphatase (AVP1) gene was able to increase fiber yield in field conditions (Pasapula et al., 2011). The dynamin-related protein, which is a GTP-binding protein required for membrane trafficking (Kang et al., 2003), was associated with cotton fiber development in a chromosomal substitution line (CS-B22sh) (Wu et al., 2008). These results collectively suggest that miRNAs may serve as negative regulators of target genes mainly in the elongation and secondary cell wall synthesis stages and are important for the quality of fiber cells.
SUPPLEMENTARY DATA
Table S1. Deep sequencing and primary analyses of G. hirsutum ncRNAs.
Table S2. Length distribution of ncRNAs detected by deep sequencing.
Table S3. Sequence alignment of 21 cotton miRNA precursors with the D5 sub-genome.
Table S4. 562 miRNA precursors predicted on D5 sub-genome sequences using ncRNA-seq.
Table S5. Expression analyses, ortholog classification and target prediction of strand-specific and strand-unspecific miRNAs.
Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.jgg.2012.04.008.
REFERENCES
Allen, E., Xie, Z., Gustafson, A.M., Carrington, J.C., 2005. microRNA-
directed phasing during trans-acting siRNA biogenesis in plants. Cell 121,
207e221.
Ambros, V., Bartel, B., Bartel, D.P., Burge, C.B., Carrington, J.C., Chen, X.,
Dreyfuss, G., Eddy, S.R., Griffiths-Jones, S., Marshall, M., Matzke, M.,
Ruvkun, G., Tuschl, T., 2003. A uniform system for microRNA annotation.
RNA 9, 277e279.
Aukerman, M.J., Sakai, H., 2003. Regulation of flowering time and floral
organ identity by a microRNA and its APETALA2-like target genes. Plant
Cell 15, 2730e2741.
Bartel, D., 2004. MicroRNAs: genomics, biogenesis, mechanism, and func-
tion. Cell 116, 281e297.
Baulcombe, D., 2004. RNA silencing in plants. Nature 431, 356e363.Blanc, G., Wolfe, K.H., 2004. Widespread paleopolyploidy in model plant
species inferred from age distributions of duplicate genes. Plant Cell 16,
1667e1678.
Bowers, J.E., Chapman, B.A., Rong, J., Paterson, A.H., 2003. Unraveling
angiosperm genome evolution by phylogenetic analysis of chromosomal
duplication events. Nature 422, 433e438.
Chapman, E.J., Carrington, J.C., 2007. Specialization and evolution of
endogenous small RNA pathways. Nat. Rev. Genet. 8, 884e896.Chen, X., 2004. A microRNA as a translational repressor of APETALA2 in
Arabidopsis flower development. Science 303, 2022e2025.
Chen, Z.J., Scheffler, B.E., Dennis, E., Triplett, B.A., Zhang, T., Guo, W.,
Chen, X., Stelly, D.M., Rabinowicz, P.D., Town, C.D., Arioli, T.,
Brubaker, C., Cantrell, R.G., Lacape, J.M., Ulloa, M., Chee, P.,
Gingle, A.R., Haigler, C.H., Percy, R., Saha, S., Wilkins, T., Wright, R.J.,
Van Deynze, A., Zhu, Y., Yu, S., Abdurakhmonov, I., Katageri, I.,
Kumar, P.A., Mehboob Ur, R., Zafar, Y., Yu, J.Z., Kohel, R.J., Wendel, J.F.,
Paterson, A.H., 2007. Toward sequencing cotton (Gossypium) genomes.
Plant Physiol. 145, 1303e1310.
Chuck, G., Cigan, A.M., Saeteurn, K., Hake, S., 2007. The heterochronic
maize mutant Corngrass1 results from overexpression of a tandem
microRNA. Nat. Genet. 39, 544e549.
Elemento, O., Gascuel, O., Lefranc, M.P., 2002. Reconstructing the
duplication history of tandemly repeated genes. Mol. Biol. Evol. 19,
278e288.Fawcett, J.A., Maere, S., Van de Peer, Y., 2009. Plants with double genomes
might have had a better chance to survive the Cretaceous-Tertiary
extinction event. Proc. Natl. Acad. Sci. USA 106, 5737e5742.
Griffiths-Jones, S., Saini, H.K., van Dongen, S., Enright, A.J., 2008. miRBase:
tools for microRNA genomics. Nucleic Acids Res. 36, D154eD158.
Hendrix, B., Stewart, J.M., 2005. Estimation of the nuclear DNA content of
Gossypium species. Ann. Bot. 95, 789e797.
Ji, S.-J., Lu, Y.-C., Feng, J.-X., Wei, G., Li, J., Shi, Y.-H., Fu, Q., Liu, D.,
Luo, J.-C., Zhu, Y.-X., 2003. Isolation and analyses of genes preferentially
expressed during early cotton fiber development by subtractive PCR and
cDNA array. Nucleic Acids Res. 31, 2534e2543.Jones-Rhoades, M.W., Bartel, D.P., 2004. Computational identification of
plant microRNAs and their targets, including a stress-induced miRNA.
Mol. Cell 14, 787e799.
Kang, B.-H., Busse, J.S., Bednarek, S.Y., 2003. Members of the Arabidopsis
dynamin-like gene family, ADL1, are essential for plant cytokinesis and
polarized cell growth. Plant Cell 15, 899e913.
Kim, H.J., Triplett, B.A., 2001. Cotton fiber growth in planta and in vitro.
Models for plant cell elongation and cell wall biogenesis. Plant Physiol.
127, 1361–1366.
Kurihara, Y., Watanabe, Y., 2004. Arabidopsis micro-RNA biogenesis through
Dicer-like 1 protein functions. Proc. Natl. Acad. Sci. USA 101, 12753–12758.
Kwak, P.B., Wang, Q.-Q., Chen, X.-S., Qiu, C.-X., Yang, Z.-M., 2009.
Enrichment of a set of microRNAs during the cotton fiber development.
BMC Genomics 10, 457.
Lewis, B.P., Shih, I.-h., Jones-Rhoades, M.W., Bartel, D.P., Burge, C.B., 2003.
Prediction of mammalian microRNA targets. Cell 115, 787–798.
Li, Y., Liu, X., Huang, L., Guo, H., Wang, X.-J., 2010. Potential coexistence of
both bacterial and eukaryotic small RNA biogenesis and functional related
protein homologs in Archaea. J. Genet. Genomics 37, 493–503.
Llave, C., Xie, Z., Kasschau, K.D., Carrington, J.C., 2002. Cleavage of
scarecrow-like mRNA targets directed by a class of Arabidopsis miRNA.
Science 297, 2053–2056.
Lu, C., 2005. Elucidation of the small RNA component of the transcriptome.
Science 309, 1567–1569.
Mei, W.-Q., Qin, Y.-M., Song, W.-Q., Li, J., Zhu, Y.-X., 2009. Cotton GhPOX1
encoding plant class III peroxidase may be responsible for the high level of
reactive oxygen species production that is related to cotton fiber
elongation. J. Genet. Genomics 36, 141–150.
Meyers, B.C., Axtell, M.J., Bartel, B., Bartel, D.P., Baulcombe, D.,
Bowman, J.L., Cao, X., Carrington, J.C., Chen, X., Green, P.J., Griffiths-
Jones, S., Jacobsen, S.E., Mallory, A.C., Martienssen, R.A., Poethig, R.S.,
Qi, Y., Vaucheret, H., Voinnet, O., Watanabe, Y., Weigel, D., Zhu, J.K., 2008.
Criteria for annotation of plant microRNAs. Plant Cell 20, 3186–3190.
Mi, S., Cai, T., Hu, Y., Chen, Y., Hodges, E., Ni, F., Wu, L., Li, S., Zhou, H.,
Long, C., Chen, S., Hannon, G.J., Qi, Y., 2008. Sorting of small RNAs into
Arabidopsis argonaute complexes is directed by the 5′ terminal nucleotide.
Cell 133, 116–127.
Miao, W., Wang, X., Song, C., Wang, Y., Ren, Y., Wang, J., 2010.
Transcriptome analysis of Hpa1xoo transformed cotton revealed constitutive
expression of genes in multiple signaling pathways related to disease
resistance. J. Exp. Bot. 61, 4263–4275.
Pang, C.-Y., Wang, H., Pang, Y., Xu, C., Jiao, Y., Qin, Y.-M., Western, T.L.,
Yu, S.-X., Zhu, Y.-X., 2010. Comparative proteomics indicates that
biosynthesis of pectic precursors is important for cotton fiber and
Arabidopsis root hair elongation. Mol. Cell. Proteomics 9, 2019–2033.
Pang, M., Woodward, A.W., Agarwal, V., Guan, X., Ha, M.,
Ramachandran, V., Chen, X., Triplett, B.A., Stelly, D.M., Chen, Z.J., 2009.
Genome-wide analysis reveals rapid and dynamic changes in miRNA and
siRNA sequence and expression during ovule and fiber development in
allotetraploid cotton (Gossypium hirsutum L.). Genome Biol. 10, R122.
Pasapula, V., Shen, G., Kuppu, S., Paez-Valencia, J., Mendoza, M., Hou, P.,
Chen, J., Qiu, X., Zhu, L., Zhang, X., Auld, D., Blumwald, E., Zhang, H.,
Gaxiola, R., Payton, P., 2011. Expression of an Arabidopsis vacuolar
H+-pyrophosphatase gene (AVP1) in cotton improves drought- and salt
tolerance and increases fibre yield in the field conditions. Plant Biotechnol.
J. 9, 88–99.
Paterson, A.H., Bowers, J.E., Chapman, B.A., 2004. Ancient polyploidization
predating divergence of the cereals, and its consequences for comparative
genomics. Proc. Natl. Acad. Sci. USA 101, 9903–9908.
Qin, Y.-M., Zhu, Y.-X., 2011. How cotton fibers elongate: a tale of linear cell-
growth mode. Curr. Opin. Plant Biol. 14, 106–111.
Qin, Y.-M., Hu, C.-Y., Pang, Y., Kastaniotis, A.J., Hiltunen, J.K., Zhu, Y.-X.,
2007. Saturated very-long-chain fatty acids promote cotton fiber and
Arabidopsis cell elongation by activating ethylene biosynthesis. Plant Cell
19, 3692–3704.
Qiu, C.-X., Xie, F.-L., Zhu, Y.-Y., Guo, K., Huang, S.-Q., Nie, L., Yang, Z.-M.,
2007. Computational identification of microRNAs and their targets in
Gossypium hirsutum expressed sequence tags. Gene 395 (1–2), 49–61.
Rajagopalan, R., Vaucheret, H., Trejo, J., Bartel, D.P., 2006. A diverse and
evolutionarily fluid set of microRNAs in Arabidopsis thaliana. Genes Dev.
20, 3407–3425.
Saeed, A.I., Bhagabati, N.K., Braisted, J.C., Liang, W., Sharov, V.,
Howe, E.A., Li, J., Thiagarajan, M., White, J.A., Quackenbush, J., 2006.
TM4 microarray software suite. Methods Enzymol. 411, 134–193.
Senchina, D.S., Alvarez, I., Cronn, R.C., Liu, B., Rong, J., Noyes, R.D.,
Paterson, A.H., Wing, R.A., Wilkins, T.A., Wendel, J.F., 2003. Rate
variation among nuclear genes and the age of polyploidy in Gossypium. Mol.
Biol. Evol. 20, 633–643.
Shi, Y.-H., Zhu, S.-W., Mao, X.-Z., Feng, J.-X., Qin, Y.-M., Zhang, L.,
Cheng, J., Wei, L.-P., Wang, Z.-Y., Zhu, Y.-X., 2006. Transcriptome
profiling, molecular biological, and physiological studies reveal
a major role for ethylene in cotton fiber cell elongation. Plant Cell 18,
651–664.
Wang, Y., Itaya, A., Zhong, X., Wu, Y., Zhang, J., van der Knaap, E.,
Olmstead, R., Qi, Y., Ding, B., 2011. Function and evolution of a
microRNA that regulates a Ca2+-ATPase and triggers the formation of phased
small interfering RNAs in tomato reproductive growth. Plant Cell 23,
3185–3203.
Wang, Z.-M., Xue, W., Dong, C.-J., Jin, L.-G., Bian, S.-M., Wang, C., Wu, X.-Y.,
Liu, J.-Y., 2011. A comparative miRNAome analysis reveals seven fiber
initiation-related and 36 novel miRNAs in developing cotton ovules. Mol.
Plant. doi:10.1093/mp/ssr094.
Wendel, J., Albert, V., 1992. Phylogenetics of the cotton genus (Gossypium) –
character-state weighted parsimony analysis of chloroplast-DNA restriction
site data and its systematic and biogeographic implications. System.
Bot. 17, 115–143.
Wu, Z., Soliman, K.M., Bolton, J.J., Saba, S., Jenkins, J.N., 2008.
Identification of differentially expressed genes associated with cotton fiber
development in a chromosomal substitution line (CS-B22sh). Funct. Integr.
Genomics 8, 165–174.
Zhang, B., Wang, Q., Wang, K., Pan, X., Liu, F., Guo, T., Cobb, G.P.,
Anderson, T.A., 2007. Identification of cotton microRNAs and their
targets. Gene 397 (1–2), 26–37.