Post on 09-Jun-2020
Transcriptome analysis of drought-tolerant CAM plants, Agave deserti and Agave tequilana
Stephen M. Gross1,2, Jeffrey A. Martin1,2, June Simpson3, Zhong Wang1,2, and Axel Visel1,2
1. DOE Joint Genome Institute, Walnut Creek, CA
2. Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, CA
3. CINVESTAV, Irapuato, MX
ABSTRACT
Agaves are succulent monocotyledonous plants native to hot and arid environments of North America. Because of their adaptations to their environment, including crassulacean acid metabolism (CAM, a water-efficient form of photosynthesis), and existing technologies for ethanol production, agaves have gained attention both as potential lignocellulosic bioenergy feedstocks and as models for exploring plant responses to abiotic stress. However, the lack of comprehensive Agave sequence datasets limits the scope of investigations into the molecular-genetic basis of Agave traits. Here, we present comprehensive, high-quality de novo transcriptome assemblies of two Agave species, A. tequilana and A. deserti, from short-read RNA-seq data. Our analyses support the completeness and accuracy of the de novo transcriptome assemblies, with each species having approximately 35,000 protein-coding genes. Comparison of agave proteomes to those of additional plant species identifies biological functions of gene families displaying sequence divergence in agave species. Additionally, we use RNA-seq data to gain insights into biological functions along the A. deserti juvenile leaf proximal-distal axis. Our work presents a foundation for further investigation of agave biology and the improvement of agaves for bioenergy development.
[Figure 4 graphic. Panel A: BLAST one-to-one RBH protein comparison between A. deserti and A. tequilana. Panel B: OrthoMCL protein family comparison between A. deserti and A. tequilana. Values shown in panels A and B: 2948, 2835, 13,388; 20,377, 20,161, 14,709. Panel C: OrthoMCL Plant OG totals by species: B. distachyon 16618, O. sativa 19643, S. bicolor 18218, Z. mays 20681, A. deserti 16223, A. tequilana 16336.]
[Figure 5C, D graphic: heatmaps over leaf sections 1–4. Color scale: relative composite RPKM value, normalized to the section with highest expression (0.0–1.0). Cell wall & stomata development: cell wall biosynthesis, cellulose biosynthesis, lignin biosynthesis, stomata development, cutin & suberin biosynthesis. Photosynthesis: antenna proteins, photosystem II, photosystem I, cytochrome b6f & ATP synthase, Calvin cycle, C4 dicarboxylic acid cycles (CAM; dark, light), chlorophyll biosynthesis. Transcription factors: KNOX (Class I, Class II), MADS-box, GRAS, YABBY, MYB, bHLH, Zn finger. Hormones: auxin (biosynthesis, transport), CK biosynthesis, GA biosynthesis, ETH biosynthesis, BR biosynthesis, ABA biosynthesis.]
[Figure 5A, B graphic. Panel A: leaf sections numbered 1–4 along the proximal (base) to distal (tip) axis. Panel B: 79,645 of 88,718 loci clustered; y-axis z, x-axis sections 1–4. Cluster A: 22249 loci. Cluster B: 12063 loci. Cluster C: 11789 loci. Cluster D: 7539 loci. Cluster E: 17579 loci. Cluster F: 8426 loci. Enriched GO terms shown: photosynthesis, regulation of gene expression, translation, cellular protein modification, DNA metabolism, vesicle-mediated transport. Group label shown: Distal Expression.]
[Figure 2 graphic. Panels A, B: photographs of Agave tequilana and Agave deserti. Panel C: probability of observing a unique 25-mer vs. number of reads, for A. deserti and A. tequilana. Panel D: density vs. contig GC content (0.0–1.0), for A. tequilana contigs, A. deserti contigs, and removed contigs. Panel E: density vs. transcript length (nt), for A. deserti and A. tequilana. Panel F: density vs. locus RPKM, for coding and non-coding loci of A. deserti and A. tequilana.]
[Figure 3 graphic. Alignment classes in all bar charts: subject contains query, query contains subject, sequences overlap, unaligned; containment and overlap classes are shown with and without indels. Panel A, number of transcripts: PacBio subreads || Rnnotator (n = 4767; bar counts 2683, 862, 1221, 1); GenBank || Rnnotator (n = 82; bar counts 1, 38, 14, 29); McKain et al. || Rnnotator (n = 12,972; bar counts 6578, 2710, 3560, 124). Panel B, number of transcripts (thousands): A. deserti || A. tequilana (n = 128,959; bar counts 32,231, 37,241, 52,109, 7,378); A. tequilana || A. deserti (n = 204,530; bar counts 28,627, 44,443, 107,821, 23,649). Panel C: number of A. tequilana transcripts (thousands) vs. fraction of A. tequilana transcript aligning to A. deserti transcript, and number of A. deserti transcripts (thousands) vs. fraction of A. deserti transcript aligning to A. tequilana transcript.]
AGAVE TRANSCRIPTOME ASSEMBLIES FROM DEEP RNA-seq
COMPARISON OF AGAVE DE NOVO ASSEMBLIES
FIGURE 3: Comparison of the de novo Agave transcriptome assemblies
(A) Comparisons of the A. tequilana de novo Rnnotator assembly to error-corrected Pacific Biosciences subreads, 82 GenBank A. tequilana sequences, and an additional A. tequilana dataset from McKain et al. 2012 [4].
(B) Comparisons between the A. tequilana and A. deserti de novo Rnnotator assemblies.
(C) Histograms of the fraction of aligned sequence lengths between A. deserti and A. tequilana.
The symbol || separates the query sequence dataset from the subject sequence dataset. The total number of sequences (n) is noted in each bar chart; the number of sequences in each alignment class is noted above its bar.
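The alignment classes used in Figure 3 (containment, overlap, unaligned) can be illustrated with a minimal sketch. The function name, the 0-based half-open coordinates, and the 0.95 coverage threshold are assumptions made for illustration; the poster does not state the criteria that were actually used.

```python
from typing import Optional, Tuple

Span = Tuple[int, int]  # 0-based, half-open (start, end) of the aligned region


def classify_alignment(
    query_len: int,
    subject_len: int,
    query_span: Optional[Span],
    subject_span: Optional[Span],
    min_cov: float = 0.95,  # hypothetical threshold, not taken from the poster
) -> str:
    """Assign a query/subject pair to one of four alignment classes."""
    if query_span is None or subject_span is None:
        return "unaligned"
    # Fraction of each sequence covered by the aligned region.
    query_frac = (query_span[1] - query_span[0]) / query_len
    subject_frac = (subject_span[1] - subject_span[0]) / subject_len
    if query_frac >= min_cov:
        return "subject contains query"
    if subject_frac >= min_cov:
        return "query contains subject"
    return "sequences overlap"
```

The per-sequence aligned fraction computed here is the same kind of quantity plotted on the x-axes of panel C.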
FIGURE 4: Proteomic comparison of agaves to other plant species
(A) Venn diagram of BLASTP-based one-to-one reciprocal best hit proteins shared between A. deserti and A. tequilana.
(B) Venn diagram of OrthoMCL-defined protein families shared between agaves.
(C) Edwards-Venn diagram of OrthoMCL-defined plant orthologous-group protein families (Plant OGs) shared between agaves and 4 additional monocotyledonous plant species. The shape and color used for each species are shown at the right, with the total number of Plant OGs within each species.
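The one-to-one reciprocal best hit (RBH) idea behind panel A can be sketched as follows. This assumes the best BLASTP hit in each direction has already been extracted into a dictionary; the function name and the protein identifiers are hypothetical.

```python
from typing import Dict, List, Tuple


def reciprocal_best_hits(
    best_a_to_b: Dict[str, str],
    best_b_to_a: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Return pairs (a, b) where b is a's best hit and a is b's best hit."""
    return sorted(
        (a, b)
        for a, b in best_a_to_b.items()
        if best_b_to_a.get(b) == a
    )


# Toy input: best hit per protein in each search direction (made-up IDs).
deserti_to_tequilana = {"Ad1": "At1", "Ad2": "At2", "Ad3": "At2"}
tequilana_to_deserti = {"At1": "Ad1", "At2": "Ad3", "At3": "Ad1"}

pairs = reciprocal_best_hits(deserti_to_tequilana, tequilana_to_deserti)
```

In this toy input, Ad2 is dropped because its best hit At2 points back to Ad3 rather than Ad2.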
FIGURE 5: Transcriptomic analysis of the A. deserti leaf proximal-distal axis.
(A) One of the A. deserti leaves used for analysis, indicating proximal-distal (PD) sections 1–4.
(B) Six major K-means clusters of gene expression along the PD axis. Clusters are manually grouped by highest expression in proximal, medial, or distal tissues. Blue lines connect mean z-scaled RPKM values; shaded areas represent the 25th and 75th percentiles; red lines indicate the standard error at each mean. Green text beneath each cluster gives the description of the most significantly enriched GO term in that cluster.
(C, D) Heatmaps of composite gene expression for indicated biological processes along the leaf PD axis.
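The two scalings named in these captions can be sketched in a few lines: z-scaling of a locus's RPKM values across the four leaf sections (panel B), and normalization to the section with the highest expression (panels C, D). Function names are hypothetical, and using the population standard deviation is an assumption; the poster does not say which estimator was used.

```python
from statistics import mean, pstdev
from typing import List


def z_scale(values: List[float]) -> List[float]:
    """Center on the mean and divide by the (population) standard deviation."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # A flat profile has no variation to scale.
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def normalize_to_max(values: List[float]) -> List[float]:
    """Express each value relative to the highest one (range 0.0 to 1.0)."""
    top = max(values)
    if top <= 0:
        return [0.0 for _ in values]
    return [v / top for v in values]


# Made-up RPKM values for leaf sections 1-4 of a single locus.
profile = [2.0, 4.0, 8.0, 4.0]
z_profile = z_scale(profile)
relative_profile = normalize_to_max(profile)
```

Z-scaling makes loci with different absolute expression levels comparable by profile shape, which is what a K-means clustering of expression patterns needs.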
PROTEOMIC ANALYSES SUPPORT COMPREHENSIVE AGAVE TRANSCRIPTOME ASSEMBLIES
PROFILING OF THE A. DESERTI LEAF HIGHLIGHTS REGIONS CRITICAL TO DEVELOPMENT AND PHOTOSYNTHESIS
FIGURE 2: A. tequilana, A. deserti, and their respective transcriptomes
(A) Cultivated A. tequilana in Jalisco, Mexico.
(B) A. deserti (foreground) in natural habitat, Riverside County, California, USA.
(C) Plot of the fraction of unique 25-mers over indicated read depth (log2 scale).
(D) Density plot of GC content of agave transcript contigs vs. contigs from contamination and commensal organisms.
(E) Density plots of A. deserti and A. tequilana transcript lengths. Note log10 scale. Peaks at 150 and 250 nt represent single reads or paired-end reads, respectively, that were not assembled into larger contigs.
(F) Density plot of locus RPKM values for coding (dark shading) and non-coding (light shading) loci.
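For reference, RPKM (reads per kilobase of transcript per million mapped reads) can be computed with the standard formula below. This is a minimal sketch; the function and parameter names are hypothetical, and the poster does not describe how reads were counted per locus.

```python
def rpkm(mapped_to_locus: int, locus_length_nt: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if locus_length_nt <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total read count must be positive")
    # 1e9 = 1e3 (per kilobase) * 1e6 (per million reads)
    return mapped_to_locus * 1e9 / (locus_length_nt * total_mapped_reads)


# Example: 500 reads on a 2000 nt locus, 10 million mapped reads in total.
example = rpkm(500, 2000, 10_000_000)
```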
OVERVIEW OF AGAVE TRANSCRIPTOME ASSEMBLIES
Species | Total sequencing | No. of loci | No. transcript contigs | N50 length | Sum assembled length | No. protein-coding loci
A. tequilana | 293.5 Gbp | 139,525 | 204,530 | 1387 bp | 204.9 Mbp | 34,870
A. deserti | 184.7 Gbp | 88,718 | 128,869 | 1323 bp | 125.0 Mbp | 35,086
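The N50 length reported for each assembly is the contig length at which contigs of that length or longer account for at least half of the total assembled length. A minimal sketch of the usual calculation follows; the function name is hypothetical.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L hold at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contig lengths given")


# Toy assembly: total length 54, so the half-way point is 27.
toy_n50 = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])
```

In the toy assembly the three longest contigs (10, 9, 8) sum to 27, so the N50 is 8.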
CAM PHOTOSYNTHESIS, ARID ENVIRONMENTS, AND BIOENERGY
Agave species are adapted to their native habitat in arid regions of Mexico and the United States. Agave thus holds promise as a biofuel feedstock [1,2], capable of growing on marginal lands where other proposed bioenergy plants cannot. The ability of agaves to withstand hot and arid conditions relies upon crassulacean acid metabolism (CAM), a specialized form of photosynthesis that allows agaves to keep leaf stomata (pores) closed during the hot day, minimizing water loss through evapotranspiration.
[Figure 1 graphic. Panel B map legend: Agave; semi-arid regions. Panel C diagram labels: NIGHT, DAY, CO2, C3, C4, vacuole, chloroplast, Calvin cycle, light, sugar.]
FIGURE 1: Agaves and CAM biology
(A) Agave tequilana cultivated in Mexico.
(B) Semi-arid regions of the United States (brown) are unsuitable for cultivation of other bioenergy plants, which require more temperate regions (green). Most Agave species are adapted to semi-arid regions in Mexico and the extreme southwestern USA (purple).
(C) Crassulacean acid metabolism (CAM). CO2 enters plant cells at night, joins with a 3-carbon molecule (C3), and is stored in the vacuole as a 4-carbon molecule (C4). During the day, C4 molecules diffuse out of the vacuole, and CO2 is released and assimilated into sugar in the chloroplast.
Comparison of inputs (water and nitrogen) and outputs (biomass and ethanol) of agaves and other biofuel feedstock species. Though agaves are harvested at several years of age, their annualized growth rate is on par with Miscanthus. Table is modified from reference [2].
Feedstock | Inputs: Water (cm yr-1) | Drought tolerance | Inputs: Nitrogen (kg ha-1 yr-1) | Outputs: Dry biomass (Mg ha-1 yr-1) | Outputs: Ethanol (liters yr-1)
Corn grain | 50–80 | low | 90–120 | 7–10 | 2900
Corn stover | (as corn grain) | (as corn grain) | (as corn grain) | 3–6 | 900
Miscanthus | 75–120 | low | 0–15 | 15–40 | 4600–12,400
Poplar coppice | 70–105 | moderate | 0–50 | 5–11 | 1500–3400
Agave spp. | 30–80 | high | 0–12 | 10–34 | 3000–10,500
ACKNOWLEDGEMENTS AND CITATIONS
This work performed at the U.S. Department of Energy Joint Genome Institute was supported in part by the Office of Science of the U.S. Department of Energy under contract DE-AC02-05CH112.
[1] Davis, A. S. et al. The global potential for Agave as a biofuel feedstock. GCB Bioenergy 3, 68–78, (2011).
[2] Somerville, C. et al. Feedstocks for lignocellulosic biofuels. Science 329, 790–792, (2010).
[3] Martin, J. et al. Rnnotator: an automated de novo transcriptome assembly pipeline from stranded RNA-seq reads. BMC Genomics 11, 663, (2010).
[4] McKain, M. et al. Phylogenomic analysis of transcriptome data elucidates co-occurrence of a paleopolyploid event and the origin of bimodal karyotypes in Agavoideae (Asparagaceae). Am J Bot 99, 397–406, (2012).
To provide sequence resources for the Agave research community, we built de novo transcriptomes of Agave tequilana and Agave deserti from deep Illumina RNA-seq data. Sequences were assembled by Rnnotator [3], a de novo transcriptome assembly pipeline.
Analysis of assembled contigs suggests the Agave de novo assemblies are comprehensive and accurate.
Proteome comparisons between Agave species and additional monocot species suggest the majority of Agave proteins are conserved across taxa. We can also identify protein families specific to agaves.
Agaves spend the majority of their lives as compact rosettes; leaves are therefore important organs in which to study Agave developmental and bioenergetic processes.