Transcriptome analysis of drought-tolerant CAM plants...

Post on 09-Jun-2020


Transcriptome analysis of drought-tolerant CAM plants, Agave deserti and Agave tequilana

Stephen M. Gross1,2, Jeffrey A. Martin1,2, June Simpson3, Zhong Wang1,2, and Axel Visel1,2

1. DOE Joint Genome Institute, Walnut Creek, CA
2. Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, CA
3. CINVESTAV, Irapuato, MX

ABSTRACT

Agaves are succulent monocotyledonous plants native to hot and arid environments of North America. Because of their adaptations to their environment, including crassulacean acid metabolism (CAM, a water-efficient form of photosynthesis), and because of existing technologies for ethanol production, agaves have gained attention both as potential lignocellulosic bioenergy feedstocks and as models for exploring plant responses to abiotic stress. However, the lack of comprehensive Agave sequence datasets limits the scope of investigations into the molecular-genetic basis of Agave traits. Here, we present comprehensive, high-quality de novo transcriptome assemblies of two Agave species, A. tequilana and A. deserti, from short-read RNA-seq data. Our analyses support the completeness and accuracy of the de novo transcriptome assemblies, with each species having approximately 35,000 protein-coding genes. Comparison of agave proteomes to those of additional plant species identifies biological functions of gene families displaying sequence divergence in agave species. Additionally, we use RNA-seq data to gain insights into biological functions along the A. deserti juvenile leaf proximal-distal axis. Our work presents a foundation for further investigation of agave biology and their improvement for bioenergy development.

Figure 4 graphic (not reproduced). Panels A and B: two-set Venn diagrams for A. deserti and A. tequilana, one from the BLAST one-to-one RBH protein comparison and one from the OrthoMCL protein family comparison. Panel C: Edwards-Venn diagram of OrthoMCL Plant OGs. Total Plant OGs per species: B. distachyon 16618; O. sativa 19643; S. bicolor 18218; Z. mays 20681; A. deserti 16223; A. tequilana 16336.

Figure 5 graphic (not reproduced).
Panel A: leaf sections numbered 1 (proximal, base) to 4 (distal, tip).
Panel B: z-scaled expression across sections 1 to 4 for six clusters; 79645 of 88718 loci clustered. Cluster A, 22249 loci; Cluster B, 12063 loci; Cluster C, 11789 loci; Cluster D, 7539 loci; Cluster E, 17579 loci; Cluster F, 8426 loci. Enriched GO terms shown: photosynthesis; regulation of gene expression; translation; cellular protein modification; DNA metabolism; vesicle-mediated transport.
Panels C and D: heatmaps of relative composite RPKM value, normalized to the section with highest expression (scale 0.0 to 1.0), for sections 1 to 4.
Cell wall & stomata development rows: cell wall biosynthesis; cellulose biosynthesis; lignin biosynthesis; stomata development; cutin & suberin biosynthesis.
Photosynthesis rows: antenna proteins; photosystem II; photosystem I; cytochrome b6f & ATP synthase; Calvin cycle; C4 dicarboxylic acid cycles; CAM (dark, light); chlorophyll biosynthesis.
Transcription factor rows: KNOX (Class I, Class II); MADS-box; GRAS; YABBY; MYB; bHLH; Zn finger.
Hormone rows: auxin (biosynthesis, transport); CK biosynthesis; GA biosynthesis; ETH biosynthesis; BR biosynthesis; ABA biosynthesis.

Figure 2 graphic (not reproduced).
Panel C axes: number of reads (2^10 to 2^30) vs. probability of observing a unique 25-mer, for A. deserti and A. tequilana.
Panel D axes: contig GC content (0.0 to 1.0) vs. density, for A. tequilana and A. deserti contigs and for removed contigs.
Panel E axes: transcript length (nt; 100 to 10000) vs. density, for A. deserti and A. tequilana.
Panel F axes: locus RPKM (1 to 10000) vs. density, for coding and non-coding loci in each species.

Figure 3 graphic (not reproduced).
Alignment classes used in each bar chart: subject contains query; query contains subject; sequences overlap; unaligned. The class descriptions distinguish containment from overlap, and alignments with indels from those with no indels.
Panel A (y-axis: number of transcripts). PacBio subreads || Rnnotator: n = 4767; bar labels 2683, 862, 1221, 1. GenBank || Rnnotator: n = 82; bar labels 1, 38, 14, 29. McKain et al. || Rnnotator: n = 12,972; bar labels 6578, 2710, 3560, 124.
Panel B (y-axis: number of transcripts, thousands). A. deserti || A. tequilana: n = 128,959; bar labels 32,231, 37,241, 52,109, 7,378. A. tequilana || A. deserti: n = 204,530; bar labels 28,627, 44,443, 107,821, 23,649.
Panel C axes: fraction of A. tequilana transcript aligning to A. deserti transcript (0.0 to 0.9) vs. number of A. tequilana transcripts (thousands); fraction of A. deserti transcript aligning to A. tequilana transcript (0.0 to 0.9) vs. number of A. deserti transcripts (thousands).

AGAVE TRANSCRIPTOME ASSEMBLIES FROM DEEP RNA-seq

COMPARISON OF AGAVE DE NOVO ASSEMBLIES

FIGURE 3: Comparison of the de novo Agave transcriptome assemblies

(A) Comparisons of the A. tequilana de novo Rnnotator assembly to error-corrected Pacific Biosciences subreads, 82 GenBank A. tequilana sequences, and an additional A. tequilana dataset from McKain et al. 2012 [4].

(B) Comparisons between the A. tequilana and A. deserti de novo Rnnotator assemblies.

(C) Histograms of the fraction of aligned sequence lengths between A. deserti and A. tequilana.

The symbol || separates the query sequence dataset from the subject sequence dataset. The total number of sequences (n) is noted in each bar chart; the number of sequences in each alignment class is noted above its bar.
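The alignment classes in Figure 3 can be illustrated with a minimal sketch. The poster does not state the rule used to assign classes, so the coverage cutoff `FULL_COVERAGE`, the function `classify_alignment`, and the toy records below are all hypothetical, chosen only to show how containment, overlap, and unaligned categories might be told apart from alignment coordinates.

```python
from collections import Counter

FULL_COVERAGE = 0.95  # hypothetical cutoff, not taken from the poster


def classify_alignment(query_len, subject_len, aln=None, full=FULL_COVERAGE):
    """Assign one query-vs-subject alignment to a class.

    aln is (q_start, q_end, s_start, s_end) in 1-based inclusive
    coordinates, or None when the query has no alignment.
    """
    if aln is None:
        return "unaligned"
    q_start, q_end, s_start, s_end = aln
    q_cov = (abs(q_end - q_start) + 1) / query_len    # fraction of query aligned
    s_cov = (abs(s_end - s_start) + 1) / subject_len  # fraction of subject aligned
    if q_cov >= full:
        return "subject contains query"
    if s_cov >= full:
        return "query contains subject"
    return "sequences overlap"


# Toy records: (query length, subject length, alignment or None).
records = [
    (500, 1200, (1, 500, 301, 800)),   # whole query sits inside the subject
    (1500, 600, (401, 1000, 1, 600)),  # whole subject sits inside the query
    (900, 1000, (601, 900, 1, 300)),   # partial overlap on both sides
    (400, 700, None),                  # no alignment reported
]
counts = Counter(classify_alignment(q, s, a) for q, s, a in records)
```

Tallying classes with a `Counter` mirrors the per-class totals printed above each bar; when both sequences are fully covered, this sketch arbitrarily reports "subject contains query".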

FIGURE 4: Proteomic comparison of agaves to other plant species

(A) Venn diagram of BLASTP-based one-to-one reciprocal best hit proteins shared between A. deserti and A. tequilana.

(B) Venn diagram of OrthoMCL-defined protein families shared between agaves.

(C) Edwards-Venn diagram of OrthoMCL-defined plant orthologous-group protein families (Plant OGs) shared between agaves and 4 additional monocotyledonous plant species. The shape and color used for each species are shown at the right, with the total number of Plant OGs within each species.
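As an illustration of the one-to-one reciprocal best hit (RBH) idea in panel (A), the sketch below pairs proteins whose best hit in each search direction is the other. It assumes BLASTP results have already been reduced to (query, subject, bit score) tuples; the function names, identifiers, and scores are hypothetical, and any score or E-value filtering used for the poster is not specified.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query_id, subject_id, bit_score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy hit lists with made-up protein identifiers.
deserti_vs_tequilana = [
    ("Ad1", "At1", 250.0),
    ("Ad1", "At2", 90.0),
    ("Ad2", "At2", 180.0),
    ("Ad3", "At1", 120.0),
]
tequilana_vs_deserti = [
    ("At1", "Ad1", 248.0),
    ("At2", "Ad2", 175.0),
    ("At2", "Ad1", 88.0),
]
pairs = reciprocal_best_hits(deserti_vs_tequilana, tequilana_vs_deserti)
```

In this toy input, Ad3 finds At1 as its best hit, but At1 prefers Ad1, so Ad3 is left out of the one-to-one set.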

FIGURE 5: Transcriptomic analysis of the A. deserti leaf proximal-distal axis.

(A) One of the A. deserti leaves used for analysis, indicating proximal-distal (PD) sections 1–4.

(B) Six major K-means clusters of gene expression along the PD axis. Clusters are manually grouped by highest expression in proximal, medial, or distal tissues. Blue lines connect mean z-scaled RPKM values; shaded areas represent the 25th and 75th percentiles; red lines indicate standard error at each mean. Green text beneath each cluster gives the most significantly enriched GO term for that cluster.

(C, D) Heatmaps of composite gene expression for indicated biological processes along the leaf PD axis.
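The two scalings named in this caption, z-scaled RPKM for clustering and values relative to the highest-expressing section for the heatmaps, can be sketched as follows, together with a plain K-means pass. This is a toy under stated assumptions: the RPKM values and locus names are invented, two clusters are used instead of six, and seeding with the first k profiles is a simplification for repeatability rather than the procedure used for the poster.

```python
from statistics import mean, pstdev


def z_scale(profile):
    """Rescale one locus profile to mean 0 and unit (population) variance."""
    mu, sd = mean(profile), pstdev(profile)
    if sd == 0:
        return [0.0] * len(profile)
    return [(x - mu) / sd for x in profile]


def relative_to_max(profile):
    """Express each section relative to the section with highest expression."""
    top = max(profile)
    return [x / top for x in profile] if top > 0 else [0.0] * len(profile)


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm, seeded with the first k points for repeatability."""
    centers = [list(p) for p in points[:k]]
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest center for every profile.
        labels = [min(range(k), key=lambda c: squared_distance(p, centers[c]))
                  for p in points]
        # Update step: each center becomes the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append([mean(col) for col in zip(*members)] if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return labels


# Invented RPKM values for leaf sections 1 (proximal) to 4 (distal).
rpkm_by_locus = {
    "locus_a": [40, 20, 10, 5],
    "locus_b": [2, 6, 20, 60],
    "locus_c": [80, 30, 12, 8],
    "locus_d": [1, 2, 9, 30],
}
scaled = [z_scale(p) for p in rpkm_by_locus.values()]
labels = kmeans(scaled, k=2)
heatmap_row = relative_to_max(rpkm_by_locus["locus_a"])
```

Z-scaling removes differences in absolute expression level, so loci group by the shape of their profile along the PD axis: here the two proximally biased loci share one label and the two distally biased loci share the other.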

PROTEOMIC ANALYSES SUPPORT COMPREHENSIVE AGAVE TRANSCRIPTOME ASSEMBLIES

PROFILING OF THE A. DESERTI LEAF HIGHLIGHTS REGIONS CRITICAL TO DEVELOPMENT AND PHOTOSYNTHESIS

FIGURE 2: A. tequilana, A. deserti, and their respective transcriptomes

(A) Cultivated A. tequilana in Jalisco, Mexico.

(B) A. deserti (foreground) in natural habitat, Riverside County, California, USA.

(C) Plot of the fraction of unique 25-mers over indicated read depth (log2 scale).

(D) Density plot of GC content of agave transcript contigs vs. contigs from contamination and commensal organisms.

(E) Density plots of A. deserti and A. tequilana transcript lengths. Note log10 scale. Peaks at 150 and 250 nt represent single reads or paired-end reads, respectively, that were not assembled into larger contigs.

(F) Density plot of locus RPKM values for coding (dark shading) and non-coding (light shading) loci.
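Panels (C) and (D) rest on two simple sequence statistics, sketched here on toy data. The sketch assumes one plausible reading of panel (C): as reads accumulate, the fraction of k-mers not seen before falls toward zero when sequencing depth saturates. The function names, the batch size, and the short k used in the example are hypothetical; the poster uses 25-mers and does not describe its exact procedure.

```python
def kmers(seq, k):
    """Yield every k-length substring of seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def novel_kmer_fraction(reads, k=25, batch_size=2):
    """Return [(reads_seen, fraction of the batch's k-mers not seen before)]."""
    seen = set()
    curve = []
    for start in range(0, len(reads), batch_size):
        batch = reads[start:start + batch_size]
        total = novel = 0
        for read in batch:
            for kmer in kmers(read, k):
                total += 1
                if kmer not in seen:
                    novel += 1
                    seen.add(kmer)
        fraction = novel / total if total else 0.0
        curve.append((start + len(batch), fraction))
    return curve


def gc_content(seq):
    """Fraction of G and C bases in a contig (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


# Toy reads; k=3 keeps the example readable.
toy_reads = ["ACGTA", "ACGTT", "ACGTA", "ACGTT"]
curve = novel_kmer_fraction(toy_reads, k=3, batch_size=2)
contig_gc = gc_content("ATGC")
```

In the toy curve, the second batch repeats the first, so its novel fraction drops to zero. A GC-content value per contig, computed as in `gc_content`, is the kind of quantity whose distribution panel (D) compares between agave contigs and removed contigs.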

OVERVIEW OF AGAVE TRANSCRIPTOME ASSEMBLIES

Species        Total sequencing   No. of loci   No. transcript contigs   N50 length   Sum assembled length   No. protein-coding loci
A. tequilana   293.5 Gbp          139,525       204,530                  1387 bp      204.9 Mbp              34,870
A. deserti     184.7 Gbp          88,718        128,869                  1323 bp      125.0 Mbp              35,086
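For readers unfamiliar with the N50 statistic reported for each assembly, the following minimal sketch computes it from a list of contig lengths using the standard definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembled bases. The function name and the toy lengths are illustrative only.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig that brings the running sum to half the total.
        if running * 2 >= total:
            return length
    return 0


toy_lengths = [100, 200, 300, 400, 500]
toy_n50 = n50(toy_lengths)
```

With these toy lengths the total is 1500 bases; the two longest contigs (500 and 400) already cover 900, so the N50 is 400.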

CAM PHOTOSYNTHESIS, ARID ENVIRONMENTS, AND BIOENERGY

Agave species are adapted to their native habitat in arid regions of Mexico and the United States. Agave thus holds promise as a biofuel feedstock [1,2], capable of growing on marginal lands where other proposed bioenergy plants cannot. The ability of agaves to withstand hot and arid conditions relies upon crassulacean acid metabolism (CAM), a specialized form of photosynthesis that allows agaves to keep leaf stomata (pores) closed during the hot day, minimizing water loss through evapotranspiration.

Figure 1 graphic (not reproduced). Panel B map legend: Agave; semi-arid regions. Panel C diagram labels: NIGHT, DAY, CO2, C3, C4, vacuole, chloroplast, Calvin cycle, light, sugar.

FIGURE 1: Agaves and CAM biology

(A) Agave tequilana cultivated in Mexico.

(B) Semi-arid regions of the United States (brown) are unsuitable for cultivation of other bioenergy plants, which require more temperate regions (green). Most Agave species are adapted to semi-arid regions in Mexico and the extreme southwestern USA (purple).

(C) Crassulacean Acid Metabolism (CAM). CO2 enters plant cells at night, joins with a 3-carbon molecule (C3), and is stored in the vacuole as a 4-carbon molecule (C4). During the day, C4 molecules diffuse out of the vacuole, and CO2 is released and assimilated into sugar in the chloroplast.

Comparison of inputs (water and nitrogen) and outputs (biomass and ethanol) of agaves and other biofuel feedstock species. Though agaves are harvested at several years of age, their annualized growth rate is on par with Miscanthus. Table is modified from reference [2].

                 Inputs                                                        Outputs
Feedstock        Water (cm yr-1)   Drought tolerance   Nitrogen (kg ha-1 yr-1)   Dry biomass (Mg ha-1 yr-1)   Ethanol (liters yr-1)
Corn grain       50–80             low                 90–120                    7–10                         2900
Corn stover      (inputs shared with corn grain)                                 3–6                          900
Miscanthus       75–120            low                 0–15                      15–40                        4600–12,400
Poplar coppice   70–105            moderate            0–50                      5–11                         1500–3400
Agave spp.       30–80             high                0–12                      10–34                        3000–10,500

To provide sequence resources for the Agave research community, we built de novo transcriptomes of Agave tequilana and Agave deserti from deep Illumina RNA-seq data. Sequences were assembled by Rnnotator [3], a de novo transcriptome assembly pipeline.

Analysis of assembled contigs suggests the Agave de novo assemblies are comprehensive and accurate.

Proteome comparisons between Agave species and additional monocot species suggest the majority of Agave proteins are conserved across taxa. We can also identify protein families specific to agaves.

Agaves spend the majority of their lives as compact rosettes, so leaves are important organs in which to study Agave developmental and bioenergetic processes.

ACKNOWLEDGEMENTS AND CITATIONS

This work performed at the U.S. Department of Energy Joint Genome Institute was supported in part by the Office of Science of the U.S. Department of Energy under contract DE-AC02-05CH112.

[1] Davis, A. S. et al. The global potential for Agave as a biofuel feedstock. GCB Bioenergy 3, 68–78 (2011).
[2] Somerville, C. et al. Feedstocks for lignocellulosic biofuels. Science 329, 790–792 (2010).
[3] Martin, J. et al. Rnnotator: an automated de novo transcriptome assembly pipeline from stranded RNA-seq reads. BMC Genomics 11, 663 (2010).
[4] McKain, M. et al. Phylogenomic analysis of transcriptome data elucidates co-occurrence of a paleopolyploid event and the origin of bimodal karyotypes in Agavoideae (Asparagaceae). Am J Bot 99(2), 397–406 (2012).