Date post: 02-Mar-2021
Transcriptome Analysis of Extant Cotton Progenitors and Identification of Genome-Specific-Single Nucleotide Polymorphism (GNP) Gyoungju Nah (Dr. Z. Jeffrey Chen Lab) University of Texas at Austin
Page 1: Transcriptome Analysis of Extant Cotton Progenitors and Identification … · 2017. 4. 21. · 03 . Contig Size Distribution of AA and DD EST libraries ... cuDContig17255 AT2G27880.1

Transcriptome Analysis of Extant Cotton Progenitors and Identification of Genome-Specific Single Nucleotide Polymorphism (GNP)

Gyoungju Nah (Dr. Z. Jeffrey Chen Lab)

University of Texas at Austin

Page 2

A and D genomes as Extant Parents of AADD allotetraploid

1. What is the difference in transcriptome between G. arboreum (AA) and G. raimondii (DD)? Needs AA and DD EST information

(Hovav et al. 2008)

Figure: time labels ~7 MYA and ~1 MYA

Page 3

Allelic Expression During Cotton Fiber Development

2. How does A- and D-allelic expression contribute to fiber development in allopolyploids? Needs GNP information

TC67135 (cyclin D)

(Yang et al. 2006)

A-allele-specific expression in fiber-bearing ovule

Figure: lanes labeled A, D, AD, A, A, A, A, A, A, A

Figure: four Gossypium species, each shown with a 1 cm scale bar

G. arboreum (A2): African-Asian A-genome
G. raimondii (D5): New World D-genome
G. hirsutum (AADD)1 and G. barbadense (AADD)2: New World AD-genome allotetraploids

Page 4

Work Flow

454/Roche Titanium sequencing:
1,699,776 reads from G. arboreum (AA)
1,464,815 reads from G. raimondii (DD)

Young leaves, roots, bolls, ovules, and fibers

Assembly of 454 reads (Chen lab):
AA: 62,609 contigs (avg. 1,032 bp)
DD: 34,908 contigs (avg. 1,107 bp)

Assembly of 454 reads (Udall lab):
AA: 89,185 contigs (avg. 629 bp)
DD: 68,984 contigs (avg. 676 bp)

After merge of Chen and Udall ESTs:
A: 89,588 contigs (avg. 806 bp)
D: 65,542 contigs (avg. 840 bp)

For both AA and DD, the merged set has more contigs than the Chen lab assembly and a larger average contig size than the Udall lab assembly
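The contig counts and average lengths reported for each assembly are simple summaries of a FASTA file. A minimal sketch of how such numbers could be computed follows; the function names and the toy contig names are hypothetical, and this is not the pipeline used by either lab.

```python
from io import StringIO


def contig_lengths(fasta_handle):
    """Return a list of sequence lengths from a FASTA-formatted text handle."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in fasta_handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def summarize(lengths):
    """Return (number of contigs, average contig length in bp)."""
    if not lengths:
        return 0, 0.0
    return len(lengths), sum(lengths) / len(lengths)


# Toy input standing in for an assembly file (hypothetical contig names).
toy_fasta = StringIO(">contig1\nACGTACGT\nACGT\n>contig2\nACGTAC\n")
n_contigs, avg_len = summarize(contig_lengths(toy_fasta))
print(n_contigs, avg_len)  # two contigs of 12 and 6 bp, so the average is 9.0
```

For a real assembly, an open file object can be passed in place of the `StringIO` toy input.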

Page 5

Contig Size Distribution of AA and DD EST libraries

Transcriptome size of A-subgenome is ~27% larger than that of D-subgenome

The majority of A (88%) and D (84%) contigs range from 200 to 1,500 bp

Figure: number of contigs versus contig size (bp) for A (89,588 contigs) and D (65,542 contigs)
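A percentage difference depends on which total serves as the baseline. The sketch below works through that arithmetic using only the merged contig totals on this slide (89,588 for A and 65,542 for D). Whether the slide's ~27% was derived from contig counts or from total assembled bases is not stated, so the comparison is an assumption. The helper `fraction_in_range` and its toy lengths are hypothetical and only illustrate how a share such as "88% of contigs within 200 to 1,500 bp" could be computed.

```python
def relative_difference(a, d, baseline):
    """Return (a - d) as a percentage of the chosen baseline."""
    return 100.0 * (a - d) / baseline


def fraction_in_range(lengths, low, high):
    """Return the percentage of lengths with low <= length <= high."""
    if not lengths:
        return 0.0
    hits = sum(1 for n in lengths if low <= n <= high)
    return 100.0 * hits / len(lengths)


A_CONTIGS = 89_588  # merged A contigs (from the slide)
D_CONTIGS = 65_542  # merged D contigs (from the slide)

# Difference expressed relative to A, then relative to D.
print(round(relative_difference(A_CONTIGS, D_CONTIGS, A_CONTIGS), 1))  # 26.8
print(round(relative_difference(A_CONTIGS, D_CONTIGS, D_CONTIGS), 1))  # 36.7

# Toy contig lengths (made up) to show the size-range share.
toy_lengths = [150, 250, 900, 1500, 2200]
print(fraction_in_range(toy_lengths, 200, 1500))  # 60.0
```

By contig count, the difference is about 27% of the A total and about 37% of the D total.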

Page 6

AA and DD EST Coverage

Query : Subject    Matched    Unmatched
A : CGI11          61.7%      38.3%
D : CGI11          70.1%      29.9%
CGI11 : A          81.2%      18.8%
CGI11 : D          82.4%      17.6%

BlastN: e-10
CGI11 with 117,992 contigs from mixture of AA, DD, and AADD ESTs

New A and D ESTs include ~80% of entries in CGI11
New A and D ESTs provide additional ~30-38% ESTs that are not present in CGI11
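Matched and unmatched percentages like these can be derived from BLAST tabular output by counting queries with at least one hit under the e-value cutoff. The following is a sketch under stated assumptions: the input is standard 12-column tabular output (`-outfmt 6`, e-value in column 11), and the function names and toy hit lines are hypothetical rather than the presenter's scripts.

```python
def matched_queries(blast_lines, max_evalue=1e-10):
    """Return the set of query IDs with at least one hit at or below max_evalue.

    Expects BLAST tabular lines: query ID in column 1, e-value in column 11.
    """
    matched = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if float(fields[10]) <= max_evalue:
            matched.add(fields[0])
    return matched


def percent_matched(all_query_ids, blast_lines, max_evalue=1e-10):
    """Return the percentage of query IDs that have a qualifying hit."""
    ids = set(all_query_ids)
    if not ids:
        return 0.0
    hits = matched_queries(blast_lines, max_evalue) & ids
    return 100.0 * len(hits) / len(ids)


# Toy tabular hits (made-up IDs and scores).
toy_hits = [
    "q1\ts9\t98.0\t500\t10\t0\t1\t500\t1\t500\t1e-50\t800",
    "q1\ts3\t91.0\t300\t27\t0\t1\t300\t1\t300\t3e-20\t210",
    "q2\ts4\t85.0\t60\t9\t0\t1\t60\t1\t60\t1e-05\t40.1",
    "q3\ts7\t95.0\t120\t6\t0\t1\t120\t1\t120\t2e-12\t70.5",
]
print(percent_matched(["q1", "q2", "q3", "q4"], toy_hits))  # 50.0
```

Here q1 and q3 pass the cutoff, q2 has only a weak hit, and q4 has no hit, so half of the queries count as matched.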

Page 7

Redundancy in AA and DD ESTs

A and D transcriptomes are highly redundant, indicating that ~50% of the contigs represent isoforms and paralogs in the cotton genome

Figure: number of contigs for A and D before BlastN (100% each) and after BlastN (48.3% and 42.7%); BlastN: e-100
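The slide does not say how contigs were collapsed after the self-BlastN step. One common approach, shown here purely as an assumption, is single-linkage clustering: any two contigs connected by a hit under the cutoff fall into the same cluster, and one representative per cluster is kept. The function name and toy IDs are hypothetical.

```python
def cluster_by_hits(contig_ids, hit_pairs):
    """Group contigs so that any two joined by a hit share a cluster."""
    parent = {c: c for c in contig_ids}

    def find(x):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in hit_pairs:
        if a in parent and b in parent:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for c in contig_ids:
        clusters.setdefault(find(c), []).append(c)
    return list(clusters.values())


# Toy data: c1, c2, c3 are linked by hits; c4 only hits itself; c5 has no hits.
toy_ids = ["c1", "c2", "c3", "c4", "c5"]
toy_pairs = [("c1", "c2"), ("c2", "c3"), ("c4", "c4")]
clusters = cluster_by_hits(toy_ids, toy_pairs)
print(len(clusters))                       # 3
print(100.0 * len(clusters) / len(toy_ids))  # 60.0 (percent retained)
```

Single linkage is the most aggressive choice, because chains of partial matches merge into one cluster; a stricter rule would retain more contigs.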

Page 8

Estimation of Diversification of AA and DD ESTs

Query : Subject    Conserved    Diversified
A : D              73.2%        26.8%
D : A              80.8%        19.2%

Reciprocal BlastN: e-10

Either one of the libraries does not cover the entire transcriptome
This diversification was estimated as 27% in A and 19% in D

Page 9

AA and DD ESTs Against Known Protein Databases

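Figure (A): contigs matched against protein databases. Categories: matched TAIR10 peptide (E-10), matched Uniprot (E-10), matched pfamA (E-05), unmatched. Values shown, without category labels:
A: 35,335 (39.4%); 846 (0.9%); 10,493 (11.7%); 42,914 (47.9%)
D: 33,333 (50.9%); 24,581 (37.5%); 6,300 (9.6%); 1,328 (2%)

Figure (B): percentages per functional category in two series, without category labels:
28.3%, 27.4%, 10.6%, 6.8%, 4.1%, 4.3%, 4.5%, 4.2%, 3.4%, 2.5%, 2.3%, 1%, 0.4%
28.2%, 27.4%, 10.6%, 6.6%, 4.2%, 4.4%, 4.7%, 4.4%, 3.6%, 2.4%, 2.3%, 0.9%, 0.4%

D contains a higher proportion of Arabidopsis thaliana (Ath) protein homologs than A
Both A and D ESTs are enriched in cellular process and metabolic process genes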
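The four values for each library sum to the merged contig totals (89,588 for A and 65,542 for D), which suggests each contig was placed in exactly one category. One way to obtain such exclusive categories is to check the databases in a fixed priority order. The order used below (TAIR10, then Uniprot, then pfamA) is an assumption, as are the function names and toy IDs.

```python
from collections import Counter

# Assumed priority order; the slide lists the databases but not the order of assignment.
PRIORITY = ["TAIR10", "Uniprot", "pfamA"]


def assign_category(contig_id, hits_by_db):
    """Return the first database in PRIORITY that has a hit for this contig."""
    for db in PRIORITY:
        if contig_id in hits_by_db.get(db, set()):
            return db
    return "Unmatched"


def tally(contig_ids, hits_by_db):
    """Count contigs per exclusive category."""
    return Counter(assign_category(c, hits_by_db) for c in contig_ids)


# Toy data: c2 hits two databases but is counted once, under TAIR10.
toy_contigs = ["c1", "c2", "c3", "c4", "c5"]
toy_hits_by_db = {
    "TAIR10": {"c1", "c2"},
    "Uniprot": {"c2", "c3"},
    "pfamA": {"c4"},
}
counts = tally(toy_contigs, toy_hits_by_db)
print(counts["TAIR10"], counts["Uniprot"], counts["pfamA"], counts["Unmatched"])  # 2 1 1 1
```

Because every contig receives exactly one label, the category counts always add up to the number of contigs.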

Page 10

miRNA Targets in AA and DD ESTs

Figure (A): frequency of miRNA in A and D

Figure (B): A: 242/89,588 contigs (0.27%); D: 233/65,542 contigs (0.36%); values shown: 127, 115, 106

The frequency of miRNA targets is higher in DD than in AA ESTs (0.36% vs. 0.27% of contigs), suggesting that miRNAs might play an important role in D-allele regulation in the allotetraploid

Page 11

Selection of High Quality GNPs

Figure: Position A (high quality) and Position B (low quality)

GNP selection criteria: >=8X coverage, >=90% consensus, Q>=25

Number of GNP-containing contigs: 11,000
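The three selection criteria can be read as a per-position filter on the reads covering a site. The sketch below is one possible interpretation, not the presenter's pipeline. It assumes that base calls below Q25 are discarded first and that coverage and consensus are then evaluated on the remaining calls; the function name and toy reads are hypothetical.

```python
from collections import Counter

MIN_COVERAGE = 8      # >=8X coverage
MIN_CONSENSUS = 0.90  # >=90% of reads agree on one base
MIN_QUALITY = 25      # Q>=25


def is_high_quality(bases, quals):
    """Check one alignment column against the three criteria.

    bases: base calls from the reads covering the position
    quals: Phred quality scores, one per base call
    Assumption: calls below MIN_QUALITY are dropped before the other checks.
    """
    kept = [b for b, q in zip(bases, quals) if q >= MIN_QUALITY]
    if len(kept) < MIN_COVERAGE:
        return False
    top_count = Counter(kept).most_common(1)[0][1]
    return top_count / len(kept) >= MIN_CONSENSUS


# Toy positions (made up): full agreement, mixed calls, and shallow coverage.
print(is_high_quality("AAAAAAAAAA", [30] * 10))  # True
print(is_high_quality("AAAAAAAGGG", [30] * 10))  # False (70% consensus)
print(is_high_quality("AAAAA", [30] * 5))        # False (5X coverage)
```

Under this reading, a site would count as a GNP only when the position passes in both the A and D alignments and the two consensus bases differ.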

Page 12

Allele-Separable Genes-I

Epigenetic-associated genes

Cotton EST ID     TAIR ID        Gene Name
UDcontig30230     AT1G48410.1    AGO1
cuDContig2017     AT1G48410.1    AGO1
cuDContig12969    AT1G48410.1    AGO1
cuDContig19558    AT1G31280.1    AGO2
cuDContig4863     AT2G27040.2    OCP11
cuDContig7218     AT2G27040.2    OCP11
cuDContig17255    AT2G27880.1    AGO5
UDcontig10468     AT1G01040.2    SUS1
cuDContig12083    AT3G03300.3    DCL2
UDcontig8529      AT3G03300.3    DCL2
cuDContig3778     AT1G14790.1    RDR1
cuDContig17499    AT5G14620.1    DRM2
cuDContig11889    AT4G19020.1    CMT2
cuDContig7567     AT1G69770.1    CMT3
cuDContig13379    AT1G77300.1    SDG8
cuDContig637      AT1G73100.1    SUVH3
UDcontig12385     AT2G22740.1    SUVH6
cuDContig13480    AT3G12680.1    HUA1
cuDContig19226    AT1G05460.1    SDE3
cuDContig7543     AT1G01920.2    SET-domain
cuDContig6339     AT1G05120.1    SNF2-domain

Clock-related genes

Cotton EST ID     TAIR ID        Gene Name
UDcontig31044     AT2G46830.1    CCA1
CDcontig25809     AT1G01060.4    LHY1
UDcontig11814     AT1G01060.4    LHY1
UDcontig14538     AT1G01060.5    LHY1

Page 13

Allele-Separable Genes-II

Myb-related genes

Cotton EST ID     TAIR ID        Gene Name
cuDContig15019    AT1G22640.1    MYB3
cuDContig7688     AT1G68670.1    MYB-domain
cuDContig4930     AT1G74840.1    MYB-domain
cuDContig5155     AT2G01060.1    MYB-domain
UDcontig50103     AT2G03500.1    MYB-domain
cuDContig13445    AT2G23290.1    AtMYB70
cuDContig3509     AT2G38090.1    MYB-domain
cuDContig517      AT2G38090.1    MYB-domain
cuDContig13450    AT2G38090.1    MYB-domain
cuDContig4163     AT2G47190.1    MYB2
cuDContig13595    AT3G09600.1    MYB-domain
cuDContig1160     AT3G10760.1    MYB-domain
cuDContig3516     AT3G13040.2    MYB-domain
cuDContig15142    AT3G18100.1    MYB4R1
cuDContig5233     AT4G09460.1    AtMYB6
UDcontig45109     AT4G32730.2    PC-MYB1
cuDContig16746    AT4G32730.2    PC-MYB1
cuDContig3991     AT4G37260.1    MYB73
cuDContig770      AT4G38620.1    MYB4
cuDContig16699    AT5G04760.1    MYB-domain
cuDContig16484    AT5G15310.2    ATMYB16
UDcontig12659     AT5G45420.1    MYB-domain
cuDContig19287    AT5G52660.1    MYB-domain
cuDContig5755     AT5G52660.2    MYB-domain
cuDContig16730    AT5G67300.1    MYBR1

Fiber-related genes

Cotton EST ID     TAIR ID        Gene Name
cuDContig1242     AT1G05010.1    EFE
cuDContig2761     AT1G05010.1    EFE
cuDContig4895     AT1G07890.8    MEE6
cuDContig7679     AT1G12910.1    ATAN11
UDcontig42026     AT1G62660.1    BFRUCT3
cuDContig4618     AT2G01570.1    RGA1
cuDContig14416    AT2G01570.1    RGA1
cuDContig1899     AT2G28950.1    ATHEXP
cuDContig3387     AT2G40610.1    EXP8
cuDContig7169     AT3G43190.1    SUS4
cuDContig12426    AT4G03010.1    Leucine-rich
cuDContig5067     AT4G22880.2    TT18
cuDContig3717     AT5G13710.2    SMT1
cuDContig389      AT5G24520.2    TTG1
cuDContig5774     AT5G25610.1    RD22

Page 14

GNP Identification and Characterization

Figure (A): SNP and Indel bars; axis label "Number of contigs" (ticks 5,000 to 35,000). Values shown: SNP 34,059; Indel 926; 3,277; 4,822

GNP=SNP+Indel: 34,985; 4,822

Figure (B): axes labeled bp/SNP and Frequency (ticks 200, 600, 1,000, 1,400)
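Since a GNP is defined here as SNP plus indel (34,059 + 926 = 34,985), both kinds of difference have to be counted when an A contig is compared with its D counterpart. The sketch below counts them from a pairwise alignment, assuming the two sequences are already aligned with "-" as the gap character and that a run of consecutive gap columns is one indel event. The function name and the toy alignment are hypothetical.

```python
def count_differences(aligned_a, aligned_d):
    """Count SNPs and indel events between two gapped, equal-length sequences."""
    if len(aligned_a) != len(aligned_d):
        raise ValueError("aligned sequences must have equal length")
    snps = 0
    indels = 0
    gap_side = None  # which sequence the current gap run is in: "a", "d", or None
    for a, d in zip(aligned_a.upper(), aligned_d.upper()):
        if a == "-" and d == "-":
            continue  # gap in both sequences carries no information
        if a == "-" or d == "-":
            side = "a" if a == "-" else "d"
            if gap_side != side:
                indels += 1  # a new gap run starts here
                gap_side = side
            continue
        gap_side = None
        if a != d:
            snps += 1
    return snps, indels


# Toy alignment (made up): one substitution, a 2-bp gap in A, a 1-bp gap in D.
toy_a = "ACGT--ACGTA"
toy_d = "ACCTGGAC-TA"
snps, indels = count_differences(toy_a, toy_d)
print(snps, indels, snps + indels)  # 1 2 3
```

Counting each gap run once, rather than each gap column, keeps a multi-base insertion from being reported as several separate indels.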

Page 15

GNP Experimental Validation

By X. Guan

Figure (A): gel lanes M, 16, 17, 20, 21, 22, 27, 29, 34, M, M, 39, 40, 41, 44, 45, 46, 47, 48, M; rows G. a, G. r, and G. h

Figure (B): Exp#48 (CDcontig15250, cuAContig1068) and Exp#22 (cuDContig11665, cuAContig4409), each with G. arboreum, G. raimondii, and G. hirsutum; labels A, D, and AD; size markers 200 bp and 400 bp

Page 16

Conclusions

We generated AA and DD EST libraries from extant progenitors of allotetraploid AADD cotton, which provide an important genomic resource for cotton fiber research and crop improvement.

Comparative analysis of AA and DD ESTs provided some new insights into transcriptome divergence between G. arboreum (AA) and G. raimondii (DD) genomes.

Analysis of miRNA targets in AA and DD ESTs suggests that miRNA-mediated gene regulation plays a role in expression of target genes from A and D subgenomes in allopolyploids.

We developed a GNP pipeline that can discriminate between a large number of AA and DD ESTs (~11,000), including many involved in fiber development and epigenetic pathways.

Page 17

Acknowledgement

University of Texas at Austin: Dr. Z. Jeffrey Chen, Dr. Yuki Guan
Brigham Young University: Dr. Joshua Udall
Texas A&M University: Dr. David Stelly
UT GSAF: Dr. Scott Hunicke-Smith
Texas Advanced Computing Center

