Peng et al. BMC Ecol Evo (2021) 21:71 https://doi.org/10.1186/s12862-021-01800-1

RESEARCH ARTICLE

Exploring the evolutionary characteristics between cultivated tea and its wild relatives using complete chloroplast genomes

Jiao Peng1,2, Yunlin Zhao1,2, Meng Dong2, Shiquan Liu2, Zhiyuan Hu2, Xiaofen Zhong2 and Zhenggang Xu1,2,3*

Abstract

Background: Cultivated tea is one of the most important economic and ecological trees distributed worldwide. Cultivated tea has been subject to long-term targeted selection of traits and overexploitation of its habitats by humans, which may have changed its genetic structure. The chloroplast is an organelle with a conserved circular genome structure, and it can help us better understand the evolutionary relationships of Camellia plants.

Results: We conducted comparative and evolutionary analyses of cultivated tea and wild tea and identified the evolutionary characteristics of cultivated tea. The chloroplast genome sizes of cultivated tea differed slightly, ranging from 157,025 to 157,100 bp. In addition, the cultivated species were more conserved than the wild species in terms of genome length, gene number, gene arrangement and GC content. However, when Camellia sinensis var. sinensis and Camellia sinensis var. assamica were compared with their cultivars, the IR length variation was approximately 20 bp and 30 bp, respectively. The nucleotide diversity of 14 sequences in cultivated tea was higher than that in wild tea. Detailed analysis of the genomic variation and evolution of Camellia sinensis var. sinensis cultivars revealed 67 single nucleotide polymorphisms (SNPs), 46 insertions/deletions (indels), and 16 protein-coding genes with nucleotide substitutions, while Camellia sinensis var. assamica cultivars revealed 4 indels. In cultivated tea, the most variable gene was ycf1. The largest number of nucleotide substitutions, five amino acid sites under site-specific selection, and a 9 bp sequence insertion were found in the Camellia sinensis var. sinensis cultivars. In addition, the phylogenetic relationships in the ycf1 tree suggested that the ycf1 gene has diverged in cultivated tea, because C. sinensis var. sinensis and its cultivated species were not tightly clustered.

Conclusions: The cultivated species were more conserved than the wild species in terms of architecture and linear sequence order. The variation of the chloroplast genome in cultivated tea was mainly manifested in the nucleotide polymorphisms and sequence insertions. These results provided evidence regarding the influence of human activities on tea.

Keywords: Chloroplast genome, Cultivated tea, Evolution, ycf1, Camellia

© The Author(s) 2021. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Open Access

BMC Ecology and Evolution

*Correspondence: [email protected]
1 Hunan Research Center of Engineering Technology for Utilization of Environmental and Resources Plant, Central South University of Forestry and Technology, Changsha 410004, Hunan, People’s Republic of China
Full list of author information is available at the end of the article

Background
From ancient times, numerous plant species have been taken from their habitats and introduced into cultivation, that is, into various human-made systems [1]. The cultivation process has played an important role in human history, and cultivated environments often present strong ecological contrasts with wild environments [2]. Wild species are exposed to natural selection that operates to promote survival under abiotic and biotic stresses, while cultivated species are subjected to artificial selection that emphasizes a steady supply, improved quality and increased yield. The criteria for fitness are expected to differ dramatically between the two regimes. Therefore, alterations in vegetation phenology, growth and reproductive traits occur because the plants are subjected to different levels of stress and distinctive selection pressures [3]. Pot experiments showed significant differences in flowering and pod set between wild and cultivated types of soybean [4]. In addition, the compounds and microstructures of many horticultural plants have been surveyed [5]. However, inadequate genetic information prevents us from fully understanding the spreading process of cultivated plants. We need to compare the genetic differences between cultivated species and wild species in order to use these species more effectively.

Camellia, containing approximately 280 species, is a genus with high economic, ecological and phylogenetic value in the family Theaceae [6, 7]. Camellia species are native to Asia and have been cultivated for more than 1300 years [8]. Because of their variety of uses, the cultivated species are now found all over the world [9, 10]. Camellia species provide many valuable products, including tea made from the young leaves and edible oil extracted from the seeds. Moreover, most Camellia species are also of great ornamental value [11]. The genus Camellia is composed of more than 110 taxa [12], of which Camellia sinensis (L.) O. Kuntze is the most important source of the beverage tea. Cultivated tea plant varieties mainly belong to two major groups: Camellia sinensis var. sinensis (CSS; Chinese type) and Camellia sinensis var. assamica (CSA; Assam type) [13]. Due to long-term cultivation and manual selection, C. sinensis has formed many local varieties, such as Camellia sinensis var. sinensis cv. Anhua (CSSA), Camellia sinensis var. sinensis cv. Longjing43 (CSSL) and Camellia sinensis var. assamica cv. Yunkang10 (CSAY). Wild tea plants are important genetic diversity resources that can provide new traits for improved yield, disease resistance and tolerance to different environmental conditions. For example, the leaves of CSSA, a cultivar well known for its specific growing area, are the main source of dark tea [14]. The quality of dark tea products is related to the abundant cultivars, germplasm resources and geographical conditions [15].

The chloroplast (cp) genome is often used to analyze evolutionary processes and phylogenetic status because of its high degree of conservation and relatively compact gene arrangement. Moreover, cp genome sequences are useful for identifying closely related, breeding-compatible plant species [16]. Although the cp genome is very useful, only a limited number of complete cp genomes are available from Camellia species so far [7, 14, 17–21].

Human interference has been shown to affect the genetic structure, leaf nutrients and pollen morphology of Camellia [22–24]. For example, due to human overexploitation of habitats and long-term targeted selection of traits, the genetic diversity of Camellia germplasm resources has been significantly reduced [25]. However, it remains unclear what impact the artificial selection of cultivated Camellia has had on the evolutionary mechanism of the cp genome.

Current research often ignores material differences between cultivated and wild species. After sequencing the complete chloroplast genome of CSSA (MH042531), we wanted to explore the evolutionary characteristics between cultivated tea and its wild relatives [14]. To assess the variations in the chloroplast genome between wild and cultivated species of Camellia, and to detect the evolutionary characteristics of cultivated tea, we selected previously published Camellia chloroplast genomes and conducted comparative and evolutionary analyses. This can help us to better understand the structure of the Camellia chloroplast genomes and the phylogenetic relationships among species, and it provides more information about the influence of human activities on tea. We believe that this research will encourage more researchers to pay attention to tea resources.

Results
Chloroplast genome features of cultivated tea
The whole-genome lengths of cultivated tea (CSSA, CSSL and CSAY) differed slightly, ranging from 157,025 to 157,100 bp. However, the genome of CSAY differed from those of CSSA and CSSL. Both CSSA and CSSL contained 81 unique CDS genes, 30 tRNA genes, 4 rRNA genes and 3 pseudogenes (ψycf1, ψycf2 and ψycf15). Among them, atpF, ndhA, ndhB, petB, petD, rpl2, rpl16, rpoC1, rps16, trnG-GCC, trnI-GAU, trnL-UAA, and trnV-UAC contained a single intron, while clpP and ycf3 contained two introns. However, in CSAY, orf42 and ycf15 were lost, and rps12 and trnA-UGC had an inserted intron sequence (Fig. 1).

Comparison of chloroplast genomes between cultivated tea and wild tea
In our study, we first compared CSS with its two cultivated species (CSSA and CSSL). These species were defined as the Chinese cultivated type. Then, we compared CSA with its one cultivated species (CSAY). These species were defined as the Assam cultivated type. Finally, we compared CSS, CSA and 12 wild but related species: Camellia azalea (CAZ), Camellia crapnelliana (CCR), Camellia cuspidata (CCU), Camellia grandibracteata (CGR), Camellia impressinervis (CIM), Camellia petelotii (CPE), Camellia pitardii (CPI), Camellia pubicosta (CPU), Camellia reticulata (CRE), Camellia sinensis var. pubilimba (CSP), Camellia taliensis (CTA) and Camellia yunnanensis (CYU). These species were defined as the wild type (Tables 1 and 2).

Chloroplast genomic similarity
In the Chinese cultivated type, the average length across the cultivated species was 62 bp smaller than that of CSS. In the Assam cultivated type, the genome length of CSAY

Fig. 1 Gene map of the complete chloroplast genomes of cultivated tea: Camellia sinensis var. sinensis cv. Anhua (157,025 bp), Camellia sinensis var. sinensis cv. Longjing43 (157,085 bp) and Camellia sinensis var. assamica cv. Yunkang10 (157,100 bp). The inner circle corresponds to the GC content, and the next circle corresponds to the GC skew. The next three circles correspond to the genes. Genes with clockwise arrows represent reverse strands, while genes with counterclockwise arrows represent forward strands. Blue, red and aqua blocks represent protein-coding genes, introns and RNA, respectively. The third circle corresponds to the genes shared among the three cultivated teas. The fourth circle corresponds to the unique genes of Camellia sinensis var. sinensis cv. Anhua and Camellia sinensis var. sinensis cv. Longjing43. The fifth circle corresponds to the unique genes of Camellia sinensis var. assamica cv. Yunkang10
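The GC content and GC skew tracks described in the Fig. 1 caption are sliding-window statistics. The following is a minimal sketch, assuming the usual definitions (GC content = (G + C) / window length, GC skew = (G - C) / (G + C)). The function name, window size and step are hypothetical choices for illustration; the tool and settings used to draw Fig. 1 are not stated in this section.

```python
def gc_content_and_skew(seq, window=1000, step=1000):
    """Return a list of (start, gc_content, gc_skew) tuples, one per window.

    Hypothetical helper: window and step are illustrative, not the values
    used for Fig. 1.
    """
    seq = seq.upper()
    results = []
    # Slide over the sequence; a sequence shorter than one window
    # is treated as a single (short) window.
    last_start = max(len(seq) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        gc = (g + c) / len(chunk) if chunk else 0.0
        # GC skew is undefined when the window has no G or C; report 0.0.
        skew = (g - c) / (g + c) if (g + c) else 0.0
        results.append((start, gc, skew))
    return results


if __name__ == "__main__":
    # Toy sequence, not real chloroplast data.
    for start, gc, skew in gc_content_and_skew("GGGCAATTGCCC", window=4, step=4):
        print(start, round(gc, 2), round(skew, 2))
```

A positive skew means G outnumbers C on the strand being read, and a negative skew means the opposite.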

Table 1 Chloroplast genomic features of seventeen Camellia species

Feature | CSS | CSSA | CSSL | CSA | CSAY | CSP | CGR | CTA | CIM | CPU | CAZ | CPI | CRE | CCR | CCU | CPE | CYU
Genome (bp) | 157,117 | 157,025 | 157,085 | 157,028 | 157,100 | 157,086 | 157,127 | 156,974 | 156,892 | 157,076 | 157,039 | 156,585 | 156,971 | 156,997 | 156,618 | 157,121 | 156,592
CDS (bp) | 80,542 | 80,620 | 80,650 | 79,093 | 79,092 | 80,622 | 80,656 | 79,577 | 79,655 | 80,665 | 80,629 | 79,619 | 76,224 | 79,649 | 79,643 | 80,650 | 79,655
Introns (bp) | 15,192 | 15,196 | 15,198 | 17,902 | 17,902 | 15,210 | 15,205 | 16,947 | 16,897 | 15,198 | 15,195 | 16,937 | 15,182 | 16,239 | 16,917 | 15,196 | 16,935
IGS (bp) | 49,535 | 49,361 | 49,389 | 48,200 | 48,268 | 49,405 | 49,418 | 48,591 | 48,481 | 49,365 | 49,367 | 48,171 | 53,717 | 49,321 | 48,199 | 49,427 | 48,143
tRNA (bp) | 2802 | 2802 | 2802 | 2789 | 2790 | 2802 | 2802 | 2813 | 2813 | 2802 | 2802 | 2812 | 2802 | 2742 | 2813 | 2802 | 2813
rRNA (bp) | 9046 | 9046 | 9046 | 9044 | 9048 | 9047 | 9046 | 9046 | 9046 | 9046 | 9046 | 9046 | 9046 | 9046 | 9046 | 9046 | 9046
Genes | 115 | 115 | 115 | 113 | 113 | 115 | 115 | 115 | 115 | 115 | 115 | 115 | 115 | 114 | 115 | 115 | 115
CDS genes | 81 | 81 | 81 | 79 | 79 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81 | 81
tRNA genes | 30 | 30 | 30 | 30 | 30 | 30 | 30 | 30 | 30 | 30 | 30 | 30 | 30 | 29 | 30 | 30 | 30
Introns | 18 | 18 | 18 | 22 | 22 | 18 | 18 | 21 | 21 | 18 | 18 | 21 | 18 | 20 | 21 | 18 | 21
Genome GC (%) | 37.3 | 37.3 | 37.29 | 37.3 | 37.29 | 37.32 | 37.29 | 37.32 | 37.33 | 37.3 | 37.3 | 37.34 | 37.31 | 37.3 | 37.31 | 37.29 | 37.33
CDS GC (%) | 37.58 | 37.57 | 37.56 | 37.47 | 37.47 | 37.58 | 37.56 | 37.57 | 37.54 | 37.57 | 37.56 | 37.56 | 37.54 | 37.54 | 37.56 | 37.56 | 37.54
Introns GC (%) | 36.41 | 36.38 | 36.38 | 37.91 | 37.91 | 36.42 | 36.39 | 37.25 | 37.28 | 36.42 | 36.41 | 37.22 | 36.4 | 37.54 | 37.25 | 36.41 | 37.25
IGS GC (%) | 32.93 | 32.94 | 32.94 | 32.48 | 32.46 | 32.97 | 32.94 | 32.68 | 32.72 | 32.93 | 32.95 | 32.71 | 33.39 | 32.64 | 32.63 | 32.92 | 32.68
tRNA GC (%) | 52.86 | 52.86 | 52.86 | 52.99 | 52.97 | 52.86 | 52.86 | 52.86 | 52.9 | 52.89 | 52.86 | 52.92 | 52.86 | 52.88 | 52.9 | 52.86 | 52.9
rRNA GC (%) | 55.39 | 55.41 | 55.41 | 55.40 | 55.39 | 55.41 | 55.41 | 55.38 | 55.41 | 55.42 | 55.39 | 55.36 | 55.34 | 55.41 | 55.38 | 55.39 | 55.41
Gene losses | - | - | - | orf42, ycf1, ycf15 | orf42, ycf1, ycf15 | - | - | orf42, ycf1 | orf42, ycf1 | - | - | orf42, ycf1 | - | orf42, ycf1, trnG | orf42, ycf1 | - | orf42, ycf1
Intron losses | rps12 | rps12 | rps12 | - | - | rps12 | rps12 | - | - | rps12 | rps12 | - | rps12 | - | - | rps12 | -

CSS: Camellia sinensis var. sinensis, CSSA: Camellia sinensis var. sinensis cv. Anhua, CSSL: Camellia sinensis var. sinensis cv. Longjing43, CSA: Camellia sinensis var. assamica, CSAY: Camellia sinensis var. assamica cv. Yunkang10, CSP: Camellia sinensis var. pubilimba, CGR: Camellia grandibracteata, CTA: Camellia taliensis, CIM: Camellia impressinervis, CPU: Camellia pubicosta, CAZ: Camellia azalea, CPI: Camellia pitardii, CRE: Camellia reticulata, CCR: Camellia crapnelliana, CCU: Camellia cuspidata, CPE: Camellia petelotii, CYU: Camellia yunnanensis

was 72 bp larger than that of CSA. In the wild type, the average length of the wild species was 156,923 bp, which was 194 bp and 105 bp shorter than CSS and CSA, respectively. This showed that the cultivated species varied less in length than the wild species (Table 1). Similarly, the number of genes and the GC content of the cultivated species were more stable than those of the wild species. After comparing gene and intron insertions or deletions among the Chinese cultivated type, Assam cultivated type and wild type, we found that the introns of the rps12 gene were deleted in CSS and its two cultivated species. The orf42, ycf1 and ycf15 genes were deleted in CSA and CSAY. However, these events occurred randomly in the wild species. The differences in the GC content of the CDS, introns and IGS in the Chinese cultivated type and Assam cultivated type were approximately 0.01–0.03% and 0–0.02%, respectively, whereas the corresponding differences in the wild type were 0.02–1.05%.
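The length comparisons in this paragraph can be checked directly against the genome sizes in Table 1. The sketch below is only an illustration: the variable and function names are hypothetical, and the grouping of the 12 wild species follows the definition of the wild type given earlier in the Results.

```python
# Genome lengths (bp) as listed in Table 1.
GENOME_BP = {
    "CSS": 157117, "CSSA": 157025, "CSSL": 157085, "CSA": 157028,
    "CSAY": 157100, "CSP": 157086, "CGR": 157127, "CTA": 156974,
    "CIM": 156892, "CPU": 157076, "CAZ": 157039, "CPI": 156585,
    "CRE": 156971, "CCR": 156997, "CCU": 156618, "CPE": 157121,
    "CYU": 156592,
}

# The 12 species defined as the wild type in the text.
WILD = ("CAZ", "CCR", "CCU", "CGR", "CIM", "CPE",
        "CPI", "CPU", "CRE", "CSP", "CTA", "CYU")


def mean_length(species):
    """Arithmetic mean genome length (bp) for a group of species codes."""
    return sum(GENOME_BP[s] for s in species) / len(species)


chinese_cultivar_mean = mean_length(("CSSA", "CSSL"))
wild_mean = mean_length(WILD)

css_vs_cultivars = GENOME_BP["CSS"] - chinese_cultivar_mean
csay_vs_csa = GENOME_BP["CSAY"] - GENOME_BP["CSA"]
css_vs_wild = GENOME_BP["CSS"] - wild_mean
csa_vs_wild = GENOME_BP["CSA"] - wild_mean

print(f"CSS minus mean(CSSA, CSSL): {css_vs_cultivars:.0f} bp")
print(f"CSAY minus CSA: {csay_vs_csa} bp")
print(f"Mean wild genome length: {wild_mean:.0f} bp")
print(f"CSS minus wild mean: {css_vs_wild:.0f} bp")
print(f"CSA minus wild mean: {csa_vs_wild:.0f} bp")
```

With the Table 1 values, these differences come out at 62, 72, 194 and 105 bp, matching the figures quoted in the text.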

mVISTA and BLAST Ring Image Generator (BRIG) were used to compare genomic sequence identity. When CSS and CSA were compared with their cultivated types, the regions with relatively low identity were psaA_ycf3, petL_petG and ycf1_ndhF. When CSS and CSA were compared with the wild types, the regions with relatively low identity were atpH_atpI, trnE-UCC_trnT-GGU, psaA_ycf3, ycf15_trnL-CAA, ycf1_ndhF and ndhG_ndhI (Figs. 2 and 3). In conclusion, at the genomic level, the cultivated species were more conserved than the wild species.
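Identity plots of this kind report the percentage of matching positions in windows along a pairwise alignment. The sketch below illustrates that idea on two pre-aligned sequences of equal length. It is not the mVISTA or BRIG algorithm; the function name, the window size and the choice to count gap columns as mismatches are assumptions made for illustration.

```python
def windowed_identity(ref, other, window=100, step=100):
    """Percent identity per window for two aligned, equal-length sequences.

    Hypothetical helper: columns containing a gap ('-') count as mismatches.
    Returns a list of (window_start, percent_identity) tuples.
    """
    if len(ref) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    ref, other = ref.upper(), other.upper()
    profile = []
    for start in range(0, len(ref), step):
        a = ref[start:start + window]
        b = other[start:start + window]
        matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
        profile.append((start, 100.0 * matches / len(a)))
    return profile


if __name__ == "__main__":
    # Toy alignment, not real chloroplast data.
    for start, pct in windowed_identity("ACGTACGTAC-T", "ACGTACGAAC-T", window=4, step=4):
        print(start, pct)
```

Windows whose identity drops well below that of their neighbours would correspond to the low-identity intergenic regions listed above.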

The expansion and contraction of IR regions
The locations of the inverted repeat (IR) regions were extracted via a self-BLASTN search, and the characteristics of the IR/large single-copy (LSC) and IR/small single-copy (SSC) boundary regions were analyzed. The IR boundary regions of the 17 complete Camellia cp genomes were compared, showing slight differences in junction positions (Fig. 4). In order to detect

Table 2 Information regarding the complete chloroplast genomes of the research species

Species | Accession number | Subgenus¹ | Section¹ | Types | Sample location | Location | References
Camellia sinensis var. sinensis | KJ806281 | Thea | Thea | Wild | Yunnan Academy of Agricultural Science | Yunnan, China | [66]
Camellia sinensis var. sinensis cv. Anhua | MH042531 | Thea | Thea | Cultivar | Hunan City University | Hunan, China | [14]
Camellia sinensis var. sinensis cv. Longjing43 | KF562708 | Thea | Thea | Cultivar | Huajiachi campus of Zhejiang University | Zhejiang, China | [17]
Camellia sinensis var. assamica | MH394410 | Thea | Thea | Wild | Kunming Institute of Botany, Kunming | Yunnan, China | [21]
Camellia sinensis var. assamica cv. Yunkang10 | MH019307 | Thea | Thea | Cultivar | Menghai County | Yunnan, China | [67]
Camellia sinensis var. pubilimba | KJ806280 | Thea | Thea | Wild | Yunnan Academy of Agricultural Science | Yunnan, China | [66]
Camellia grandibracteata | NC024659 | Thea | Thea | Wild | Yunnan Academy of Agricultural Science | Yunnan, China | [66]
Camellia taliensis | NC022264 | Thea | Thea | Wild | Kunming Institute of Botany | Yunnan, China | [7]
Camellia impressinervis | NC022461 | Thea | Archecamellia | Wild | Kunming Institute of Botany | Yunnan, China | [7]
Camellia pubicosta | NC024662 | Thea | Corallina | Wild | International Camellia Species Garden | Zhejiang, China | [66]
Camellia azalea | NC035574 | Camellia | Camellia | Wild | Yangchun County | Guangdong, China | [19]
Camellia pitardii | NC022462 | Camellia | Camellia | Wild | Kunming Institute of Botany | Yunnan, China | [7]
Camellia reticulata | NC024663 | Camellia | Camellia | Wild | Kunming Institute of Botany | Yunnan, China | [66]
Camellia crapnelliana | NC024541 | Camellia | Heterogenea | Wild | Kunming Botanical Garden | Yunnan, China | [20]
Camellia cuspidata | NC022459 | Thea | Theopsis | Wild | Kunming Institute of Botany | Yunnan, China | [7]
Camellia petelotii | NC024661 | Thea | Archecamellia | Wild | International Camellia Species Garden | Zhejiang, China | [66]
Camellia yunnanensis | NC022463 | Camellia | Heterogenea | Wild | Kunming Institute of Botany | Yunnan, China | [7]

¹ The taxonomic classification of Camellia is based on Ming’s research [47]

possible IR border polymorphisms, we first compared the four IR boundaries of the Chinese cultivated type. No difference was found at the LSC/IRb or IRa/LSC border, and only minor differences were found at the IRb/SSC and SSC/IRa borders. Next, we compared the four IR boundaries of the Assam cultivated type, and the results were similar. Then, we compared the cp genome boundaries of the wild type. The rps19 gene at the LSC/IRb boundary expanded 52 bp from the LSC region into the IRb in CPU, while it stopped at 46 bp in the rest of the species. On the other side, at the IRa/LSC boundary, the spacer between the IRa/LSC junction and the rpl2 gene (in IRa) was 112 bp long in CPU and 106 bp long in all of the other species. Consistently, in all of the compared cp genomes, the ycf1 gene spanned the SSC/IRa junction, and the length of ycf1 in IRa ranged from 963 to 1069 bp. Remarkably, most species have a ycf1 pseudogene at the IRa/LSC junction, while this was not observed in CSA, CTA, CIM, CPI, CCR, CCU, or CYU. As in most plants, the ndhF gene, which is involved in photosynthesis, was located in the SSC region. However, in CRE the ndhF gene was located at the IRb/SSC boundary, and there was a 35 bp overlap between the ndhF gene and the ψycf1 gene.
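A quick consistency check on the boundary analysis is that the four regions of the quadripartite genome (LSC, IRb, SSC, IRa) should add up to the total genome length. The sketch below uses the region lengths printed on the IR boundary diagram for the two Chinese cultivars and the genome lengths from Table 1. The dictionary layout and function name are hypothetical, and assigning the first set of diagram labels to cv. Anhua and the second to cv. Longjing43 is an assumption that the sums support.

```python
# Region lengths (bp) read from the IR boundary diagram labels;
# genome lengths (bp) from Table 1. Species assignment is an assumption.
REGIONS = {
    "CSSA": {"LSC": 86586, "IRb": 26081, "SSC": 18277, "IRa": 26081, "genome": 157025},
    "CSSL": {"LSC": 86642, "IRb": 26080, "SSC": 18283, "IRa": 26080, "genome": 157085},
}


def quadripartite_total(region):
    """Sum of the LSC, IRb, SSC and IRa lengths for one genome."""
    return region["LSC"] + region["IRb"] + region["SSC"] + region["IRa"]


for name, region in REGIONS.items():
    total = quadripartite_total(region)
    status = "consistent" if total == region["genome"] else "mismatch"
    print(f"{name}: regions sum to {total} bp, genome is {region['genome']} bp ({status})")

# Region-by-region differences between the two cultivars (CSSL minus CSSA).
for key in ("LSC", "IRb", "SSC", "IRa"):
    print(key, REGIONS["CSSL"][key] - REGIONS["CSSA"][key])
```

Under that assumption, the 60 bp difference in genome length between the two cultivars sits almost entirely in the LSC region, with each IR differing by only 1 bp.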

Nucleotide diversity
Comparisons based on the nucleotide diversity (Pi) values of the Chinese cultivated type, Assam cultivated type, and wild type are presented, including the intergenic

Fig. 2 The sequence identity of seventeen Camellia species. The inner circle is the reference genome (Camellia sinensis var. sinensis, 157,117 bp). The next circles represent the sequence identity between C. sinensis var. sinensis and the sixteen other species. The outermost circle corresponds to the protein-coding genes and intergenic spacer regions. Genes with clockwise arrows represent reverse strands, while genes with counterclockwise arrows represent forward strands. Legend entries: C. sinensis var. sinensis, C. sinensis var. sinensis cv. Anhua, C. sinensis var. sinensis cv. Longjing43, C. sinensis var. assamica, C. sinensis var. assamica cv. Yunkang10, C. sinensis var. pubilimba, C. grandibracteata, C. taliensis, C. impressinervis, C. pubicosta, C. azalea, C. pitardii, C. reticulata, C. crapnelliana, C. petelotii, C. yunnanensis, C. cuspidata

regions (IGS), protein-coding genes and introns (Additional file 1: Table S1, Fig. 5). In our study, the average Pi values for the genes, introns and IGS in the wild type were approximately 6.6, 3.5 and 9.1 times those of the Chinese cultivated type, respectively. In addition, the Pi values for all regions in the Assam cultivated type were 0. Comparing the Chinese cultivated type with the wild type, the Pi values of most genes, introns and IGS were higher in the wild species than in the cultivated species. For example, rps12, petD, rps19, trnI-CAU_rpl23, trnI-CAU_ycf2, trnI-GAU_rrn16, clpP_intron, rps16_intron, and atpF_intron were highly variable in the wild species, but they were not variable in the three cultivated species. For the photosynthetic genes, the Pi values in the three cultivated teas were 0, except for ndhD, ndhF, ndhH and psbC. The Pi values of these genes were smaller than those of the wild species. These results indicate that these genes and noncoding regions were more conserved among the cultivated species than among the wild species.

Furthermore, although the average Pi values of the cultivated species were lower, we still found that the Pi values of rps16, rps4, trnL-UAA_intron, rps4_trnT-UGU, ndhC_trnV-UAC, cemA_petA, rpl33_rps18, psbN_psbH, rpl36_infA, rpl14_rpl16, rps7_rps12, ndhG_ndhI, trnV-GAC_rps12, and rps12_rps7 in the Chinese cultivated type were higher than those in the wild species, and these divergent sequences were mainly located in the LSC region (Fig. 5).
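Nucleotide diversity (Pi) is commonly defined as the average number of nucleotide differences per site between all pairs of sequences in an alignment. The sketch below is a minimal illustration of that definition, assuming equal-length aligned sequences and ignoring any column that contains a gap or ambiguous base. The function name is hypothetical, and the software and gap treatment used to obtain the values in Fig. 5 are not stated in this section.

```python
from itertools import combinations


def nucleotide_diversity(aligned):
    """Average pairwise differences per site (Pi) for aligned sequences.

    Hypothetical helper. Sequences must be uppercase and of equal length;
    columns with anything other than A, C, G or T are skipped.
    """
    if len(aligned) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")

    # Keep only columns where every sequence has an unambiguous base.
    sites = [i for i in range(length) if all(seq[i] in "ACGT" for seq in aligned)]
    if not sites:
        return 0.0

    total_diffs = 0
    n_pairs = 0
    for a, b in combinations(aligned, 2):
        total_diffs += sum(1 for i in sites if a[i] != b[i])
        n_pairs += 1
    return total_diffs / (n_pairs * len(sites))


if __name__ == "__main__":
    # Toy alignment, not real chloroplast data.
    print(nucleotide_diversity(["ACGTAC", "ACGTAC", "ACGAAC"]))
```

Under this definition, a region scores Pi = 0 only when every compared sequence is identical across all retained sites, which is how the zero values reported for many regions in the cultivated types should be read.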

Phylogenetic analysis of cultivated tea and wild tea
We constructed three phylogenetic trees of cultivated and wild tea: the complete cp genome tree (complete cp-Tree), the tree of all protein-coding genes shared among all species (SCDS-Tree) and the ycf1 gene tree (ycf1-Tree) (Figs. 6, 7 and 8). All phylogenetic trees supported the hypothesis that the Thea subgenus could be divided into two clades: clade I, including CSS, CSSL, CSSA, CSA, CSAY, CGR, CPU and CSP, and clade II, including CPE, CIM, CTA and CCU. Clade I was strongly supported,

Fig. 3 Alignment visualization of the seventeen Camellia chloroplast genome sequences using C. sinensis var. sinensis as a reference. The vertical scale indicates the percentage of identity, ranging from 50 to 100%. Arrows indicate the annotated genes and their transcriptional direction. The different colored boxes correspond to exons, tRNA or rRNA, and noncoding sequences (CNSs). The horizontal axis gives the position along the reference genome (0k to 156k); the legend categories are gene, exon, UTR and CNS


Fig. 4 Comparison of IR boundary regions among the 17 Camellia chloroplast genomes, using C. sinensis var. sinensis as the reference. Boxes above or below the line are forward strands and reverse strands, respectively. For each genome, the figure gives the lengths of the LSC, IRb, SSC and IRa regions and the positions of rps19, rpl2, ycf1 (or ψycf1), ndhF, trnN and trnH relative to the four junctions: JLB (LSC-IRb), JSB (IRb-SSC), JSA (SSC-IRa) and JLA (IRa-LSC)


because the posterior probabilities or bootstrap values obtained by neighbor-joining (NJ), maximum parsimony (MP), Bayesian inference (BI) and maximum likelihood (ML) were very high for each lineage. These results suggested that the seven species in clade I were closely related. All phylogenetic trees showed that CSS was the closest relative of CSSA and CSSL, and that CSA was the closest relative of CSAY. In the ycf1-Tree, however, the posterior probabilities or bootstrap values of these species were lower than those in the complete cp-Tree and the SCDS-Tree, and the value for CSSA was less than 50%. These results suggested that the ycf1 gene has diverged in cultivated tea.

In addition, we found conflict among the three trees (Figs. 6, 7 and 8). The topological structures consisting of the Camellia subgenus (CPI, CRE, CAZ, CCR, and CYU) and the Thea subgenus (CPE, CIM, CTA and CCU) were poorly supported by the complete cp-Tree, SCDS-Tree and ycf1-Tree, because most bootstrap values or posterior probabilities were less than 50% for each lineage. These results may be caused by unbalanced sampling.
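The node labels in Figs. 6, 7 and 8 are written as four slash-separated fields in the order NJ/MP/BI/ML, with "-" standing for a bootstrap value below 50%. A minimal Python sketch of reading such a label is given here; the function names and the 50% default cutoff are illustrative assumptions, not part of the original analysis.

```python
from typing import Dict, Optional

METHODS = ("NJ", "MP", "BI", "ML")  # field order used in the figure captions


def parse_support(label: str) -> Dict[str, Optional[float]]:
    """Split an 'NJ/MP/BI/ML' node label into per-method support values.

    '-' (bootstrap below 50%) becomes None. BI is a posterior probability
    (0 to 1); the other three fields are bootstrap percentages (0 to 100).
    """
    fields = label.strip().split("/")
    if len(fields) != len(METHODS):
        raise ValueError(f"expected 4 fields, got {len(fields)}: {label!r}")
    return {m: (None if f == "-" else float(f)) for m, f in zip(METHODS, fields)}


def bootstrap_methods_supporting(label: str, cutoff: float = 50.0) -> int:
    """Count how many of NJ, MP and ML report a bootstrap value >= cutoff."""
    support = parse_support(label)
    return sum(
        1 for m in ("NJ", "MP", "ML")
        if support[m] is not None and support[m] >= cutoff
    )
```

For example, `bootstrap_methods_supporting("-/-/0.6/-")` returns 0, which is the pattern described above for the poorly supported lineages.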

The cp-Tree showed some structural variations among the Camellia cp genomes (Fig. 6). The clade, which was made up of CSS, CSSL, CSSA, CSA, CSAY, CGR, CPU, CSP and CPE, was characterized by the rps12 intron deletion, the ψycf1 gene, and the ψycf15 gene (except for CSA and CSAY). The other species, except for CRE and CAZ, had lost the ψycf1 gene and the orf42 gene.

Chloroplast genome variation and evolution in cultivated tea

To explain the changes in the cp genome structure of the cultivated tea group, we detected single nucleotide polymorphisms (SNPs) and insertions/deletions (indels) in the cp genomes of cultivated tea. In the Chinese cultivated type, after comparing the whole cp genomes of three species, 67 SNPs and 46 indels were found. The LSC, IRb, SSC and IRa regions contained 43, 3, 13, and 8 SNPs and 37, 2, 5, and 2 indels, respectively (Additional file 2: Table S2). Most of the SNPs and indels were located in noncoding regions (IGS and introns): there were 39 SNPs and 41 indels in these regions, while 28 SNPs and 5 indels were found in protein-coding regions. The two ycf1 genes, which are located at the junction of SSC and IRa, contained the most SNPs and indels, 6 and 2, respectively. Among the photosynthetic genes, psbC, ndhD, ndhF and ndhH presented SNP variations, while the psbI gene presented indel variation. Among the 14 sequences with higher Pi values in cultivated species than in wild species, trnV-GAC_rps12 and ndhG_ndhI contained the most abundant SNPs, with 5 and 2, respectively (Fig. 5). In the Assam cultivated type, after comparing the whole cp genomes of two species, 4 indels were found, but no SNPs. All indels were located in the IGS region. In particular, a long sequence (77 bp) was inserted into the IRb/SSC boundary region (Additional file 3: Table S3).

Fig. 5 Comparative analysis of nucleotide variability (Pi) values among Chinese cultivated type, Assam cultivated type and wild type (panels a and b). X-axis: the names of protein-coding genes, introns or intergenic regions, Y-axis: nucleotide diversity of each window
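The regional and coding/noncoding tallies for the Chinese cultivated type should each add up to the reported totals of 67 SNPs and 46 indels. The short Python check below uses only the counts quoted in the text; the dictionary layout and function name are illustrative assumptions.

```python
# Counts quoted in the text for the Chinese cultivated type.
snps_by_region = {"LSC": 43, "IRb": 3, "SSC": 13, "IRa": 8}
indels_by_region = {"LSC": 37, "IRb": 2, "SSC": 5, "IRa": 2}
snps_by_class = {"noncoding": 39, "coding": 28}
indels_by_class = {"noncoding": 41, "coding": 5}


def noncoding_fraction(by_class: dict) -> float:
    """Fraction of variants that fall in IGS or intron sequence."""
    return by_class["noncoding"] / sum(by_class.values())


# Both ways of partitioning the variants must give the same totals.
assert sum(snps_by_region.values()) == sum(snps_by_class.values()) == 67
assert sum(indels_by_region.values()) == sum(indels_by_class.values()) == 46

print(f"noncoding SNPs:   {noncoding_fraction(snps_by_class):.1%}")
print(f"noncoding indels: {noncoding_fraction(indels_by_class):.1%}")
```

Both partitions are consistent, and the noncoding share is noticeably larger for indels than for SNPs.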

To have a clear view of the evolution of cultivated species, we used their 80 shared protein-coding genes to calculate their nonsynonymous nucleotide substitution (Ka) rates, synonymous nucleotide substitution (Ks) rates and Ka/Ks ratios. First, we compared CSS and its cultivated species. The results showed that only 16 protein-coding genes had synonymous or nonsynonymous mutations (Fig. 9, Additional file 4: Table S4). Among them, there were nonsynonymous mutations in matK, rps16, rpoC2, rpoB, accD, clpP, rps8, ycf1, ndhD, ndhH and rps15. The genes with the highest rates of nonsynonymous mutations were rps16, rps8 and rps15. There were synonymous mutations in rpoB, psbC, rps4, ycf4, rpoA and ndhF. The genes with the highest rates of synonymous mutations were rps4, ycf4 and rpoA. Of the 80 genes, 79 had a Ka/Ks value of 0, and only rpoB had a nonzero Ka/Ks value (0.3004, below 0.5), suggesting very strong purifying selective pressure. Then, we compared CSA and its cultivated species. However, no protein-coding genes had synonymous or nonsynonymous mutations, suggesting very strong purifying selective pressure (Additional file 5: Table S5).

Fig. 6 The phylogenetic tree of Camellia species based on the complete cp genomes (complete cp-Tree). Coffea canephora and Coffea arabica were selected as the outgroup. Trees were constructed by neighbor-joining (NJ), maximum parsimony (MP), Bayesian inference (BI) and maximum likelihood (ML), with bootstrap values or posterior probabilities above the branches, respectively. Bootstrap values less than 50% are represented by "-". As indicated in the legend at the top left, the unique genes and introns of each species (rps12 intron, orf42, pseudo ycf1 and pseudo ycf15) were plotted onto branches using colored squares

Fig. 7 The phylogenetic tree of Camellia species based on all protein-coding genes shared among all species (SCDS-Tree). Coffea canephora and Coffea arabica were selected as the outgroup. Trees were constructed by neighbor-joining (NJ), maximum parsimony (MP), Bayesian inference (BI) and maximum likelihood (ML), with bootstrap values or posterior probabilities above the branches, respectively. Bootstrap values less than 50% are represented by "-"

Fig. 8 The phylogenetic tree of Camellia species based on the ycf1 gene (ycf1-Tree). Coffea canephora and Coffea arabica were selected as the outgroup. Trees were constructed by neighbor-joining (NJ), maximum parsimony (MP), Bayesian inference (BI) and maximum likelihood (ML), with bootstrap values or posterior probabilities above the branches, respectively. Bootstrap values less than 50% are represented by "-"
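The Ka/Ks reasoning used here can be sketched in a few lines of Python. The function names are hypothetical; the cutoffs follow the usual convention (a ratio below 1 for purifying selection, equal to 1 for neutrality, above 1 for positive selection) together with the 0.5 threshold quoted in the text. Returning no value when Ks is 0 is an assumption of this sketch, since the ratio is undefined in that case and analysis tools differ in how they report it.

```python
from typing import Optional


def ka_ks_ratio(ka: float, ks: float) -> Optional[float]:
    """Return Ka/Ks, or None when Ks is 0 and the ratio is undefined."""
    if ka < 0 or ks < 0:
        raise ValueError("substitution rates cannot be negative")
    if ks == 0:
        return None
    return ka / ks


def selection_regime(ratio: Optional[float], strong_cutoff: float = 0.5) -> str:
    """Map a Ka/Ks ratio to a qualitative label."""
    if ratio is None:
        return "undefined (Ks = 0)"
    if ratio < strong_cutoff:
        return "strong purifying selection"
    if ratio < 1:
        return "purifying selection"
    if ratio == 1:
        return "neutral"
    return "positive selection"


# The only nonzero ratio reported for the Chinese cultivated type (rpoB).
print(selection_regime(0.3004))
```

A gene with no substitutions at all has Ka = Ks = 0, so this sketch reports it as undefined rather than as a ratio of 0.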

The site-specific selection events of the 16 genes with synonymous or nonsynonymous mutations were analyzed by Bayes Empirical Bayes (BEB), and we found that some amino acid sites of ycf1 and rps15 exhibited site-specific selection (Additional file 6: Table S6). In ycf1, there were six sites under positive selection, and in rps15, there was one site under positive selection. For example, in the rps15 gene, the codon ACC (threonine) of CSS was mutated to AAC (asparagine) in the two cultivated species.
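The rps15 example can be checked mechanically: a codon change is nonsynonymous when the two codons translate to different amino acids. The Python sketch below builds the standard codon table (plastid genes use the same codon-to-amino-acid assignments) and classifies a single codon substitution. The function name and return format are illustrative assumptions.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for the 64 codons in TCAG order (standard genetic code).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref: str, alt: str):
    """Return (ref_aa, alt_aa, kind) for a substitution between two codons."""
    ref = ref.upper().replace("U", "T")
    alt = alt.upper().replace("U", "T")
    ref_aa, alt_aa = CODON_TABLE[ref], CODON_TABLE[alt]
    if ref == alt:
        kind = "identical"
    elif ref_aa == alt_aa:
        kind = "synonymous"
    else:
        kind = "nonsynonymous"
    return ref_aa, alt_aa, kind


# rps15: ACC in CSS, AAC in the two cultivars.
print(classify_codon_change("ACC", "AAC"))
```

The call returns threonine (T) and asparagine (N) with the label "nonsynonymous", matching the description in the text.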

Discussion

Understanding the genetic variation between cultivated and wild species is crucial for introducing interesting traits from wild species into cultivars [26]. Organelle genome sequencing has proven to be an effective way to resolve phylogenetic relationships among closely related species [27, 28]. Here, we constructed and compared the complete cpDNA genome sequences of three cultivars and fourteen wild species of Camellia. At the genomic level, cultivated species were more conserved than wild species, in terms of both architecture and linear sequence order (genome length, gene number, gene arrangement, and GC content) (Table 2, Figs. 2 and 3). For other land plant species, such as peanuts, cherries and radishes, the cp genome size and structure, as well as the gene content and order, are highly conserved among the cultivated and wild species [29–31].

We found that the IR regions of cultivated tea had expanded or contracted. The IR length of CSSA and CSSL was approximately 20 bp smaller than that of CSS, accounting for 32% of the difference in the complete genome length. The IR length of CSAY was approximately 30 bp larger than that of CSA, accounting for 42% of the difference in the complete genome length (Fig. 4). In fact, the contraction and expansion of IRs is considered to be one of the important reasons for cp genome length variation [32]. Further SNP and indel analysis showed that ycf1 and trnV-GAC_rps12 changed in the Chinese cultivated type, while trnN-GUU_ndhF and rrn5_trnR-ACG changed in the Assam cultivated type. In CSS and CSSL, a 9 bp sequence (TCC TTC TTC/GAA GAA GGA) was inserted into the ycf1 gene (Additional file 2: Table S2). This suggests that ycf1 is one of the important reasons for the expansion or contraction of the IRs of the Chinese cultivated type. Similar results were reported by Zheng [33], who analyzed the cp genome length variation in 272 species and found that atpA, accD and ycf1 accounted for 13% of the difference in length. Therefore, ycf1, which is associated with plant survival, may play a key role in the cp genome size variations of cultivated tea. In CSAY, a 77 bp sequence was inserted into the trnN-GUU_ndhF region (IRb/SSC boundary region) (Additional file 3: Table S3). This is the main reason for the expansion or contraction of the IRs of the Assam cultivated type.

Fig. 9 Nonsynonymous nucleotide substitution (Ka, panel a) and synonymous nucleotide substitution (Ks, panel b) of homologous protein-coding genes from C. sinensis var. sinensis, C. sinensis var. sinensis cv. Longjing43 and C. sinensis var. sinensis cv. Anhua. Genes are arranged along the x-axis by region (LSC, IR, SSC), and the three series are the pairwise comparisons cv. Anhua vs var. sinensis, cv. Longjing43 vs var. sinensis, and cv. Anhua vs cv. Longjing43
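The 42% figure quoted for the Assam cultivated type can be reproduced from region lengths, because a quadripartite cp genome has total length LSC + 2 × IR + SSC. The Python sketch below is illustrative only: the region lengths are the values read from Fig. 4 of the article for CSA and CSAY, and the class and function names are assumptions of this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpGenome:
    name: str
    lsc: int  # large single-copy region, bp
    ir: int   # one inverted-repeat copy, bp (IRa and IRb are equal here)
    ssc: int  # small single-copy region, bp

    @property
    def total(self) -> int:
        return self.lsc + 2 * self.ir + self.ssc


def ir_share_of_length_difference(a: CpGenome, b: CpGenome) -> float:
    """Fraction of the genome-length difference explained by the two IR copies."""
    total_diff = abs(a.total - b.total)
    if total_diff == 0:
        raise ValueError("genomes have identical length")
    return 2 * abs(a.ir - b.ir) / total_diff


# Region lengths as read from Fig. 4 (assumed transcription of the figure).
csa = CpGenome("CSA", lsc=86632, ir=26068, ssc=18260)
csay = CpGenome("CSAY", lsc=86649, ir=26083, ssc=18285)

print(csa.total, csay.total)
print(f"{ir_share_of_length_difference(csa, csay):.0%}")
```

With these inputs the two IR copies differ by 30 bp in total and the genomes by 72 bp, giving 42%.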

In addition to the variations in genome size, there were also some nucleotide mutations in the cultivated species. In this study, the nucleotide diversity of cultivated tea was lower than that of wild tea (Fig. 5), but the unbalanced sampling between the 14 wild and 3 cultivated tea genomes may contribute to the difference in nucleotide diversity of the cpDNA fragments. A nucleotide diversity comparison of 358 cultivated and 54 wild rice presented similar results [34]. Nevertheless, we found that the nucleotide diversity of 14 sequences in the Chinese cultivated tea was higher than that of wild tea (rps16, rps4, trnL-UAA_intron, rps4_trnT-UGU, ndhC_trnV-UAC, cemA_petA, rpl33_rps18, psbN_psbH, rpl36_infA, rpl14_rpl16, rps7_rps12, ndhG_ndhI, trnV-GAC_rps12, and rps12_rps7) (Fig. 5). These sequences reflect the variation in the cp genomes of cultivated tea, and they are potential molecular markers for distinguishing Camellia species and for the phylogenetic analysis of Camellia.

Previous studies have shown that human interference affects the genetic structure, leaf nutrients and pollen morphology of Camellia. Yan et al. used genome-wide SNPs to analyze the genetic relationships of five semi-wild teas that had long lacked human management. They found that human interference affects the genetic structure of tea: after the human interference stopped, the teas from five different geographical regions could be divided into three different groups because of the absence of free pollination [22]. Xiong et al. compared the nutrient content in the leaves of cultivated and wild C. nitidissima and found that cultivated C. nitidissima had significantly higher contents of essential amino acids (26.05%) and total amino acids (33.27%) than wild C. nitidissima [23]. Shu et al. showed that there are obvious differences in pollen morphology and exine morphology between cultivated and wild species of Camellia [24]. Therefore, to explore specific evolutionary characteristics between cultivated tea and its wild relatives, we subsequently performed evolutionary research on cultivated tea.

First, to have a clear view of the cp genomic adaptive evolution of cultivated tea, we performed evolutionary analysis on the protein-coding sequences. The Ka/Ks ratio is very useful for measuring selective pressure at the protein level [35]. In the Chinese cultivated type, the Ka/Ks value of 79 genes was 0, and only rpoB had a value of 0.3004. In addition, some amino acids of ycf1 and rps15 exhibited site-specific selection (Additional file 4: Table S4 and Additional file 6: Table S6). rpoB is crucial for genetic information transmission, and it affects the transcription of DNA into RNA and the translation of RNA into protein. It was also found to be under selective pressure in beverage crops [13]. The rps15 gene functions in the chloroplast ribosomal subunits [35]. ycf1, which encodes a component of the chloroplast inner envelope membrane protein translocon, is one of the largest plastid genes [13], and it is also essential for almost all plant lineages [36]. These positively selected genes may have played key roles in the adaptation of cultivated tea to various environments.

Generally, the deletion or insertion of amino acids in the encoded protein will affect the structure and function of the gene [37–39]. In the Chinese cultivated type, 16 protein-coding genes had nucleotide substitutions, among which the ycf1 gene had the largest number of nucleotide substitutions. At the same time, in ycf1, five amino acid sites exhibited site-specific selection, and a 9 bp sequence insertion was found in CSSA (Additional file 4: Table S4 and Additional file 6: Table S6, Fig. 9).

ycf1 has an open reading frame of unknown function, but some studies have inferred that ycf1 is very important for plant survival [33, 40]. In tobacco, a chimeric gene conferring resistance to aminoglycoside antibiotics was transferred into ycf1 in the cp genome, and the plantlets were then cultured in plant regeneration medium containing the antibiotic spectinomycin. Under these conditions, a fairly constant ratio of wild-type to transformed genome copies was maintained. However, after culturing in antibiotic-free medium, the wild-type genome was still present in all samples, whereas the transplastomic fragments were missing from several samples. This experiment showed that ycf1 encodes products that are essential for cell survival. ycf1 is also an important molecular marker in plants [41, 42], because it has higher variability than other known cp molecular markers (such as the widely used rbcL and matK genes), for both the total number of parsimony informative characters and the percent variability.

Phylogenetic analysis of cultivated and wild tea showed that CSSA and CSSL were closely related to CSS, and CSAY was closely related to CSA (Figs. 6 and 7), which supports the previous finding that most cultivated tea originated directly from CSS and CSA [43]. However, in the ycf1-Tree, the posterior probabilities or bootstrap values of the cultivated tea branch were lower than those of the complete cp-Tree and the SCDS-Tree, which suggested that the ycf1 gene has diverged in cultivated tea (Figs. 6, 7 and 8). Similar results have been found in Corylus [44]: the ycf1 genes of Corylus chinensis and Corylus avellana have a similar evolutionary history, which is different from that of Corylus heterophylla. This evolution of cultivated plants may be related to the utilization efficiency of photosynthesis. Photosystem biogenesis regulator 1 (PBR1), an RNA-binding protein encoded by the nuclear genome, can improve the translation efficiency of ycf1 in the Arabidopsis thaliana cp genome. In this way, the coordinated biogenesis and stable maintenance of three photosynthetic complexes are regulated [45]. However, at present, the effects of single amino acid mutations and of short sequence insertions or deletions on the function of ycf1 are still not clear, and cultivated tea may provide important materials for this kind of research.

In the phylogenetic trees, CSS, CSA, CGR and CPU formed a monophyletic clade with 100% bootstrap values. CSS, CSA and CGR were classified into sect. Thea, but CPU was classified into sect. Corallina (Table 2). This indicates that CPU and the sect. Thea plants have a close genetic relationship, which also supports the result of Huang's research [18]. However, CTA, which belongs to sect. Thea, was located in another clade together with two species of sect. Archecamellia and one species of sect. Theopsis, which indicates that the phylogenetic direction of CTA is different from that of the other sect. Thea species. CTA is often considered to be a wild relative of cultivated tea [43]. Both are monoecious, insect-pollinated and outcrossing species. However, there are differences in their morphological characters. For example, CTA has 5-locule ovaries and large sepals and petals, whereas CSS has 3-locule ovaries and small sepals and petals [46, 47]. Based on the evidence of the chloroplast genome, we hypothesized that CTA and CSS have different genetic polymorphisms. In this study, CIM and CPE were not clustered into the same branch. The taxonomy of CIM is controversial: CIM and CPE were classified into sect. Archecamellia by Ming et al. [47], while Chang et al. [46] classified CIM into sect. Chrysantha. Therefore, we infer that it is not acceptable to combine sect. Archecamellia and sect. Chrysantha. In the subgenus Camellia, CPI and CRE formed a clade, as did CAZ and CCR, with bootstrap values of 97–100%. Among them, CPI, CRE and CAZ are all sect. Camellia plants, while CCR is classified into sect. Heterogenea [47] or sect. Furfuracea [46]. However, both morphological and molecular characteristics indicate that CCR is closely related to some plants in sect. Camellia [48].

Conclusion

In this work, the complete cp genomes of three cultivated species and 14 wild species of Camellia were studied, and genomic variation and evolutionary processes were compared among these species. Genomic variation analyses showed that the cultivated species were more conserved than the wild species in terms of architecture and linear sequence order. In the Assam cultivated type, the variation in the chloroplast genome was mainly manifested by sequence insertions in IGS regions. In the Chinese cultivated type, the variation in the chloroplast genome was mainly manifested by nucleotide polymorphisms and sequence insertions in some sequences. These nucleotide polymorphisms also led to the mutation of amino acid sites in some genes, among which ycf1 was the gene with the most mutation sites. In addition to amino acid mutations, there was a 9 bp insertion in the ycf1 gene. ycf1 is believed to be a critical gene for plant survival, and it may influence photosynthesis and be related to plant adaptation. Analyses of evolutionary processes showed that CSA and its cultivated species were tightly clustered, while CSS and its cultivated species were not tightly clustered. The evolutionary relationship of CSS with CSSL was closer than that with CSSA in the ycf1-Tree. However, at present, the effects of single amino acid mutations and of short sequence insertions or deletions on the function of ycf1 are still not clear, and cultivated tea may provide important materials for this kind of research.

Methods

Genomic materials collection of cultivated tea

The complete cp genome of CSSA was presented and annotated in our previous study [14] under GenBank accession number MH042531. We also searched the National Center for Biotechnology Information (NCBI) database for published complete cp genomes of cultivated tea; only those of CSSL and CSAY, with accession numbers KF562708 and MH019307, had been published [17]. A gene map of the three cultivated teas was generated using BRIG [49].

Comparative analysis between cultivated tea and wild tea

The Basic Local Alignment Search Tool (BLAST) was used to find cp genomes in NCBI that are closely related to that of CSSA. After the Camellia cp genomes were screened, 17 genomes with sampling information remained, including 3 cultivated species (CSSA, CSSL and CSAY) and 14 wild species (Table 2). Previous studies have shown that both CSSA and CSSL originated directly from CSS, while CSAY originated directly from CSA [43, 49]. Therefore, we used CSS and CSA as the reference sequences to study genomic variation and the direction of evolution between cultivated tea and wild tea.

Three methods were used for comparative genomic analysis: (I) Cp genome sequence identity was compared following the method of Li [50], using mVISTA in Shuffle-LAGAN mode and BRIG. (II) The expansion and contraction of the IR regions were compared. First, we annotated and extracted the IR boundaries of the Camellia cp genomes with Plastid Genome Annotator (PGA) [51]. Then, the IR boundary regions were visualized using Visio Professional 2016. (III) The Pi values of the Chinese cultivated type, Assam cultivated type and wild type were compared according to the method of Njuguna [52]. First, we used annotation information to extract the intergenic regions, protein-coding genes and intron regions of the 17 Camellia species in TBtools v0.6666 [53]. After comparing these sequences, 211 loci shared among the Camellia species were found, including 80 protein-coding genes, 117 intergenic regions and 14 intron regions. Each locus was divided into three datasets: (I) sequences of the Chinese cultivated type, (II) sequences of the Assam cultivated type, and (III) sequences of the wild type. The sequences in each dataset were aligned using Clustal with default settings in MEGA7.0 [54]. The Pi of these regions was calculated using DnaSP v6.10.04 [55] to show divergence at the sequence level.
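To make the Pi statistic concrete, the following is a minimal Python sketch of nucleotide diversity as the mean number of pairwise differences per site in one aligned locus. It is an illustration only, not the DnaSP implementation: the function name `nucleotide_diversity` is hypothetical, and dropping every column that contains a gap or ambiguous base is an assumption of this sketch that may not match the DnaSP settings used in the study.

```python
from itertools import combinations


def nucleotide_diversity(aligned_seqs):
    """Return Pi: mean pairwise differences per site for one aligned locus."""
    if len(aligned_seqs) < 2:
        raise ValueError("need at least two aligned sequences")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")
    seqs = [s.upper() for s in aligned_seqs]
    # Assumption of this sketch: keep only columns where every sequence
    # carries an unambiguous base (gapped columns are dropped entirely).
    columns = [col for col in zip(*seqs) if all(b in "ACGT" for b in col)]
    if not columns:
        return 0.0
    pairs = list(combinations(range(len(seqs)), 2))
    # Count mismatches over every sequence pair and every retained column.
    diffs = sum(col[i] != col[j] for col in columns for i, j in pairs)
    return diffs / (len(pairs) * len(columns))


# Toy alignment (hypothetical data, not taken from the Camellia genomes).
toy = ["ACGTACGT", "ACGTACGA", "ACG-ACGA"]
pi = nucleotide_diversity(toy)
```

In the toy alignment one column is dropped because of the gap, and the last column contributes two mismatching pairs out of three, so Pi is 2 differences over 3 pairs and 7 sites. Applied per locus and per dataset, the same calculation would yield one Pi value for each of the three tea types.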

Phylogenetic analysis of Camellia

Three datasets were used to construct phylogenetic trees of Camellia: (I) the complete cp genomes, (II) all protein-coding genes shared among all species (SCDS), and (III) the ycf1 gene sequences. First, all datasets were aligned using MAFFT v7.380 [56] under the default FFT-NS-2 setting, and the alignments were used for phylogenetic analysis. Then, following the methods described by Xie et al. [57] and Zhang et al. [58], we constructed phylogenetic trees with four methods: NJ, MP, BI and ML. Coffea canephora and Coffea arabica were selected as the outgroup.

The NJ analysis was performed in MEGA7.0 [54] under the default settings with 1000 bootstrap replicates. The MP analysis was performed in PAUP 4.0a167 [59] using heuristic searches with 1000 bootstrap replicates. The BI analysis was performed in MrBayes 3.2.7 [60] under the best substitution models and parameters, which were computed by jModelTest 2.1.7 [61]. Four chains were run simultaneously for 10,000,000 generations or until the average standard deviation of the split frequencies fell below 0.01. The ML analysis was carried out in IQ-TREE [62] using the default settings, with 1000 bootstrap replicates for tree evaluation; the best substitution models were computed by IQ-TREE. All the best substitution models are listed in Additional file 7: Table S7.

Evolutionary analysis of cultivated tea

After alignment of the cultivated and wild species, the number and positions of SNPs and indels in the genomes were identified in DnaSP v6.10.04 according to Wu’s method [63].
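As a rough illustration of what counting SNPs and indels in an alignment involves, the sketch below scans a pairwise alignment of a reference and a cultivar sequence. It is not the DnaSP procedure or Wu’s method; the function name `count_snps_and_indels`, the treatment of each contiguous gap run as one indel event, and the toy sequences are all assumptions made for this example.

```python
def count_snps_and_indels(ref, query):
    """Return SNP positions and indel events from a pairwise alignment.

    Positions are 1-based alignment columns. Each contiguous run of gaps
    in one row is reported as a single (start, length, kind) event.
    """
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to equal length")
    snps = []
    indels = []
    run_start, run_kind = None, None
    for pos, (r, q) in enumerate(zip(ref.upper(), query.upper()), start=1):
        if r == "-" and q == "-":
            kind = None  # gap in both rows: uninformative, ends any open run
        elif r == "-":
            kind = "insertion"  # bases present in the query only
        elif q == "-":
            kind = "deletion"  # bases present in the reference only
        else:
            kind = None
            if r != q:
                snps.append(pos)
        if kind != run_kind:
            # Close the previous gap run, if any, and start tracking the new state.
            if run_kind is not None:
                indels.append((run_start, pos - run_start, run_kind))
            run_start, run_kind = pos, kind
    if run_kind is not None:
        indels.append((run_start, len(ref) + 1 - run_start, run_kind))
    return snps, indels


# Toy alignment (hypothetical data): one SNP, one 3 bp insertion, one 1 bp deletion.
snps, indels = count_snps_and_indels("ACGT---ACGTA", "ACTTGGGAC-TA")
```

Treating a gap run as one event keeps a multi-base insertion, such as the 9 bp insertion reported in ycf1, from being counted as nine separate indels.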

The Ka and Ks rates, as well as the Ka/Ks ratio, of the homologous protein-coding genes were used to evaluate the adaptive evolution of the cultivated species. After each gene was aligned using the ClustalW (Codons) program in MEGA7.0, its Ks, Ka and Ka/Ks values were determined according to Dong’s method [64] with the PAML package [65]. To identify site-specific selection, four models, M1 (neutral), M2 (selection), M7 (beta) and M8 (beta & ω), were used in codeml from the PAML package. The BEB method was used to calculate the posterior probabilities for site classes, and only sites with posterior probabilities > 0.9 were selected.
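The text does not state how the site models were compared. A common practice with codeml is a likelihood ratio test (LRT) of M1 against M2 and of M7 against M8, each with 2 degrees of freedom, followed by filtering BEB sites at the chosen posterior threshold. The sketch below assumes that practice; the function names, log-likelihoods and posterior values are hypothetical and are not results from this study.

```python
import math


def lrt_two_df(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models that differ by 2 parameters.

    Returns (statistic, p_value). For a chi-square distribution with 2
    degrees of freedom the survival function is exactly exp(-x / 2).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


def sites_above_threshold(beb_posteriors, threshold=0.9):
    """Return codon sites whose BEB posterior probability exceeds the threshold."""
    return [site for site, prob in sorted(beb_posteriors.items()) if prob > threshold]


# Hypothetical log-likelihoods for M7 (null) and M8 (alternative).
stat, p_value = lrt_two_df(lnl_null=-1000.0, lnl_alt=-995.0)

# Hypothetical BEB posteriors keyed by codon position.
selected = sites_above_threshold({12: 0.95, 40: 0.90, 7: 0.99})
```

With these made-up inputs the statistic is 10.0 and only codons 7 and 12 pass the filter, because the comparison is strict and a posterior of exactly 0.90 is excluded, matching the "> 0.9" criterion in the text.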

Abbreviations

BEB: Bayesian Empirical Bayes; BI: Bayesian inference; BRIG: BLAST Ring Image Generator; CAZ: Camellia azalea; CCR: Camellia crapnelliana; CCU: Camellia cuspidata; CDS: Protein-coding regions; CGR: Camellia grandibracteata; CIM: Camellia impressinervis; cp: Chloroplast; CPE: Camellia petelotii; CPI: Camellia pitardii; CPU: Camellia pubicosta; CRE: Camellia reticulata; CSA: Camellia sinensis var. assamica; CSSA: Camellia sinensis var. sinensis cv. Anhua; CSSL: Camellia sinensis var. sinensis cv. Longjing43; CSP: Camellia sinensis var. pubilimba; CSS: Camellia sinensis var. sinensis; CTA: Camellia taliensis; CYU: Camellia yunnanensis; IGS: Intergenic regions; Indel: Insertion/deletion; IR: Inverted repeat; Ka: Nonsynonymous nucleotide substitution; Ks: Synonymous nucleotide substitution; LSC: Large single copy region; ML: Maximum likelihood; MP: Maximum parsimony; NCBI: National Center for Biotechnology Information; NJ: Neighbor-joining; PBR1: Photosystem biogenesis regulator 1; PGA: Plastid Genome Annotator; Pi: Nucleotide diversity; SNP: Single nucleotide polymorphism; SSC: Small single copy region.

Supplementary Information

The online version contains supplementary material available at https://doi.org/10.1186/s12862-021-01800-1.

Additional file 1: Table S1. Comparative analysis of nucleotide variability (Pi) values among the Chinese cultivated type, Assam cultivated type and wild type

Additional file 2: Table S2. Single nucleotide polymorphism (SNP) and insertion/deletion (indel) information from comparisons among C. sinensis var. sinensis, C. sinensis var. sinensis cv. Longjing43 and C. sinensis var. sinensis cv. Anhua

Additional file 3: Table S3. Single nucleotide polymorphism (SNP) and insertion/deletion (indel) information from comparisons between C. sinensis var. assamica and C. sinensis var. assamica cv. Yunkang10

Additional file 4: Table S4. Nonsynonymous nucleotide substitution (Ka) and synonymous nucleotide substitution (Ks) rates, as well as the Ka/Ks ratio of homologous protein-coding genes from C. sinensis var. sinensis, C. sinensis var. sinensis cv. Longjing43 and C. sinensis var. sinensis cv. Anhua

Additional file 5: Table S5. Nonsynonymous nucleotide substitution (Ka) and synonymous nucleotide substitution (Ks) rates, as well as the Ka/Ks ratio of homologous protein-coding genes from C. sinensis var. assamica and C. sinensis var. assamica cv. Yunkang10

Additional file 6: Table S6. Positive selection sites identified among 16 genes with synonymous or nonsynonymous mutations

Additional file 7: Table S7. The best substitution models in the phylogenetic analysis of Camellia

Acknowledgements

We sincerely thank Dr. Huang Hui of the Kunming Institute of Botany for providing sample collection information for the Camellia species. We also thank Chen Yi for help during the analysis.

Authors’ contributions

JP and ZXG conceived the study. All authors collected field samples. MD, SLQ, ZHY and XZF analyzed the final data. YZL and ZXG acquired funds for this study. JP wrote the original manuscript, and all authors have read and approved the manuscript.

Funding

This study was supported by the Key Projects of National Forestry and Grassland Bureau (201801), Forestry Science and Technology Project of Hunan Province (XLK201920), Natural Science Foundation of Hunan Province (2019JJ50027), Postgraduate Scientific Research Innovation Project of Hunan Province (CX20200711) and Scientific Innovation Fund for Post-graduates of Central South University of Forestry and Technology (CX20201010). The funding bodies played no role in the design of the study, in the collection, analysis and interpretation of data, or in writing the manuscript.

Availability of data and materials

Raw sequence data of CSSA were submitted to the National Center for Biotechnology Information (NCBI) database under accession number MH042531. Other genomic data mentioned in the article can be accessed from NCBI; the accession numbers are provided in Table 2.

Declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Author details

1 Hunan Research Center of Engineering Technology for Utilization of Environmental and Resources Plant, Central South University of Forestry and Technology, Changsha 410004, Hunan, People’s Republic of China.
2 Hunan Provincial Key Lab of Dark Tea and Jin-Hua, Hunan City University, Yiyang 413000, Hunan, People’s Republic of China.
3 Key Laboratory of National Forestry and Grassland Administration on Management of Western Forest Bio-Disaster, College of Forestry, Northwest A & F University, Yangling 712100, Shaanxi, People’s Republic of China.

Received: 7 February 2020 Accepted: 22 April 2021

References

1. Zohary D. Domestication of crop plants. In: Levin SA, editor. Encyclopedia of biodiversity. New York: Elsevier; 2001. p. 217–27.

2. Denison RF, Kiers ET, West SA. Darwinian agriculture: when can humans find solutions beyond the reach of natural selection? Q Rev Biol. 2003;78(2):145–68.

3. Matesanz S, Gianoli E, Valladares F. Global change and the evolution of phenotypic plasticity in plants. Ann N Y Acad Sci. 2010;1206:35–55.

4. Kuniyuki Saitoh KN. Toshiro Kuroda: characteristics of flowering and pod set in wild and cultivated types of soybean. Plant Prod Sci. 2004;7(2):172–7.

5. Aharoni A, Giri AP, Verstappen FWA, Bertea CM, Sevenier R, Sun Z, Jongsma MA, Schwab W, Bouwmeester HJ. Gain and loss of fruit flavor compounds produced by wild and cultivated strawberry species. Plant Cell. 2004;16(11):3110–31.

6. Vijayan K, Zhang W-J, Tsou C-H. Molecular taxonomy of Camellia (Theaceae) inferred from nrITS sequences. Am J Bot. 2009;96(7):1348–60.

7. Yang J-B, Yang S-X, Li H-T, Yang J, Li D-Z. Comparative chloroplast genomes of Camellia species. PLoS ONE. 2013;8(8):e73053.

8. Gao JY, Clifford RP, Du YQ. Collected species of the genus Camellia—an illustrated outline. Zhejiang: Zhejiang Science and Technology Press; 2005.

9. Moore MJ, Soltis PS, Bell CD, Burleigh JG, Soltis DE. Phylogenetic analysis of 83 plastid genes further resolves the early diversification of eudicots. Proc Natl Acad Sci USA. 2010;107(10):4623–8.

10. Wachira F, Tanaka J, Takeda Y. Genetic variation and differentiation in tea (Camellia sinensis) germplasm revealed by RAPD and AFLP variation. J Hortic Sci Biotechnol. 2001;76(5):557–63.

11. Jianfei Z. Flora of China, vol. 12. Beijing: Flora of China Editorial Committee; 2007.

12. Ming TL. A systematic synopsis of the genus Camellia. Acta Bot Yunnanica. 1999;21(2):3–5.

13. Wei C, Yang H, Wang S, Zhao J, Liu C, Gao L, Xia E, Lu Y, Tai Y, She G, et al. Draft genome sequence of Camellia sinensis var. sinensis provides insights into the evolution of the tea genome and tea quality. Proc Natl Acad Sci USA. 2018;115(18):E4151–8.

14. Dong M, Liu S, Xu Z, Hu Z, Ku W, Wu L. The complete chloroplast genome of an economic plant, Camellia sinensis cultivar Anhua, China. Mitochondrial DNA B. 2018;3(2):558–9.

15. Chen SH, Deng YS, Gong ZH, Zhu HY. Analysis of the evolution of technological innovation model of Anhua Dark Tea Industry. J Agric. 2015;5(6):96–101.

16. Daniell H, Lin CS, Yu M, Chang WJ. Chloroplast genomes: diversity, evolution, and applications in genetic engineering. Genome Biol. 2016;17:134.

17. Ye XQ, Zhao ZH, Zhu QW, Wang YY, Lin ZX, Ye CY, Fan LJ, Xu HR. Entire chloroplast genome sequence of tea (Camellia sinensis cv. Longjing 43): a molecular phylogenetic analysis. J Zhejiang Univ. 2014;40(4):404–12.

18. Huang H, Shi C, Liu Y, Mao S-Y, Gao L-Z. Thirteen Camellia chloroplast genome sequences determined by high-throughput sequencing: genome structure and phylogenetic relationships. BMC Evol Biol. 2014;14:151.

19. Xu X, Zheng W, Wen J. The complete chloroplast genome of the long blooming and critically endangered Camellia azalea. Conserv Genet Resour. 2018;10(1):5–7.

20. Yang JB, Li DZ, Li HT. Highly effective sequencing whole chloroplast genomes of angiosperms by nine novel universal primer pairs. Mol Ecol Resour. 2014;14(5):1024–31.

21. Zeng CX, Hollingsworth PM, Yang J, He ZS, Zhang ZR, Li DZ, Yang JB. Genome skimming herbarium specimens for DNA barcoding and phylogenomics. Plant Methods. 2018;14:43.

22. Yan G, Da-he Q, Chun Y, Yan L, Zheng-wu C, Juan C. Genetic diversity of old tea plant resources in Jiuan City of Guizhou Province, using genome-wide SNP. J Plant Genet Resour. 2019. https://doi.org/10.1186/s12870-019-1917-5.

23. Xiong Z, Qi X, Wei X, Chen Z, Tang H, Chai S. Nutrient composition in leaves of cultivated and wild Camellia nitidissima. Pak J Bot. 2012;44(2):635–8.

24. Shu JL, Chen L, Wang HS, Wang PS, Xu M, Song WX. Pollen morphology, ultrastructure and evolution of tea plant and other genus Camellia plants. J Tea Sci. 1998;18(1):6–15.

25. Zhao DW, Yang JB, Yang SX, Kato K, Luo JP. Genetic diversity and domestication origin of tea plant Camellia taliensis (Theaceae) as revealed by microsatellite markers. BMC Plant Biol. 2014;14:14.

26. Amar MH, Magdy M, Wang L, Zhou H, Zheng B, Jiang X, Atta AH, Han Y. Peach chloroplast genome variation architecture and phylogenomic signatures of cpDNA introgression in Prunus. Can J Plant Sci. 2019;99(6):885–96.

27. Ivanova Z, Sablok G, Daskalova E, Zahmanova G, Apostolova E, Yahubyan G, Baev V. Chloroplast genome analysis of resurrection tertiary relict Haberlea rhodopensis highlights genes important for desiccation stress response. Front Plant Sci. 2017;8:204.

28. Ma PF, Zhang YX, Zeng CX, Guo ZH, Li DZ. Chloroplast phylogenomic analyses resolve deep-level relationships of an intractable bamboo tribe Arundinarieae (Poaceae). Syst Biol. 2014;63(6):933–50.

29. Cho M-S, Yoon HS, Kim S-C. Complete chloroplast genome of cultivated flowering cherry, Prunus xyedoensis “Somei-yoshino” in comparison with wild Prunus yedoensis Matsum. (Rosaceae). Mol Breed. 2018;38(9):112.

30. Wang J, Li Y, Li CJ, Yan CX, Zhao XB, Yuan CL, Sun QX, Shi CR, Shan SH. Twelve complete chloroplast genomes of wild peanuts: great genetic resources and a better understanding of Arachis phylogeny. BMC Plant Biol. 2019;19(1):504.

31. Yamagishi H, Terachi T, Ozaki A, Ishibashi A. Inter- and intraspecific sequence variations of the chloroplast genome in wild and cultivated Raphanus. Plant Breed. 2009;128(2):172–7.

32. Palmer JD, Nugent JM, Herbon LA. Unusual structure of geranium chloroplast DNA: a triple-sized inverted repeat, extensive gene duplications, multiple inversions, and two repeat families. Proc Natl Acad Sci USA. 1987;84(3):769–73.

33. Zheng XM, Wang JR, Feng L, Liu S, Pang HB, Qi L, Li J, Sun Y, Qiao WH, Zhang LF, Chen YL, Yang QW. Inferring the evolutionary mechanism of the chloroplast genome size by comparing whole-chloroplast genome sequences in seed plants. Sci Rep. 2017;7(1):1555.

34. Cheng L, Nam J, Chu SH, Rungnapa P, Min MH, Cao Y, Yoo JM, Kang JS, Kim KW, Park YJ. Signatures of differential selection in chloroplast genome between japonica and indica. Rice. 2019;12(1):65.

35. Gao CM, Deng YF, Wang J. The complete chloroplast genomes of Echinacanthus species (Acanthaceae): phylogenetic relationships, adaptive evolution, and screening of molecular markers. Front Plant Sci. 2019;9:1989.

36. Dong WL, Wang RN, Zhang NY, Fan WB, Fang MF, Li ZH. Molecular evolution of chloroplast genomes of Orchid species: insights into phylogenetic relationship and adaptive evolution. Int J Mol Sci. 2018;19(3):716.

37. Cai Z, Guisinger M, Kim HG, Ruck E, Blazier JC, McMurtry V, Kuehl JV, Boore J, Jansen RK. Extensive reorganization of the plastid genome of Trifolium subterraneum (Fabaceae) is associated with numerous repeated sequences and novel DNA insertions. J Mol Evol. 2008;67(6):696–704.

38. Dugas DV, Hernandez D, Koenen EJM, Schwarz E, Straub S, Hughes CE, Jansen RK, Nageswara-Rao M, Staats M, Trujillo JT, et al. Mimosoid legume plastome evolution: IR expansion, tandem repeat expansions, and accelerated rate of evolution in clpP. Sci Rep. 2015;5:16958.

39. Wu Y, Liu F, Yang DG, Li W, Zhou XJ, Pei XY, Liu YG, He KL, Zhang WS, Ren ZY, Zhou KH, Ma XF, Li ZH. Comparative chloroplast genomics of Gossypium species: insights into repeat sequence variations and phylogeny. Front Plant Sci. 2018;9:367.

40. Drescher A, Ruf S, Calsa T Jr, Carrer H, Bock R. The two largest chloroplast genome-encoded open reading frames of higher plants are essential genes. Plant J. 2000;22(2):97–104.

41. Dong WP, Xu C, Li CH, Sun JH, Zuo YJ, Shi S, Cheng T, Guo JJ, Zhou SL. ycf1, the most promising plastid DNA barcode of land plants. Sci Rep. 2015;5:8348.

42. Neubig KM, Whitten WM, Carlsward BS, Blanco MA, Endara L, Williams NH, Moore M. Phylogenetic utility of ycf1 in orchids: a plastid gene more variable than matK. Plant Syst Evol. 2009;277(1–2):75–84.

43. Liu Y, Yang SX, Ji PZ, Gao LZ. Phylogeography of Camellia taliensis (Theaceae) inferred from chloroplast and nuclear DNA: insights into evolutionary history and conservation. BMC Evol Biol. 2012;12:92.

44. Wei YL, Wen ZF, Liu F, Zhang JW, Huang WG, Lan YP, Cheng LL, Cao QC, Hu GL. Bioinformatics analysis of ycf1 gene in Corylus. J Shanxi Agric Sci. 2018;46(08):1244–7, 1333.

45. Yang XF, Wang YT, Chen ST, Li JK, Shen HT, Guo FQ. PBR1 selectively controls biogenesis of photosynthetic complexes by modulating translation of the large chloroplast gene Ycf1 in Arabidopsis. Cell Discov. 2016;2:16003.

46. Zhang HD. Thea—a section of beveragial tea trees of the genus Camellia. Acta Sci Nat Univ Sunyatseni. 1981;1:87–99.

47. Ming TL. A revision of Camellia sect. Thea. Acta Bot Yunnanica. 1992;14(2):115–32.

48. Jiang ZD. Preliminary study of molecular phylogenetics and biogeography of the genus Camellia L. based on chloroplast DNA. 2017; Master’s Thesis, Zhejiang SCI-TECH University, Hangzhou.

49. Alikhan N-F, Petty NK, Ben Zakour NL, Beatson SA. BLAST Ring Image Generator (BRIG): simple prokaryote genome comparisons. BMC Genomics. 2011. https://doi.org/10.1186/1471-2164-12-402.

50. Li C, Zhao Y, Xu Z, Yang G, Peng J, Peng XY. Initial characterization of the chloroplast genome of Vicia sepium, an important wild resource plant, and related inferences about its evolution. Front Plant Sci. 2020;11:73.

51. Qu XJ, Moore MJ, Li DZ, Yi TS. PGA: a software package for rapid, accurate, and flexible batch annotation of plastomes. Plant Methods. 2019;15:12.

52. Njuguna AW, Li Z-Z, Saina JK, Munywoki JM, Gichira AW, Gituru RW, Wang Q-F, Chen J-M. Comparative analyses of the complete chloroplast genomes of Nymphoides and Menyanthes species (Menyanthaceae). Aquat Bot. 2019;156:73–81.

53. Chen C, Chen H, Zhang Y, Thomas HR, Frank MH, He Y, Xia R. TBtools: an integrative toolkit developed for interactive analyses of big biological data. Mol Plant. 2020;13(8):1194–202.

54. Kumar S, Stecher G, Tamura K. MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol Biol Evol. 2016;33(7):1870–4.

55. Rozas J, Ferrer-Mata A, Sánchez-DelBarrio JC, Guirao-Rico S, Librado P, Ramos-Onsins SE, Sánchez-Gracia A. DnaSP 6: DNA sequence polymorphism analysis of large data sets. Mol Biol Evol. 2017;34(12):3299–302.

56. Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30(4):772–80.

57. Xie DF, Yu Y, Deng YQ, Li J, Liu HY, Zhou SD, He XJ. Comparative analysis of the chloroplast genomes of the Chinese endemic genus Urophysa and their contribution to chloroplast phylogeny and adaptive evolution. Int J Mol Sci. 2018;19(7):1847.

58. Zhang Y-b, Yuan Y, Pang Y-x, Yu F-l, Yuan C, Wang D, Hu X. Phylogenetic reconstruction and divergence time estimation of Blumea DC. (Asteraceae: Inuleae) in China based on nrDNA ITS and cpDNA trnL-F sequences. Plants. 2019;8(7):210.

59. Swofford D. PAUP*; Version 4. Sunderland, MA: Sinauer Associates; 2003.

60. Huelsenbeck JP, Ronquist F. MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics. 2001;17(8):754–5.

61. Darriba D, Taboada GL, Doallo R, Posada D. jModelTest 2: more models, new heuristics and parallel computing. Nat Methods. 2012;9(8):772.

62. Nguyen LT, Schmidt HA, von Haeseler A, Minh BQ. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol. 2014;32(1):268–74.

63. Wu Z, Gu C, Tembrock LR, Zhang D, Ge S. Characterization of the whole chloroplast genome of Chikusichloa mutica and its comparison with other rice tribe (Oryzeae) species. PLoS ONE. 2017;12(5):e0177553.

64. Dong M, Zhou XM, Ku WZ, Xu ZG. Detecting useful genetic markers and reconstructing the phylogeny of an important medicinal resource plant, Artemisia selengensis, based on chloroplast genomics. PLoS ONE. 2019;14(2):e0211340.

65. Yang Z, Nielsen R. Synonymous and nonsynonymous rate variation in nuclear genes of mammals. J Mol Evol. 1998;46(4):409–18.

66. Huang H, Shi C, Liu Y, Mao SY, Gao LZ. Thirteen Camellia chloroplast genome sequences determined by high-throughput sequencing: genome structure and phylogenetic relationships. BMC Evol Biol. 2014;14:151.

67. Zhang F, Li W, Gao CW, Zhang D, Gao LZ. Deciphering tea tree chloroplast and mitochondrial genomes of Camellia sinensis var. assamica. Sci Data. 2019;6:209.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

