RESEARCH Open Access

Retrospective genomic analysis of sorghum adaptation to temperate-zone grain production

Carrie S Thurber1, Justin M Ma1, Race H Higgins1,2 and Patrick J Brown1,2*

Thurber et al. Genome Biology 2013, 14:R68
http://genomebiology.com/2013/14/6/R68 (26 June 2013)

Abstract

Background: Sorghum is a tropical C4 cereal that recently adapted to temperate latitudes and mechanized grain harvest through selection for dwarfism and photoperiod-insensitivity. Quantitative trait loci for these traits have been introgressed from a dwarf temperate donor into hundreds of diverse sorghum landraces to yield the Sorghum Conversion lines. Here, we report the first comprehensive genomic analysis of the molecular changes underlying this adaptation.

Results: We apply genotyping-by-sequencing to 1,160 Sorghum Conversion lines and their exotic progenitors, and map donor introgressions in each Sorghum Conversion line. Many Sorghum Conversion lines carry unexpected haplotypes not found in either presumed parent. Genome-wide mapping of introgression frequencies reveals three genomic regions necessary for temperate adaptation across all Sorghum Conversion lines, containing the Dw1, Dw2, and Dw3 loci on chromosomes 9, 6, and 7 respectively. Association mapping of plant height and flowering time in Sorghum Conversion lines detects significant associations in the Dw1 but not the Dw2 or Dw3 regions. Subpopulation-specific introgression mapping suggests that chromosome 6 contains at least four loci required for temperate adaptation in different sorghum genetic backgrounds. The Dw1 region fractionates into separate quantitative trait loci for plant height and flowering time.

Conclusions: Generating Sorghum Conversion lines has been accompanied by substantial unintended gene flow. Sorghum adaptation to temperate-zone grain production involves a small number of genomic regions, each containing multiple linked loci for plant height and flowering time. Further characterization of these loci will accelerate the adaptation of sorghum and related grasses to new production systems for food and fuel.

Keywords: Genotyping-by-sequencing, introgression, photoperiod, flowering time, dwarfism

* Correspondence: [email protected]
1Energy Biosciences Institute, University of Illinois, Urbana, IL, USA
Full list of author information is available at the end of the article

© 2013 Thurber et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

Cereals have been selected by humans for thousands of years, first during their domestication from wild grasses and subsequently for increased yield, uniformity, and adaptation to new environments and management practices [1-3]. Specific molecular pathways have recently proven useful for cereal adaptation to modern, high-input agriculture. For example, the Green Revolution exploited allelic variation in the gibberellin pathway in wheat and rice to produce semi-dwarf cultivars with increased harvest index and improved resistance to lodging [4-7]. Similar phenotypic changes occurred during the creation of dwarf grain sorghum suitable for mechanized harvest at temperate latitudes. Understanding the genetic control of these changes is critical for the efficient transfer of useful alleles, both between tropical and temperate growing regions and between breeding programs for different end uses.

Sorghum is the fifth most important cereal crop worldwide [8] and is widely grown in temperate regions, but was domesticated in the African tropics [9]. Temperate adaptation for grain production in sorghum requires photoperiod-insensitivity, for early maturity, and dwarfism, both of which involve at least four major loci [10]. Of the major maturity loci (Ma1-Ma6), Ma1 has been identified as PRR37 [11] and Ma3 as Phytochrome B [12]. Of the major dwarfing loci (Dw1-Dw4), Dw3 has been identified as PGP1/PGP19, an auxin transporter orthologous to maize brachytic2 [13]. Dw2 and Dw1 are uncloned, with the former closely-linked to Ma1 [14] and the latter mapping to chromosome 9 [15,16].

The oligogenic control of these important agronomic traits in sorghum was exploited through a backcross breeding scheme known as the Sorghum Conversion Program (SCP) [17]. Mutations for photoperiod-insensitivity and dwarfism had previously arisen spontaneously in temperate regions of Africa, Asia, and the southern US, and were already being used for grain sorghum production. However, the genetic base of US grain sorghum remained very narrow. During the SCP, genomic regions conferring early maturity and dwarfing were introgressed from an elite donor into approximately 800 exotic sorghum accessions representing the breadth of genetic diversity in sorghum. The resulting SC lines are closely related to their Exotic Progenitor (EP) lines, but differ dramatically in plant height and flowering time due to the presence of donor introgressions (Figure 1A). The elite donor, BTx406, carries recessive alleles for photoperiod-insensitivity and dwarfism at Ma1 and Dw1-Dw3, respectively [17], so these loci are expected to show a high frequency of donor introgression in SC lines. Klein et al. [14] previously mapped introgressions on chromosome 6 in a subset of SC lines and showed that several of them contain vast introgressed tracts around the linked Ma1-Dw2 loci. However, the genetic architecture of temperate adaptation in the SC lines (the number and linkage of loci as well as their frequencies in different subpopulations) has not been systematically studied on a genome-wide basis. This information can be used both to identify the underlying targets of the SCP and to help guide more efficient, marker-directed conversion of exotic sorghums to temperate-adapted varieties.

In this study, we use genotyping-by-sequencing (GBS) [18,19] to generate genome-wide single nucleotide polymorphism (SNP) data for 580 pairs of EP and SC lines, for a total of 1,160 sorghum inbreds. We then employ a novel introgression mapping approach to identify loci required for temperate adaptation, and validate our results using both phenotype-genotype association and population differentiation (Fst) analyses.

Results and discussion

Genotyping-by-sequencing of SC lines and their exotic progenitors

To map elite donor introgressions in SC lines, we genotyped 580 pairs of SC and their corresponding EP lines (Additional File 1) at 54,034 SNPs using GBS. Briefly, we constructed reduced-representation DNA libraries using pairs of restriction enzymes [18], sequenced them in 96-plexes on the Illumina HiSeq, and processed the data using the TASSEL GBS pipeline [20]. We found that combining two separate double digests nearly doubled the number of SNPs called per sample (Additional File 2). The full dataset contained 0.3% heterozygous genotypes. Partial imputation using the TASSEL GBS pipeline reduced the proportion of missing genotypes from 66% to 23%.

Three different seed sources of the elite donor line, BTx406, were used to construct 28 different genomic libraries. Three of these libraries originating from a single seed source of BTx406 showed low concordance and were removed from subsequent analyses (Additional File 3). This low concordance was likely due to laboratory error as it was confined to libraries prepared on a single day. The remaining 25 libraries from the elite donor contained clear, homozygous majority calls for 53,037 SNPs. The elimination of approximately 7,000 SNPs in complete linkage disequilibrium with another SNP less than 64 base-pairs (bp) away resulted in a dataset of 46,137 SNPs for calling introgressions.

Each trio of homozygous genotypes for a given SNP across a SC line, its corresponding EP line, and the elite donor has four possible outcomes (Figure 1B), most common of which is a lack of polymorphism. Of the three remaining polymorphic combinations, shared genotypes between a SC line and its EP line provide evidence that introgression has not occurred, whereas shared genotypes between a SC line and the donor provide evidence that introgression has occurred. The fourth possibility is unexpected: a SC line has a genotype not found in either of its parents. Unexpected genotypes could result from laboratory error (mix-up or cross-contamination of seed or DNA samples in our laboratory), historical error during the SCP (pollen contamination or error in pedigree records), or uncharacterized heterozygosity and/or genetic drift during the maintenance of the EP, SC, or donor lines.

We used the proportion of unexpected genotypes as a quality-control filter to prune both markers and individuals. First, we discarded 75 markers with >20% unexpected genotypes, of which 55 were on chromosome 6 and 44 were found between 30 Mb and 43 Mb on chromosome 6, a region that includes Ma1 and likely includes Dw2 [14]. A possible explanation for the high proportion of unexpected genotypes in this region is that certain sources of the elite donor BTx406 used during the SCP differed from our BTx406 consensus genotype in this region. In support of this hypothesis, we note that the seed source of BTx406 derived from Lubbock, TX, very close to where the SCP was carried out, is heterozygous for many of the markers on chromosome 6 that were discarded due to having >20% unexpected genotypes. Second, we discarded 190 SC-EP pairs with >10% unexpected genotypes. The distribution of unexpected genotypes in some SC lines is clustered (for example, SC1104; Additional File 4), suggesting that genomic segments from a temperate donor other than BTx406 were introgressed. In other SC lines the unexpected genotypes are scattered, suggesting that genetic drift may have occurred between the EP line that was used as a recurrent parent and the EP line that was genotyped. For the 16 SC-EP pairs that have >33% unexpected genotypes, a clerical error of some kind - during transcription of pedigree records, seed packet labels, or DNA plates - is most likely. For the remainder of our analysis, we retained a set of 390 SC-EP pairs with <10% unexpected genotypes (Figure 1C), genotyped at 46,062 markers (Additional File 5).
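The four-outcome interpretation of each SC/EP/donor genotype trio can be illustrated with a small sketch. This is not the authors' code: the function names `classify_trio` and `unexpected_fraction`, the string labels, and the use of `None` for a missing call are all assumptions made for illustration. The denominator for the unexpected-genotype proportion is taken here to be the informative (polymorphic) markers, following the wording in the Conclusions.

```python
from collections import Counter

def classify_trio(sc, ep, donor):
    """Classify one SNP from homozygous SC, EP, and donor calls (hypothetical helper).

    Any call may be None when the genotype is missing.
    """
    if sc is None or ep is None or donor is None:
        return "missing"
    if ep == donor:
        # Parents agree: the SNP is uninformative if SC matches them,
        # and unexpected if SC carries an allele found in neither parent.
        return "nonpolymorphic" if sc == ep else "unexpected"
    if sc == ep:
        return "no_introgression"   # SC shares its genotype with the EP line
    if sc == donor:
        return "introgression"      # SC shares its genotype with the donor
    return "unexpected"             # third allele, found in neither parent

def unexpected_fraction(trios):
    """Proportion of unexpected genotypes among informative trios (assumed denominator)."""
    counts = Counter(classify_trio(*trio) for trio in trios)
    informative = (counts["no_introgression"] + counts["introgression"]
                   + counts["unexpected"])
    return counts["unexpected"] / informative if informative else 0.0

# Toy example: (SC, EP, donor) calls at five SNPs
example_trios = [("A", "A", "A"), ("A", "A", "G"), ("G", "A", "G"),
                 ("G", "A", "A"), (None, "A", "G")]
```

Under these assumptions, a pair would be dropped when `unexpected_fraction` exceeds 0.10, and a marker when the same proportion computed across pairs exceeds 0.20.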

Inferring elite donor introgressions in SC lines

Introgression maps were generated for each SC line (Figure 1E; Additional File 4). The long-range linkage disequilibrium in the SC lines was exploited to map unanchored contigs in the sorghum genome (Additional File 6). After setting non-polymorphic and unexpected genotypes as missing, missing data were inferred using flanking markers (Figure 1D). Introgression frequency was then calculated for each marker as the proportion of the 390 SC lines carrying a BTx406 introgression. The theoretical expectation of introgression frequency after four backcrosses in the absence of selection is roughly 3%. The standard deviation of this value in individual SC lines, in a species with 10 chromosomes and a map length of roughly 16 Morgans, is also roughly 3% [21], so that the introgression frequency in a sample of 390 SC lines is expected to range from 2% to 4% in the absence of selection. Because our dataset contains a substantial proportion of missing data, introgressions that are very small and very rare may be missed entirely. However, we find that every chromosome contains regions with introgression frequencies >4%, indicating linkage to a target of selection during the SCP.
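The "roughly 3%" neutral expectation follows from simple halving: the F1 is 50% donor, and each backcross to the exotic parent halves the expected donor share, while selfing does not change it in the absence of selection. A minimal sketch of that arithmetic is given below; the function names are hypothetical, and the 4% cutoff is simply the upper end of the neutral range quoted in the text.

```python
def expected_donor_fraction(backcrosses):
    """Expected donor genome share after the initial cross plus `backcrosses`
    backcrosses to the recurrent (exotic) parent, with no selection."""
    # F1 = 1/2 donor; every backcross halves the expectation again.
    return 0.5 ** (backcrosses + 1)

def exceeds_neutral_range(frequency, upper=0.04):
    """Flag a marker whose introgression frequency is above the neutral range
    quoted in the text (2% to 4% for 390 lines)."""
    return frequency > upper

# Four backcrosses, as in the Sorghum Conversion Program
neutral_expectation = expected_donor_fraction(4)  # 0.03125
```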

Figure 1 Molecular analysis of the SC Program. (A) Backcrossing scheme used to create SC lines from EP lines and an elite donor. Four generations of backcrossing were completed, with selection during each F2 generation for short, photoperiod-insensitive plants. (B) Interpretation of molecular data from donor, SC, and EP lines. SC alleles shared with either the donor or EP lines indicate that introgression has occurred (orange) or not occurred (blue), respectively. SC alleles not found in either parent are unexpected (purple) and were treated as missing data. (C) Genome content of 580 SC lines. Each vertical bar represents a single SC line. Bars are ordered by the percentage of unexpected genotypes. The solid black vertical line indicates a cutoff of 10% unexpected genotypes. (D) Missing and unexpected introgression scores (question marks) were assigned values based on the mean of each flanking marker weighted by its physical distance. (E) A representative example of the introgression maps created for each SC line. The 10 sorghum chromosomes are shown from left to right. The 11th column displays unanchored contigs in the sorghum genome. Long-range linkage disequilibrium in SC lines was exploited to place these contigs on the sorghum physical map.


Three genomic regions are associated with temperate adaptation in sorghum

Three regions of the sorghum genome show pronounced peaks in introgression frequency in the SC lines (Figure 2; top panel), suggesting that these regions are nearly indispensable for adaptation to temperate grain production. We then used two methods to validate the introgression mapping results. First, we assessed functional variation for plant height and flowering time in SC lines by performing association mapping for these traits in the complete set of 580 genotyped SC lines (Figure 2; middle panel). EP lines were not included because most do not flower at temperate latitudes. Significant phenotypic associations were found in the Dw1 but not the Dw2 or Dw3 genomic regions. Second, to ensure that the introgression mapping results were not unduly affected by unexpected genotypes, we calculated Fst between the complete sets of 580 genotyped SC lines and 580 EP lines and found that regions of high Fst mirror the regions of high introgression frequency almost exactly (Figure 2; bottom panel). Unlike introgression frequency, Fst makes no assumptions about the pedigrees of the SC lines.

Figure 2 Genome-wide analysis of temperate adaptation in sorghum. The x axis in each panel represents physical distance along the ten sorghum chromosomes. The top panel shows introgression frequency in a set of 390 SC-EP pairs with <10% unexpected genotypes (see Methods for calculation). The middle panel shows phenotypic associations with plant height and flowering time in the full set of 580 genotyped SC lines. The bottom panel shows population differentiation (Fst) between the full sets of 580 SC lines and 580 EP lines.
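The text does not say which Fst estimator was used, so the following is only a sketch of one common per-SNP definition, Fst = (H_T - H_S) / H_T, for two equally weighted populations such as the SC and EP sets. The function names, the 0/1 allele coding for inbred lines, and the choice of estimator are all assumptions made for illustration.

```python
def allele_frequency(genotypes):
    """Frequency of the allele coded 1 among non-missing homozygous calls (0 or 1)."""
    called = [g for g in genotypes if g is not None]
    return sum(called) / len(called)

def fst_two_populations(p1, p2):
    """Per-SNP Fst = (H_T - H_S) / H_T for two equally weighted populations.

    H_S is the mean expected heterozygosity within populations and H_T the
    expected heterozygosity of the pooled population (assumed estimator).
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # monomorphic in both populations: no differentiation
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total

# Toy SNP: rare in EP lines, common in SC lines
ep_calls = [0, 0, 0, 0, 1]
sc_calls = [1, 1, 1, 1, 0, None]
toy_fst = fst_two_populations(allele_frequency(ep_calls), allele_frequency(sc_calls))
```

Because this statistic uses only allele frequencies in the two sets of lines, it needs no pedigree information, which is the property the text relies on.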

The cloned Dw3 locus on chromosome 7 is tagged using three different methods

Chromosome 7, which contains the known, cloned target Dw3 at 58.6 Mb, has a peak introgression frequency at 58.7 Mb, a peak Fst at 58.6 Mb, and a peak plant height association at 58.2 Mb that is not quite significant at P <0.05 following a Bonferroni correction (Figure 3). Since the causal mutation in Dw3 is a copy number variant (CNV) that is unstable and may have arisen quite recently [13], our dataset may not contain linked SNPs in high linkage disequilibrium with the causal CNV. Several regions on either side of the Dw3 locus show local peaks in both introgression frequency and Fst, and co-localize with weak signals of flowering time association.

The Dw1 region on chromosome 9 fractionates into linked QTL

Chromosome 9, which contains the uncloned Dw1 locus, has a peak introgression frequency at 57.6 Mb, a peak Fst at 57.4 Mb, and a peak plant height association at 57.5 Mb, in close agreement with previous results (Figure 4) [15,16]. A separate cluster of SNPs in the Dw1 region associates with flowering time, with a peak at 59.6 Mb. The most significant SNPs for plant height and flowering time are not in significant linkage disequilibrium with each other (r2 = 0.15) and align with two distinct peaks in both introgression frequency and Fst, strongly suggesting that the Dw1 region contains separate loci for plant height and flowering time.
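For inbred lines, the linkage disequilibrium statistic r2 between two SNPs can be computed as the squared Pearson correlation of their 0/1 genotype codes. The sketch below illustrates that calculation; the function name `r_squared`, the 0/1 coding, and the handling of missing calls (`None`) are assumptions for illustration rather than details taken from the paper.

```python
def r_squared(snp_a, snp_b):
    """Squared correlation between two SNPs coded 0/1 across the same lines.

    Lines with a missing call (None) at either SNP are skipped.
    """
    pairs = [(a, b) for a, b in zip(snp_a, snp_b)
             if a is not None and b is not None]
    if not pairs:
        raise ValueError("no lines with calls at both SNPs")
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs) / n
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs) / n
    var_b = sum((b - mean_b) ** 2 for _, b in pairs) / n
    if var_a == 0 or var_b == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return cov * cov / (var_a * var_b)
```

A value near 1 means the two SNPs tag the same haplotypes; a low value such as the 0.15 reported for the height and flowering-time peaks indicates that the two signals are largely independent.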

Chromosome 6 retains little functional variation in SC lines

Chromosome 6 displays a high introgression frequency and high Fst across most of its length, even though the known targets on this chromosome are tightly linked: Ma1 at 40.3 Mb, and the uncloned Dw2 locus several Mb away (Figure 5). The peak introgression frequency and peak Fst on chromosome 6 apparently correspond to Dw2 and not Ma1 (Additional File 7), possibly because several independent recessive ma1 alleles already exist in the EP lines (R. Klein, personal communication). The choppiness of the introgression frequency between 30 Mb and 43 Mb correlates with a very high proportion of unexpected genotypes in this region, which could result from the existence of an additional, uncharacterized ma1-dw2 haplotype in the elite donor. There are no significant phenotypic associations on chromosome 6, suggesting that elite donor introgressions have removed most functional variation for plant height and flowering time on this chromosome in SC lines. Consistent with previous studies reporting a limited number of chromosome 6 haplotypes in SC lines [14,15], we observe the maintenance of high introgression frequency across most of the chromosome, which could be attributed to either a large number of targeted loci or to limited recombination between a few targets. Targets could result from direct selection for plant height and flowering time and/or indirect selection for vigor and adaptation to climatic and soil variation. Regardless of the biological explanation, decreased variation on chromosome 6 is a concern for temperate sorghum breeding. Of the 35 major-effect genes mapped in sorghum as of 2010 [22], seven map to chromosome 6 and four (d, gc, P, Rs1) have been associated with resistance to biotic stresses including ergot, grain mold, and shoot fly [23-25]. Exotic alleles at these and other unidentified linked loci are at low frequency in SC lines, yet may be useful in future breeding efforts.

Identification of subpopulation-specific introgression targets

Sorghum is a crop with strong population sub-division and apparently multiple domestication events [26]. Therefore, we calculated introgression frequencies separately in three subpopulations corresponding to the caudatum (C; n = 137), durra (D; n = 131), and guinea/kafir (GK; n = 122) racial groups. Subpopulations were defined based on genetic criteria in the EP lines (see Methods), which closely match traditional morphological classification (Figure 6). Similar results were obtained when subpopulations were defined based on genetic criteria in the SC lines with or without the three major introgression regions included (Additional File 8). The significance of introgression frequency differences between subpopulations was assessed using permutation (see Methods). We identified multiple subpopulation-specific introgression targets on every chromosome (Additional File 9). Most dramatically, a target at approximately 1 Mb on chromosome 6 is specific to the GK group. In addition to the linked Ma1-Dw2 loci and this GK-specific locus, the presence of at least one additional locus on chromosome 6 is necessary to explain the maintenance of high introgression frequency across the chromosome in SC lines of caudatum and durra origin. Introgression frequencies in regions linked to both Dw1 and Dw3 also vary significantly by subpopulation. Although differences in recombination between subpopulations could theoretically account for such differences, several of these regions also contain phenotypic associations with plant height and flowering time in SC lines, suggesting that they result from subpopulation-specific targets of the SCP. Similarly, a phenotypic association with flowering time at 41.9 Mb on chromosome 5 overlaps with a GK-specific introgression peak (Figure 2, Additional Files 9 and 10). Additional subpopulation-specific targets in regions unlinked to Dw1, Dw2, and Dw3 that do not overlap with significant phenotypic associations could contain loci for other agronomic traits selected for during the conversion process, including disease resistance, lack of seed dormancy, and overall vigor under temperate conditions.

Figure 3 Introgression frequency, phenotypic associations, and population differentiation in the Dw3 region on sorghum chromosome 7. Panels are the same as in Figure 2. The location of Dw3 at 58.6 Mb is shown with a vertical dashed gray line.

Figure 4 Introgression frequency, phenotypic associations, and population differentiation in the Dw1 region on sorghum chromosome 9. Panels are the same as in Figures 2 and 3. The locations of putative QTL for plant height and maturity are shown with vertical dashed gray lines.

Conclusions

The molecular analysis of parents and progeny provides the opportunity for pedigree verification. Our results show that almost one-third of SC lines contain a substantial proportion of unexpected genotypes (>10% of informative markers). We used three complementary approaches - introgression mapping, association mapping, and population differentiation (Fst) - to characterize the genetic architecture of adaptation to temperate-zone grain production in sorghum. Our novel introgression mapping strategy exploited recombination and selection previously imposed by plant breeders to map three major genomic regions, one of which no longer harbors functional variation in temperate-adapted SC lines. Association mapping confirmed that the Dw1 region contains separate QTL for plant height and flowering time. Significant differences in introgression frequency between subpopulations strongly suggest the existence of additional uncharacterized loci that affect plant height and flowering time in sorghum.

Linkage disequilibrium between at least four targeted loci on chromosome 6 has led to the introgression of a single elite haplotype across most of this chromosome in the majority of lines examined. Chromosome 6 contains roughly 10% of sorghum genes, for which very little functional diversity has been exploited for temperate sorghum breeding. This lack of diversity undoubtedly limits adaptive potential, especially for complex traits including resistance to abiotic and/or biotic stress. Increasing gene flow and recombination between tropical and temperate sorghum varieties and haplotypes will help unlock the genetic potential of this stress-tolerant crop to meet our rising demand for food, feed, and fuel in an era of increasing climatic volatility.

Figure 5 Introgression frequency, phenotypic associations, and population differentiation on sorghum chromosome 6. Panels are the same as in Figures 2 to 4. The location of Ma1 at 40.3 Mb is shown with a vertical dashed gray line.

Methods

Plant materials, DNA extraction, and quantification

Seed for SC lines was obtained from the USDA-ARS Cropping Systems Research Laboratory (Lubbock, TX, USA) and seed for EP lines was obtained from the National Plant Germplasm System (NPGS [27]). Information on the geographic origins and morphological racial classification of each SC line was obtained from Texas A&M University (Additional File 1). Three independent seed sources of the elite donor BTx406 were obtained from the NPGS (PI 656020), the USDA-Cropping Systems Research Laboratory, and Texas A&M University. Genomic DNA was extracted from etiolated seedlings approximately 3 days after germination using a modified CTAB protocol [28] and quantified using PicoGreen (Invitrogen, NY, USA).

SNP library creation

Libraries were prepared using a protocol modified from Poland et al. 2012 [18]. Genomic DNA (approximately 250 ng) was double digested with either PstI-HF and BfaI or PstI-HF and HinP1I at 37°C for 2 h with heat inactivation at 80°C for 20 min. Digested DNA was ligated to two separate adapters using T4 ligase with 1 mM ATP. The first adapter contains the Illumina forward sequencing primer, one of 96 unique barcodes, and the PstI overhang. The second adapter contains the Illumina reverse sequencing primer and the overhang for either BfaI or HinP1I. The full list of adapters is shown in Additional File 11. Ligation reactions were held at 25°C for 2 h followed by heat inactivation at 65°C for 20 min. Pooled DNA from 96 barcoded libraries was cleaned using a 2:1 ratio of AmpureXP Beads (Beckman Coulter, CA, USA) to DNA solution using a Magnetic Particle Concentrator (Invitrogen, NY, USA) with two washes in 95% ethanol and resuspension in elution buffer (EB; 10 mM Tris). Cleaned DNA pools were amplified using Illumina primers in a 2X PhusionHF Master Mix (New England Biolabs, MA, USA) with cycler conditions as follows: 98°C 30 s, 15 cycles (98°C 10 s, 68°C 30 s, 72°C 30 s), 72°C 5 min. Samples were run on agarose gels to confirm the presence of a genomic smear and cleaned a second time with AMPure beads. Amplified DNA sizes and relative concentrations were assessed using an Agilent Bioanalyzer 2100 and Agilent DNA1000 Kit (Agilent Technologies Inc., CA, USA) and PicoGreen. The two separately digested samples were combined in equimolar concentrations and diluted to 10 nM in library buffer (EB + 0.05% Tween-20) and submitted to the W.M. Keck Center at the University of Illinois for single-end sequencing on the Illumina HiSeq2000. The Keck Center performed an additional qPCR assay on each library to adjust concentrations before sequencing.

Figure 6 Sorghum racial identity and subpopulation structure. PCA plot of the 580 exotic progenitor (EP) lines genotyped in this study. Each dot represents an EP line, colored according to its morphologically-defined race. Larger circles and smaller triangles represent EP lines with more and fewer than 10% unexpected genotypes, respectively. The grey dashed lines indicate the criteria used to assign EP lines to genetic groups for subpopulation-specific introgression mapping.

Genotype data analysis

SNPs were called from Illumina fastq files using the TASSEL GBS pipeline [20]. Only 64 bp tags present at least 10 times in the dataset were considered. Alignment was performed using BWA [29] with the default settings. Inbred lines and SNPs with >95% missing data were discarded. SNPs were not filtered by minor allele frequency, as rare SNPs are especially useful for inferring introgression events between pairs of lines (Figure 1B). Heterozygous genotypes accounted for 0.3% of the total dataset. Partial imputation using the TASSEL GBS pipeline reduced the proportion of missing data from approximately 66% to approximately 20%. For the association and Fst analyses, the remaining missing data were imputed using BEAGLE. This yielded substantially fewer unexpected genotypes than direct imputation using BEAGLE without prior partial imputation (data not shown).
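The missing-data filter described above can be sketched on a small genotype matrix. This is an illustration only and is not part of the TASSEL pipeline: the function names are hypothetical, missing calls are represented as `None`, and filtering lines before SNPs is an assumed order that the text does not specify.

```python
def missing_rate(calls):
    """Fraction of missing (None) calls in a list of genotype calls."""
    return sum(call is None for call in calls) / len(calls)

def filter_by_missingness(rows, max_missing=0.95):
    """Drop lines (rows), then SNPs (columns), whose missing rate exceeds max_missing.

    Returns the filtered matrix plus the indices of the kept rows and columns.
    """
    kept_rows = [i for i, row in enumerate(rows)
                 if missing_rate(row) <= max_missing]
    n_snps = len(rows[0]) if rows else 0
    kept_cols = [j for j in range(n_snps)
                 if kept_rows
                 and missing_rate([rows[i][j] for i in kept_rows]) <= max_missing]
    filtered = [[rows[i][j] for j in kept_cols] for i in kept_rows]
    return filtered, kept_rows, kept_cols

# Toy matrix: 3 lines x 3 SNPs; a stricter cutoff is used so the effect is visible
toy_rows = [["A", None, "G"],
            [None, None, None],
            ["A", None, "T"]]
toy_filtered, toy_kept_rows, toy_kept_cols = filter_by_missingness(toy_rows, max_missing=0.5)
```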

Mapping unanchored contigs in the sorghum genome

We defined a set of 213 SNPs from 31 unanchored contigs that had at least 20 introgression calls and an introgression frequency of at least 10%, and calculated linkage disequilibrium (r2) between introgression scores in the 213 unanchored SNPs and our complete set of 46,062 SNPs with introgression scores in the 390 SC-EP pairs that were placed on the sorghum physical map (V1.0 [30]). Most (181) of the unanchored SNPs mapped uniquely to a single chromosome, with a mean of 8.4 mapped SNPs tied for the highest r2 across a mean physical distance of 9.1 Mb (Additional File 6).
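The placement of an unanchored SNP by linkage disequilibrium can be sketched as a search for the anchored markers whose introgression scores correlate best with it. The names `r_squared` and `best_anchors`, the `(chromosome, position, scores)` tuple layout, and the tolerance used to call ties are all assumptions for illustration; the actual implementation is not described in the text.

```python
import math

def r_squared(x, y):
    """Squared Pearson correlation of two score vectors, skipping missing (None) pairs."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if not pairs:
        return 0.0
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs) / n
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs) / n
    var_y = sum((b - mean_y) ** 2 for _, b in pairs) / n
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov * cov / (var_x * var_y)

def best_anchors(unanchored_scores, anchored):
    """Return (highest r2, [(chrom, pos), ...]) over anchored markers.

    `anchored` is a list of (chrom, pos, scores) tuples. All markers tied for
    the highest r2 are reported, since long introgressed tracts often produce ties.
    """
    results = [(r_squared(unanchored_scores, scores), chrom, pos)
               for chrom, pos, scores in anchored]
    top = max(r2 for r2, _, _ in results)
    ties = [(chrom, pos) for r2, chrom, pos in results
            if math.isclose(r2, top, rel_tol=1e-9, abs_tol=1e-12)]
    return top, ties

# Toy example: introgression scores (1 = donor, 0 = exotic) across four SC lines
toy_anchored = [("chr6", 100, [0, 0, 1, 1]),
                ("chr6", 200, [0, 0, 1, 1]),
                ("chr7", 50, [0, 1, 0, 1])]
toy_r2, toy_ties = best_anchors([0, 0, 1, 1], toy_anchored)
```

In this sketch a SNP would count as uniquely mapped when all tied anchors fall on one chromosome, and the span between the first and last tied positions gives the physical interval of the placement.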

Calculation of introgression scores and frequencies

For each SNP, an introgression was scored as either present (1), when a genotype was shared between the SC line and the donor line, or absent (0), when a genotype was shared between the SC line and its EP line. Missing data for presence/absence of introgressions were inferred as the mean of each flanking marker weighted by its physical distance (Figure 1D). Missing data proximal and distal to the first and last informative markers on a chromosome, respectively, were assigned the value of the closest informative marker. Once missing data were imputed, introgression frequencies were calculated at each SNP as the percentage of SC lines with an introgression.
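The imputation and frequency steps above can be sketched as follows. The sketch reads "the mean of each flanking marker weighted by its physical distance" as linear interpolation, so that the nearer informative marker gets the larger weight; that reading, the function names, and the requirement that positions be sorted and distinct within a chromosome are assumptions. Frequencies are returned as proportions (the mean score across lines) rather than percentages.

```python
from bisect import bisect_left

def impute_scores(positions, scores):
    """Fill missing (None) introgression scores along one chromosome.

    positions: sorted, distinct physical positions; scores: 0, 1, or None.
    Interior gaps are linearly interpolated between flanking informative
    markers; gaps at the chromosome ends take the nearest informative value.
    """
    known_pos = [p for p, s in zip(positions, scores) if s is not None]
    known_val = [float(s) for s in scores if s is not None]
    if not known_pos:
        return [None] * len(scores)  # nothing informative on this chromosome
    imputed = []
    for pos, score in zip(positions, scores):
        if score is not None:
            imputed.append(float(score))
            continue
        i = bisect_left(known_pos, pos)
        if i == 0:
            imputed.append(known_val[0])
        elif i == len(known_pos):
            imputed.append(known_val[-1])
        else:
            left, right = known_pos[i - 1], known_pos[i]
            weight_right = (pos - left) / (right - left)
            imputed.append((1 - weight_right) * known_val[i - 1]
                           + weight_right * known_val[i])
    return imputed

def introgression_frequency(score_rows):
    """Per-marker mean of imputed scores across SC lines (rows)."""
    n_lines = len(score_rows)
    return [sum(column) / n_lines for column in zip(*score_rows)]

# Toy chromosome with five markers in one SC line
toy_positions = [100, 200, 300, 400, 500]
toy_imputed = impute_scores(toy_positions, [None, 0, None, None, 1])
```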

Subpopulation assignment and permutations

Principal component analysis (PCA) was performed in EP lines in R [31] using the prcomp() function and a dataset of 22,203 SNPs with minor allele frequencies >10% in the set of 1,160 SC and EP lines (580 pairs). EP lines were assigned to subpopulations using values for PC1 and PC2 as follows: (1) lines with PC2 <-20 were assigned to the guinea/kafir (GK) group; (2) lines with PC2 >-20 and PC1 >0 were assigned to the caudatum (C) group; (3) lines with PC2 >-20 and PC1 <0 were assigned to the durra (D) group. Introgressed regions excluded from the analysis in Additional File 8 were defined as locations <55 Mb on chromosome 6, >50 Mb on chromosome 7, and >50 Mb on chromosome 9. Significance of subpopulation differences in introgression frequency was assessed by randomly assigning SC lines to subpopulations of equivalent size (137, 131, and 122 individuals) and calculating introgression frequencies across the three permuted subpopulations. For each permutation, the maximum range of introgression frequencies across the three subpopulations was recorded for each chromosome. Two hundred permutations were performed and α was set to 0.05.
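The assignment rule and the permutation test can be sketched together. The sketch handles the markers of one chromosome at a time, and several details are assumptions: the function names, the treatment of lines falling exactly on a PC boundary (which the text leaves undefined), the fixed random seed, and the use of the empirical (1 - α) quantile of the permuted maximum ranges as the significance threshold.

```python
import math
import random

def assign_subpopulation(pc1, pc2):
    """Assign an EP line to GK, C, or D from its PC1 and PC2 values.

    Boundary values (PC2 == -20 or PC1 == 0) are not covered by the stated
    rule; here they fall through to the C/D branch and to D (assumption).
    """
    if pc2 < -20:
        return "GK"
    return "C" if pc1 > 0 else "D"

def group_frequencies(scores, members):
    """Per-marker introgression frequency among the SC lines listed in `members`."""
    return [sum(scores[i][j] for i in members) / len(members)
            for j in range(len(scores[0]))]

def max_range(frequencies_by_group):
    """Largest between-group spread in frequency over all markers."""
    return max(max(column) - min(column) for column in zip(*frequencies_by_group))

def permutation_threshold(scores, group_sizes, n_perm=200, alpha=0.05, seed=0):
    """Null threshold for the maximum frequency range on one chromosome.

    SC lines (rows of `scores`) are shuffled into groups of the observed sizes;
    the (1 - alpha) quantile of the permuted maximum ranges is returned.
    """
    rng = random.Random(seed)
    indices = list(range(len(scores)))
    null_ranges = []
    for _ in range(n_perm):
        rng.shuffle(indices)
        groups, start = [], 0
        for size in group_sizes:
            groups.append(indices[start:start + size])
            start += size
        null_ranges.append(max_range([group_frequencies(scores, g) for g in groups]))
    null_ranges.sort()
    k = min(len(null_ranges) - 1, math.ceil((1 - alpha) * len(null_ranges)) - 1)
    return null_ranges[k]
```

An observed between-subpopulation range on a chromosome would then be called significant when it exceeds the threshold returned for that chromosome, for example `permutation_threshold(scores, [137, 131, 122])`.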

Phenotypic data and association mapping

The 580 genotyped SC lines were grown in 6 m plots with 0.76 m row spacing in Urbana, IL in the summers of 2011 and 2012 and phenotyped for plant height and flowering time. Plant height was measured as the distance (cm) from the ground to the penultimate or ‘pre-flag’ leaf on one representative plant per row. Flowering time was measured as the time (days from planting) at which 50% of the plants in the row had initiated anthesis. Phenotypic data from each year were normalized and the mean normalized value across all years was used for association mapping. The GAPIT package in R [32] was used to conduct marker-trait associations using the default parameters. Markers included all SNPs discovered in this study with minor allele frequencies ≥10%. Missing SNP data were imputed using BEAGLE.
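The text does not state which normalization was applied to each year's phenotypes. The sketch below assumes a within-year z-score (subtract the year mean, divide by the sample standard deviation) followed by a per-line mean across the years in which the line was measured. The function names and the toy heights are hypothetical.

```python
from statistics import mean, stdev


def zscores(values):
    """Z-score the non-missing values from one year; None stays None."""
    obs = [v for v in values if v is not None]
    mu, sd = mean(obs), stdev(obs)
    return [None if v is None else (v - mu) / sd for v in values]


def mean_normalized(by_year):
    """Average the within-year z-scores for each line.

    by_year: {year: [value per line]}, with lines in the same order every year.
    """
    normalized = [zscores(vals) for vals in by_year.values()]
    out = []
    for per_line in zip(*normalized):
        obs = [v for v in per_line if v is not None]
        out.append(mean(obs) if obs else None)
    return out


# Toy plant heights (cm) for three lines; the third line is missing in 2012.
heights = {2011: [100, 150, 200], 2012: [110, 160, None]}
combined = mean_normalized(heights)
```

Normalizing within each year before averaging keeps a year with generally taller plants or later flowering from dominating the combined value passed to association mapping.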

Data availability

Raw genotyping-by-sequencing read data have been deposited in the Sequence Read Archive [SRA:SRP022956]. Introgression scores have been included in a table as Additional File 12.

Additional material

Additional File 1: Table S1. SC and EP lines used in this study. Principal components analysis in the EP lines was used to assign SC-EP pairs to subpopulations. Plant height and flowering time phenotypes used for association mapping in the SC lines are also provided.

Additional File 2: Figure S1. Enzyme effects on SNP output. Combining two double digests (PstI-HF/HinP1I and PstI-HF/BfaI) nearly doubles the number of SNPs called per sample over one double digest (PstI-HF/HinP1I).


Additional File 3: Figure S2. Principal Component Analysis (PCA) of BTx406 seed source libraries. Twenty-eight libraries were created for BTx406 seed from three different sources (GRIN, Cornell, and Lubbock). The three outlier libraries from the GRIN collection were removed due to low concordance.

Additional File 4: Figure S3. Introgression maps for 390 SC lines.

Additional File 5: Table S2. Number and percentage of introgressed, unexpected, and informative markers for each SC-EP pair.

Additional File 6: Table S3. Physical map positions of unanchored SNPs.

Additional File 7: Figure S4. Introgression frequency, phenotypic associations, and population differentiation in the Ma1-Dw2 region on sorghum chromosome 6. Panels are the same as in Figures 3 to 6. The location of Ma1 at 40.3 Mb is shown with a vertical dashed gray line.

Additional File 8: Figure S5. PCA of SC lines with and without SNPs in the three major introgressed regions.

Additional File 9: Figure S6. Subpopulation-specific introgression frequencies.

Additional File 10: Table S4. Phenotypic associations with plant height and flowering time in 580 SC lines.

Additional File 11: Table S5. List of barcoded adapters used in library preparation.

Additional File 12: Table S6. Raw introgression scores.

Abbreviations

EP: exotic progenitor; GBS: genotyping-by-sequencing; SC: sorghum conversion; SNP: single nucleotide polymorphism.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

PB conceived the project idea. CT, JM, RH, and PB performed data collection and analysis. PB and CT wrote the manuscript.

Acknowledgements

This project was supported by startup funding from the Energy Biosciences Institute to PB. Charlie Woodfin (USDA-ARS Lubbock, retired) and Bill Rooney (TAMU) provided seed. We also wish to acknowledge the stellar technical support provided by Alvaro Hernandez and staff at the University of Illinois’ Keck Center.

Authors’ details

1 Energy Biosciences Institute, University of Illinois, Urbana, IL, USA.
2 Department of Crop Sciences, University of Illinois, Urbana, IL, USA.

Received: 28 March 2013 Revised: 11 June 2013 Accepted: 26 June 2013 Published: 26 June 2013

References

1. Gepts P: Crop domestication as a long-term selection experiment. Plant Breeding Reviews 2004, 24:1-44.
2. Harlan JR, De Wet JMJ, Price EG: Comparative evolution of cereals. Evolution 1973, 27:311-325.
3. Purugganan MD, Fuller DQ: The nature of selection during plant domestication. Nature 2009, 457:843-848.
4. Khush GS: Green revolution: preparing for the 21st century. Genome 1999, 42:646-655.
5. Peng J, Richards DE, Hartley NM, Murphy GP, Devos KM, Flintham JE, Beales J, Fish LJ, Worland AJ, Pelica F: “Green revolution” genes encode mutant gibberellin response modulators. Nature 1999, 400:256-260.
6. Ashikari M, Sasaki A, Ueguchi-Tanaka M, Itoh H, Nishimura A, Datta S, Ishiyama K, Saito T, Kobayashi M, Khush GS: Loss-of-function of a rice gibberellin biosynthetic gene, GA20 oxidase (GA20ox-2), led to the rice ‘green revolution’. Breeding Science 2002, 52:143-150.
7. Hedden P: The genes of the Green Revolution. Trends Genet 2003, 19:5-9.
8. Food and Agriculture Organization of the United Nations. [http://www.fao.org/index_en.htm].
9. Smith CW, Frederiksen RA: Sorghum: Origin, History, Technology, and Production. Hoboken, NJ: John Wiley & Sons; 2000.
10. Quinby JR: Sorghum improvement and the genetics of growth. College Station, TX: Texas A&M University Press; 1974.
11. Murphy RL, Klein RR, Morishige DT, Brady JA, Rooney WL, Miller FR, Dugas DV, Klein PE, Mullet JE: Coincident light and clock regulation of pseudoresponse regulator protein 37 (PRR37) controls photoperiodic flowering in sorghum. Proc Natl Acad Sci USA 2011, 108:16469-16474.
12. Childs KL, Miller FR, Cordonnier-Pratt MM, Pratt LH, Morgan PW, Mullet JE: The Sorghum Photoperiod Sensitivity Gene, Ma3, Encodes a Phytochrome B. Plant Physiol 1997, 113:611-619.
13. Multani DS, Briggs SP, Chamberlin MA, Blakeslee JJ, Murphy AS, Johal GS: Loss of an MDR Transporter in Compact Stalks of Maize br2 and Sorghum dw3 Mutants. Science 2003, 302:81-84.
14. Klein RR, Mullet JE, Jordan DR, Miller FR, Rooney WL, Menz MA, Franks CD, Klein PE: The effect of tropical sorghum conversion and inbred development on genome diversity as revealed by high-resolution genotyping. Crop Science 2008, 48:S-12.
15. Morris GP, Ramu P, Deshpande SP, Hash CT, Shah T, Upadhyaya HD, Riera-Lizarazu O, Brown PJ, Acharya CB, Mitchell SE, Harriman J, Glaubitz JC, Buckler ES, Kresovich S: Population genomic and genome-wide association studies of agroclimatic traits in sorghum. Proc Natl Acad Sci USA 2013, 110:453-458.
16. Brown PJ, Rooney WL, Franks C, Kresovich S: Efficient mapping of plant height quantitative trait loci in a sorghum association population with introgressed dwarfing genes. Genetics 2008, 180:629-637.
17. Stephens JC, Miller FR, Rosenow DT: Conversion of alien sorghums to early combine genotypes. Crop Science 1967, 7:396-396.
18. Poland JA, Brown PJ, Sorrells ME, Jannink J-L: Development of high-density genetic maps for barley and wheat using a novel two-enzyme genotyping-by-sequencing approach. PLoS ONE 2012, 7:e32253.
19. Elshire RJ, Glaubitz JC, Sun Q, Poland JA, Kawamoto K, Buckler ES, Mitchell SE: A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS ONE 2011, 6:e19379.
20. Buckler Lab for Maize Genetics and Diversity. [http://www.maizegenetics.net/].
21. Hill WG: Variation in genetic identity within kinships. Heredity 1993, 71:652-653.
22. Mace ES, Jordan DR: Location of major effect genes in sorghum (Sorghum bicolor (L.) Moench). Theor Appl Genet 2010, 121:1339-1356.
23. Parh DK, Jordan DR, Aitken EA, Mace ES, Jun-Ai P, McIntyre CL, Godwin ID: QTL analysis of ergot resistance in sorghum. Theor Appl Genet 2008, 117:369-382.
24. Klein RR, Rodriguez-Herrera R, Schlueter JA, Klein PE, Yu ZH, Rooney WL: Identification of genomic regions that affect grain-mould incidence and other traits of agronomic importance in sorghum. Theor Appl Genet 2001, 102:307-319.
25. Satish K, Srinivas G, Madhusudhana R, Padmaja PG, Nagaraja Reddy R, Murali Mohan S, Seetharama N: Identification of quantitative trait loci for resistance to shoot fly in sorghum [Sorghum bicolor (L.) Moench]. Theor Appl Genet 2009, 119:1425-1439.
26. Lin Z, Li X, Shannon LM, Yeh C-T, Wang ML, Bai G, Peng Z, Li J, Trick HN, Clemente TE, Doebley J, Schnable PS, Tuinstra MR, Tesso TT, White F, Yu J: Parallel domestication of the Shattering1 genes in cereals. Nat Genet 2012, 44:720-724.
27. National Plant Germplasm System. [http://www.ars-grin.gov/npgs/].
28. Mace ES, Buhariwalla KK, Buhariwalla HK, Crouch JH: A high-throughput DNA extraction protocol for tropical molecular breeding programs. Plant Mol Biol Rep 2003, 21:459-460.
29. Li H, Durbin R: Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 2009, 25:1754-1760.
30. Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, Gundlach H, Haberer G, Hellsten U, Mitros T, Poliakov A: The Sorghum bicolor genome and the diversification of grasses. Nature 2009, 457:551-556.
31. The Comprehensive R Archive Network. [http://cran.r-project.org/].


32. Lipka AE, Tian F, Wang Q, Peiffer J, Li M, Bradbury PJ, Gore MA, Buckler ES, Zhang Z: GAPIT: genome association and prediction integrated tool. Bioinformatics 2012, 28:2397-2399.

doi:10.1186/gb-2013-14-6-r68
Cite this article as: Thurber et al.: Retrospective genomic analysis of sorghum adaptation to temperate-zone grain production. Genome Biology 2013 14:R68.
