
GENOMIC PREDICTION

Genomic Prediction Within and Among Doubled-Haploid Libraries from Maize Landraces

Pedro C. Brauner,* Dominik Müller,* Pascal Schopp,* Juliane Böhm,* Eva Bauer,† Chris-Carolin Schön,†

and Albrecht E. Melchinger*,1

*Institute of Plant Breeding, Seed Science and Population Genetics, University of Hohenheim, 70593 Stuttgart, Germany and †Plant Breeding, TUM School of Life Sciences Weihenstephan, Technical University of Munich, 85354 Freising, Germany

ORCID ID: 0000-0002-4820-2846 (E.B.)

ABSTRACT Thousands of maize landraces are stored in seed banks worldwide. Doubled-haploid libraries (DHL) produced from landraces harness their rich genetic diversity for future breeding. We investigated the prospects of genomic prediction (GP) for line per se performance in DHL from six European landraces and 53 elite flint (EF) lines by comparing four scenarios: GP within a single library (sL); GP between pairs of libraries (LwL); and GP among combined libraries, either including (cLi) or excluding (cLe) lines from the training set (TS) that belong to the same DHL as the prediction set. For scenario sL, with N = 50 lines in the TS, the prediction accuracy (r) among seven agronomic traits varied from −0.53 to 0.57 for the DHL and reached up to 0.74 for the EF lines. For LwL, r was close to zero for all DHL and traits. Whereas scenario cLi showed improved r values compared to sL, r for cLe remained at the low level observed for LwL. Forecasting r with deterministic equations yielded inflated values compared to empirical estimates of r for the DHL, but preserved the ranking. In conclusion, GP is promising within DHL, but large TS sizes (N > 100) are needed to achieve decent prediction accuracy because LD between QTL and markers is the primary source of information that can be exploited by GP. Since production of DHL from landraces is expensive, we recommend GP only for very large DHL produced from a few highly preselected landraces.

KEYWORDS Genomic Prediction; doubled-haploid; maize landraces; GenPred; shared data resources

Genetic diversity is fundamental for selection progress. In plant breeding, high selection pressure and the use of few key ancestors in the development of new germplasm contributed to a strong decline of the genetic diversity present in elite germplasm (Messmer et al. 1992; Reif et al. 2005a; Technow et al. 2013). Recourse to the rich cultural heritage of landraces stored in seed banks worldwide is considered a promising strategy to broaden genetic diversity (Salhuana and Pollak 2006; Warburton et al. 2008; Strigens et al. 2013). Landraces represent an attractive source of diversity because they are the predecessors of modern cultivars and have been cultivated by farmers for centuries. During the last century, modern varieties have replaced nearly all landraces, many of which have been conserved in seed banks. However, a major obstacle for using landraces in breeding is the performance gap to modern hybrid cultivars, which continuously widens due to ongoing selection progress. Therefore, novel approaches are urgently needed to “turbocharge” the use of landraces as genetic resources in breeding (Yu et al. 2016).

In allogamous crops, mining the genetic diversity of seed bank accessions entails two challenges. First, one must identify the most promising accessions, commonly based on passport data provided by seed banks and results from field trials, in which the landraces are evaluated for their per se performance and/or testcross performance with suitable testers (Salhuana and Pollak 2006; Böhm et al. 2014). Selection among accessions is not sufficient because both molecular and phenotypic data from various crops suggest that more genetic variation lies within than between landraces (Greene et al. 2014; Monteiro et al. 2016; Böhm et al. 2017; Mayer et al. 2017). Therefore, the second challenge is mining the genetic diversity within landraces, preferably in the form of inbred lines. Unlike the stored landraces, which represent populations of heterozygous individuals, inbred lines produced from landraces can be identically multiplied ad libitum and, hence, can be phenotyped with any degree of precision desired to characterize the source germplasm used as the donor of new genetic variation.

Copyright © 2018 by the Genetics Society of America
doi: https://doi.org/10.1534/genetics.118.301286
Manuscript received June 23, 2018; accepted for publication September 24, 2018; published Early Online September 26, 2018.
Supplemental material available at Figshare: https://doi.org/10.25386/genetics.6667481.
¹Corresponding author: Institute of Plant Breeding, Seed Sciences and Population Genetics, University of Hohenheim, Fruwirthstraße 21, 70599 Stuttgart, Germany.

Genetics, Vol. 210, 1185–1196, December 2018

Line development has traditionally been accomplished by recurrent selfing or full-sib matings (Poehlman 1987). When applied to landraces of allogamous species, success rates are generally extremely low, because the high genetic load revealed in advanced selfing generations is manifested in poor vigor (inbreeding depression) and a loss of inbred lines due to the fixation of detrimental or (sub)lethal alleles (Böhm et al. 2017). Recently, Melchinger et al. (2017) showed that large-scale production of doubled-haploid (DH) lines from landraces in maize is possible, albeit at much higher expenditures compared with elite materials. They also showed that DH libraries (DHL; see Supplemental Material, Table S1 for a list of abbreviations) capture the allelic diversity of landraces in an unbiased way, and recommended the use of DHL for conservation purposes in seed banks and as source materials for prebreeding programs.

As DHL become available, the breeding potential of the DH lines must be assessed prior to their use in breeding programs. A first priority is evaluating their line per se performance, because many of the DH lines from landraces display severe weaknesses such as lodging, tillering, susceptibility to diseases, and low pollen production. Phenotyping in multilocation trials entails high costs and requires large seed quantities, yet seed multiplication is generally a problem due to the poor seed set and reduced vigor of these materials (Strigens et al. 2013; Böhm et al. 2017). Subsequent evaluation for testcross performance should be restricted to lines showing acceptable per se performance (Wilde et al. 2010).

For elite germplasm, genomic prediction (GP) has emerged as a powerful tool to complement the expensive phenotyping of test candidates in breeding programs (Crossa et al. 2017). GP consists of training a statistical model on a set of individuals with both phenotypic and genotypic information to predict the breeding values of individuals with only genotypic information. For the autogamous species sorghum (Sorghum bicolor L.), Yu et al. (2016) recently demonstrated the use of GP with a model trained across landraces to choose promising accessions from the vast amount of germplasm archived in seed banks and to expedite the germplasm evaluation process. However, that study did not investigate the use of GP for mining genetic diversity within landraces of allogamous species, which therefore requires further research.

In animal and plant breeding, statistical models have been developed to perform GP between populations (Lehermeier et al. 2015; Wientjes et al. 2015). Models trained with data from cattle breeds with large sample sizes yielded low accuracies for estimating the breeding values of individuals from a different breed with a smaller size (Hayes et al. 2009), whereas combining several cattle breeds with small sample sizes into a larger set resulted in similar or higher accuracies than prediction within breeds (Pryce et al. 2011; Chen et al. 2014; Iheshiulor et al. 2016). A similar situation exists for DHL, because the number of lines is generally low due to limited success rates in the production of DH lines from landraces (Melchinger et al. 2017). However, GP from one landrace to another, and particularly pooling different landraces in a combined training set (TS), has not been investigated hitherto.

Here, we use maize (Zea mays L.) as a model to demonstrate how a combination of modern techniques could support mining of the genetic diversity present in landraces of allogamous crops. Our example is from the flint heterotic pool, which represents one pillar of the dent × flint heterotic pattern employed for hybrid maize breeding in Central Europe. Our objectives were to investigate, for various agronomic traits, the accuracy of GP with DHL in four scenarios: (i) GP within a single library (sL); (ii) GP between pairs of libraries (LwL); (iii) GP among libraries with a combined TS composed of several DHL, including lines from the DHL to be predicted (cLi); and (iv) GP among libraries with a TS composed of several DHL, excluding lines from the DHL to be predicted (cLe). Further, we examined the influence of the sample size, linkage disequilibrium (LD) within DHL, and linkage phase similarity (LPS) between pairs of DHL on the prediction accuracy. For all scenarios, we compared the empirical prediction accuracies with forecasts obtained from deterministic equations developed by Daetwyler et al. (2008, 2010) and Wientjes et al. (2015).

Materials and Methods

Plant materials

We used a set of 351 DH lines derived from six flint landraces and 53 elite flint (EF) lines from the maize breeding program of the University of Hohenheim. The landraces from which the six DHL were derived, their respective abbreviation, their country of origin, and the number (n) of DH lines in each DHL were: Campan Galade (CG, France, n = 19), Gelber Badischer (GB, Germany, n = 50), Strenzfelder (SF, Germany, n = 54), Rheintaler (RT, Switzerland, n = 34), Satu Mare (SM, Romania, n = 101), and Walliser (WA, Switzerland, n = 93). The EF and DH lines used in this study represent a subset of the full panel described in a companion paper (Böhm et al. 2017), encompassing in total 460 lines, including the above-mentioned lines plus 56 further lines not considered in this study from five landraces with fewer than eight DH lines per landrace, as well as a set of Iodent and flint founder lines.

Field trials and recorded traits

Field trials were carried out in 2013 across four agro-ecologically diverse locations in Germany, always using a 46 × 10 α-lattice design with two replications, as detailed by Böhm et al. (2017).


Briefly, they recorded 16 traits for per se performance, of which we chose for this study seven agronomically important traits reflecting vegetative plant development, disease resistance, and product quality, as well as yield. The traits were: early vigor, scored in ratings from 1 (no shoot viable) to 9 (excellent shoot vigor); female flowering, measured as the number of days from sowing until silk emergence; Fusarium ear rot resistance, scored from 1 (all ears infested) to 9 (all ears healthy); plant height, measured in centimeters from the ground to the lowest tassel branch; oil content in %, measured with nuclear magnetic resonance; protein content of seeds in %, measured with near-infrared spectroscopy; and grain yield, measured in grams per plant. Boxplots with the range of values for each trait and each landrace are shown in Figure S1.

Genotypic data

The 404 lines (351 DH lines from the six DHL plus 53 EF lines) were genotyped with the Illumina MaizeSNP50 BeadChip, which contains 56,110 SNPs (Ganal et al. 2011). A quality check was performed by removing SNPs with call frequency < 0.9, minor allele frequency < 0.025, and heterozygosity > 1%, following Riedelsheimer et al. (2012). Missing genotype calls were imputed with the software BEAGLE version 3.3.2 (Browning and Browning 2007), resulting in 32,492 high-quality SNPs.
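The three SNP filters named above can be sketched as a column mask over a genotype matrix. This is a minimal illustration, not the authors' pipeline. It assumes genotypes coded as 0/1/2 copies of an arbitrary reference allele, with `np.nan` for missing calls. Heterozygosity is taken as the share of heterozygous calls among non-missing calls, and the BEAGLE imputation step is not reproduced. The function name `qc_snps` and the toy matrix are hypothetical.

```python
import numpy as np


def qc_snps(geno, min_call=0.9, min_maf=0.025, max_het=0.01):
    """Boolean mask of SNP columns passing call-rate, MAF, and heterozygosity filters.

    geno: lines x SNPs array coded 0/1/2 (copies of a reference allele), np.nan = missing.
    A SNP is removed if call frequency < min_call, MAF < min_maf, or
    heterozygosity > max_het.
    """
    called = ~np.isnan(geno)
    n_called = called.sum(axis=0)
    call_freq = n_called / geno.shape[0]

    safe_n = np.maximum(n_called, 1)               # avoid division by zero for all-missing SNPs
    p = np.nansum(geno, axis=0) / (2.0 * safe_n)   # reference-allele frequency among calls
    maf = np.minimum(p, 1.0 - p)
    het = (geno == 1).sum(axis=0) / safe_n         # share of heterozygous calls

    return (n_called > 0) & (call_freq >= min_call) & (maf >= min_maf) & (het <= max_het)


# Hypothetical toy data: 4 lines x 4 SNPs.
geno = np.array([
    [0, 2, 1, 0],
    [0, 2, 0, np.nan],
    [2, 0, 0, np.nan],
    [2, 2, 0, 0],
], dtype=float)
keep = qc_snps(geno)  # SNP 3 fails heterozygosity, SNP 4 fails call frequency
```

In practice the mask would be applied as `geno[:, keep]` before imputation.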

Statistical analysis

Best linear unbiased estimates (BLUEs) of all DH and EF lines were computed in two steps. First, an ordinary lattice analysis with all 460 entries from Böhm et al. (2017) was performed with the data from each location to obtain adjusted entry means and effective error mean squares (Cochran and Cox 1957). Second, a combined analysis of variance across locations was performed using the subset of 404 lines, comprising the six DHL and the EF lines. BLUEs for each genotype were calculated in the second stage using the following model:

$$y_{ijk} = \mu + p_i + g_{j(i)} + l_k + (pl)_{ik} + e_{ijk}, \qquad (1)$$

where $\mu$ is the overall mean; $p_i$ is the fixed effect for population i (i.e., the six DHL and the EF lines); $g_{j(i)}$ is the fixed effect of genotype j nested in population i; $l_k$ and $(pl)_{ik}$ are the random effects for location k and the interaction of location k with population i, respectively; and $e_{ijk}$ refers to the genotype × location interaction confounded with the error of the adjusted entry means from the first stage, and was modeled to account for heterogeneity of the corresponding variances across the different DHL and the EF lines. All calculations were performed with the ASReml-R package (Butler et al. 2009) using the R statistical language (R Core Team 2017).

Genetic distances, LD, and LPS

Genetic distances between genotypes were calculated using the modified Rogers’ distance (Reif et al. 2005b). Further, a neighbor-joining tree was computed with the R package “ape” (Paradis et al. 2004), and the unrooted tree was plotted with the R package “phyclust” (Chen 2011). LD was calculated as the squared correlation ($r^2$) between pairs of markers (Hill and Robertson 1968). The decay of LD as a function of the distance between markers was calculated separately for each DHL and for the EF lines using a sliding window approach over a distance of 5 Mb, divided into 30 bins, following Technow et al. (2013). The width of each bin was thus about 170 kb (5 Mb/30), and the average $r^2$ between all pairs of markers within this range was calculated for each bin and then averaged over the bins on all chromosomes. The LPS was calculated between pairs of populations (DHL and EF lines) using the cosine similarity measure for each marker pair (Schopp et al. 2017a), which takes a value of 1.0 if the linkage phase is identical in both populations and a value of 0.0 if the linkage phases coincide as expected at random. The same sliding window approach as for LD was applied for the LPS.
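The distance, LD decay, and LPS computations described above can be sketched as follows. The following are our assumptions, not details from the paper:

- All markers lie on one chromosome and have base-pair positions.
- Every marker is polymorphic in each population (monomorphic columns would give NaN correlations).
- LPS within a distance bin is the cosine similarity between the two populations' vectors of signed r values over all marker pairs in that bin.

All function names and the simulated data are hypothetical.

```python
import numpy as np


def modified_rogers_distance(x, y):
    """Modified Rogers' distance (Reif et al. 2005b) for biallelic SNPs.

    x, y: 0/1/2 genotype vectors of two lines. With two alleles per locus the
    formula reduces to sqrt(mean((p_x - p_y)^2)), p = allele frequency in the line.
    """
    px, py = np.asarray(x, float) / 2.0, np.asarray(y, float) / 2.0
    return float(np.sqrt(np.mean((px - py) ** 2)))


def binned_pairs(pos, max_dist=5e6, n_bins=30):
    """Marker pairs (a, b) at most max_dist apart and the distance bin of each pair."""
    a, b = np.triu_indices(len(pos), k=1)
    d = np.abs(pos[b] - pos[a])
    ok = d <= max_dist
    a, b, d = a[ok], b[ok], d[ok]
    bins = np.minimum((d // (max_dist / n_bins)).astype(int), n_bins - 1)
    return a, b, bins


def ld_decay(geno, pos, max_dist=5e6, n_bins=30):
    """Mean r^2 between marker pairs in each distance bin (one chromosome)."""
    r = np.corrcoef(geno, rowvar=False)  # signed r between marker columns
    a, b, bins = binned_pairs(pos, max_dist, n_bins)
    r2 = r[a, b] ** 2
    return np.array([r2[bins == k].mean() if np.any(bins == k) else np.nan
                     for k in range(n_bins)])


def linkage_phase_similarity(geno_1, geno_2, pos, max_dist=5e6, n_bins=30):
    """Cosine similarity of the two populations' signed r values, per distance bin."""
    r_1 = np.corrcoef(geno_1, rowvar=False)
    r_2 = np.corrcoef(geno_2, rowvar=False)
    a, b, bins = binned_pairs(pos, max_dist, n_bins)
    x, y = r_1[a, b], r_2[a, b]
    out = np.full(n_bins, np.nan)
    for k in range(n_bins):
        m = bins == k
        if m.any():
            out[k] = np.sum(x[m] * y[m]) / np.sqrt(np.sum(x[m] ** 2) * np.sum(y[m] ** 2))
    return out


# Hypothetical data: 40 inbred lines x 10 SNPs spaced 400 kb apart.
rng = np.random.default_rng(0)
geno = rng.integers(0, 2, size=(40, 10)) * 2.0
pos = np.arange(10) * 4e5
decay = ld_decay(geno, pos)
lps_self = linkage_phase_similarity(geno, geno, pos)  # 1.0 in every non-empty bin
```

Averaging over chromosomes, as in the text, would repeat these calls per chromosome and average the bin values.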

Prediction model

We performed Genomic Best Linear Unbiased Prediction (GBLUP) by using the model

$$\mathbf{y} = \mathbf{1}\mu + \mathbf{Z}\mathbf{u} + \mathbf{e}, \qquad (2)$$

where $\mathbf{y}$ is an N-dimensional vector of the BLUEs obtained from the second step of the statistical analysis of the genotypes described above; $\mu$ is the overall mean and $\mathbf{1}$ is a vector of 1’s; $\mathbf{u}$ is an N-dimensional vector of random genotype effects assumed to follow $\mathbf{u} \sim MVN(\mathbf{0}, \mathbf{U})$, where $\mathbf{U}$ is an $N \times N$ variance–covariance matrix, which is described below; and $\mathbf{e}$ is an N-dimensional vector of residuals assumed to follow $\mathbf{e} \sim MVN(\mathbf{0}, \mathbf{R})$, where $\mathbf{R}$ is an $N \times N$ block-diagonal matrix with each block referring to the genotypes in population i (DHL or EF lines), with $\mathbf{R}_i = \mathbf{I}\sigma^2_{e_i}$, $\mathbf{I}$ being an $N_i \times N_i$ identity matrix ($N_i$ referring to the number of genotypes from population i), and $\sigma^2_{e_i}$ the variance of the $e_{ijk}$ effects in population i. The design matrix $\mathbf{Z}$ ($N \times N$) assigns genotypes to the random genotype effects $\mathbf{u}$.
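For a single population (so that $\mathbf{R} = \mathbf{I}\sigma^2_e$ and $\mathbf{Z} = \mathbf{I}$), model (2) can be sketched in a few lines. The sketch uses the VanRaden (2008) method-1 relationship matrix and the BLUP form $\hat{\mathbf{u}}_{PS} = \mathbf{G}_{PS,TS}(\mathbf{G}_{TS} + \lambda\mathbf{I})^{-1}(\mathbf{y}_{TS} - \mathbf{1}\hat\mu)$, with $\lambda = \sigma^2_e/\sigma^2_g$ and a generalized least-squares estimate of $\mu$. This form is equivalent to solving the mixed model equations, but it avoids inverting the singular $\mathbf{G}$.

Variance components are taken as known here rather than estimated with ASReml-R. The function names, simulated data, and λ = 1 are hypothetical. For fully inbred lines coded 0/2, the diagonal of $\mathbf{G}$ averages exactly 2 (i.e., 1 + F with F = 1) when allele frequencies are estimated from the same lines.

```python
import numpy as np


def vanraden_g(M):
    """VanRaden (2008) method-1 genomic relationship matrix from 0/1/2 codes."""
    p = M.mean(axis=0) / 2.0          # allele frequencies estimated from these lines
    W = M - 2.0 * p                   # centered genotype matrix
    return W @ W.T / (2.0 * np.sum(p * (1.0 - p)))


def gblup_predict(G, y_train, train, test, lam):
    """GBLUP of genotypic values of `test` lines from phenotyped `train` lines.

    G: relationship matrix over all lines; lam = sigma2_e / sigma2_g.
    Returns the GLS estimate of mu and the predicted genotypic values.
    """
    V = G[np.ix_(train, train)] + lam * np.eye(len(train))   # Var(y) / sigma2_g
    v_one = np.linalg.solve(V, np.ones(len(train)))
    mu = v_one @ y_train / v_one.sum()                      # GLS estimate of mu
    alpha = np.linalg.solve(V, y_train - mu)
    return mu, G[np.ix_(test, train)] @ alpha


# Hypothetical simulated data: 30 inbred lines (0/2 coding) x 200 SNPs.
rng = np.random.default_rng(1)
M = rng.integers(0, 2, size=(30, 200)) * 2.0
G = vanraden_g(M)
y = (M - M.mean(axis=0)) @ rng.normal(0.0, 0.1, 200) + rng.normal(0.0, 1.0, 30)
train, test = np.arange(25), np.arange(25, 30)
mu, gebv = gblup_predict(G, y[train], train, test, lam=1.0)
```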

Matrix $\mathbf{U}$ has a block structure related to pairwise combinations of populations i and i*, with $\mathbf{U}_{ii^*} = \mathbf{G}_{ii^*}\sigma_{g_i}\sigma_{g_{i^*}}$. Here $\mathbf{G}_{ii^*}$ is the $N_i \times N_{i^*}$ dimensional genomic relationship matrix constructed using the method of Chen et al. (2014) with the modification suggested by Schopp et al. (2017b), and $\sigma_{g_i}$ and $\sigma_{g_{i^*}}$ are the corresponding genotypic SD. The off-diagonal blocks ($i \neq i^*$) of the $\mathbf{G}$ matrix were calculated as

$$\mathbf{G}_{ii^*} = \frac{\mathbf{W}_i\mathbf{W}_{i^*}^{T}}{\sqrt{2\sum_m (p_m)_i\left[1-(p_m)_i\right]}\;\sqrt{2\sum_m (p_m)_{i^*}\left[1-(p_m)_{i^*}\right]}},$$

where $\mathbf{W}_i$ is the $N_i \times M$ matrix of the genotypes of population i at the M SNP markers, centered by the allele frequencies of the i-th population, and $(p_m)_i$ is the frequency of the major allele at the m-th locus in population i, where the major allele is defined across all populations. For the diagonal blocks ($i = i^*$), this simplifies to the genomic relationship matrix obtained by method 1 of VanRaden (2008). In addition, for the EF lines, we performed predictions replacing $\mathbf{G}_{ii}$ by the numerator relationship matrix calculated as described by Westhues et al. (2017), using pedigree information at least up to the grandparents. The variance components $\sigma^2_{g_i}$ and $\sigma^2_{e_i}$ were computed from the combined analysis of adjusted entry means across all locations using Equation 1, but with the term $g_{j(i)}$ treated as a random effect and $\mathbf{G}_{ii}$ as the genomic relationship matrix. Trait heritabilities ($h^2_i$) for each population i were calculated with the formula $h^2_i = \sigma^2_{g_i}/(\sigma^2_{g_i} + \sigma^2_{e_i}/L)$, where L is the number of locations. The variance components were estimated using the ASReml-R package (Butler et al. 2009) within the R environment (R Core Team 2017). All predictions for each scenario were computed using mixed model equations implemented within R (R Core Team 2017).

Table 1 Heritability ($h^2_i$) of seven agronomic traits for the EF lines and DHL from landraces GB, SF, SM, WA, CG, and RT

Trait              EF     GB     SF     SM     WA     CG     RT     Mean
Early vigor        0.91   0.79   0.54   0.67   0.67   0.69   0.81   0.70
Female flowering   0.94   0.90   0.79   0.72   0.90   0.90   0.90   0.85
Fusarium ear rot   0.22   0.57   0.76   0.65   0.74   0.36   0.45   0.59
Plant height       0.73   0.72   0.53   0.85   0.80   0.73   0.65   0.71
Grain yield        0.74   0.75   0.63   0.75   0.84   0.87   0.75   0.76
Oil content        0.94   0.90   0.75   0.79   0.91   0.89   0.78   0.84
Protein content    0.79   0.78   0.70   0.78   0.78   0.93   0.80   0.79

EF, elite flint; GB, Gelber Badischer; SF, Strenzfelder; SM, Satu Mare; WA, Walliser; CG, Campan Galade; RT, Rheintaler. Mean, mean across the six DHL.
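The cross-population relationship block $\mathbf{G}_{ii^*}$ and the entry-mean heritability $h^2_i$ defined in this section can be sketched as follows. The sketch makes three assumptions. Both genotype matrices share the same marker columns, coded 0/1/2. They are first recoded so that the counted allele is the major allele of the pooled populations. The block formula is implemented exactly as written above.

Function names and data are hypothetical. Passing the same matrix twice returns the VanRaden (2008) method-1 matrix, matching the simplification stated for the diagonal blocks.

```python
import numpy as np


def major_allele_coding(*pops):
    """Recode 0/1/2 matrices so the counted allele is the major allele of the pooled populations."""
    pooled_freq = np.vstack(pops).mean(axis=0) / 2.0
    flip = pooled_freq < 0.5                      # counted allele is currently the minor one
    return [np.where(flip, 2.0 - P, P) for P in pops]


def cross_population_g(M_i, M_k):
    """G block between populations i and k, each centered by its own allele frequencies."""
    p_i = M_i.mean(axis=0) / 2.0
    p_k = M_k.mean(axis=0) / 2.0
    W_i = M_i - 2.0 * p_i
    W_k = M_k - 2.0 * p_k
    scale_i = np.sqrt(2.0 * np.sum(p_i * (1.0 - p_i)))
    scale_k = np.sqrt(2.0 * np.sum(p_k * (1.0 - p_k)))
    return W_i @ W_k.T / (scale_i * scale_k)


def heritability(s2g, s2e, n_locations):
    """Entry-mean heritability h2 = s2g / (s2g + s2e / L)."""
    return s2g / (s2g + s2e / n_locations)


# Hypothetical data: two populations genotyped at the same 50 SNPs.
rng = np.random.default_rng(2)
A = rng.integers(0, 2, size=(8, 50)) * 2.0
B = rng.integers(0, 2, size=(5, 50)) * 2.0
A_c, B_c = major_allele_coding(A, B)
G_ab = cross_population_g(A_c, B_c)   # 8 x 5 block
h2 = heritability(s2g=1.0, s2e=1.0, n_locations=4)
```

Because each population is centered by its own frequencies, flipping the allele coding in both matrices leaves the block unchanged. The recoding step therefore mainly guarantees that the same allele is counted in every population.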

Prediction scenarios

We used four different prediction scenarios with the DHL or EF lines, as well as combinations of these populations, which are summarized in Table S2.
(i) The sL scenario in combination with leave-one-out cross-validation (LOOCV) was performed with each of the four largest DHL (GB, SF, SM, and WA) and the EF lines. In each population i, $N_i = 50$ genotypes were randomly sampled and the prediction was carried out with LOOCV. Using the two largest DHL (SM and WA), we additionally evaluated for this scenario how an increase of $N_i$ influences the predictions, increasing $N_i$ from 20 to 90 genotypes in increments of five.
(ii) In the LwL scenario, we randomly sampled $N_i = 90$ genotypes from one of the two largest DHL (SM and WA), which served as TS, and used all lines from one of the other five DHL or from the EF lines as the prediction set (PS) for simple validation (SV). In addition, we evaluated this scenario using one of the four libraries (GB, SF, SM, and WA) with random samples of $N_i = 50$ genotypes as TS to predict each of the other three libraries.
(iii) In the cLi scenario, $N_i = 50$ genotypes from each of the four largest DHL (GB, SF, SM, and WA) were randomly sampled and combined in one data set to perform LOOCV. The prediction accuracy was calculated separately for the 50 genotypes of each library.
(iv) In the cLe scenario, $N_i = 50$ genotypes from each of the four largest DHL (GB, SF, SM, and WA) were randomly sampled and three DHL were combined to construct the TS, comprising a total of 150 genotypes. All genotypes of the fourth DHL not included in the TS were used as PS for SV, and each DHL was used once as the PS. We additionally evaluated for this scenario another combination of TS, consisting of random samples of $N_i = 19$ genotypes (corresponding to the sample size of the smallest DHL, CG) from each of five of the six DHL (GB, SF, SM, WA, CG, and RT), to predict the remaining DHL or the EF lines.
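The sampling and validation schemes of the four scenarios can be sketched generically, with the genomic predictor passed in as a function. This is a sketch under assumptions:

- `labels` holds one library name per line.
- `predict(train, y_train, test)` is any hypothetical predictor, such as a GBLUP implementation.
- The toy mean predictor only exercises the splitting logic.

The scenarios map onto the sketch as follows:

- sL: `loocv` on lines sampled from one library.
- cLi: `loocv` on lines pooled from four libraries, with accuracy computed per library afterwards.
- LwL: a TS from `sample_lines` on one library and the PS from another.
- cLe: `cle_split`.

All names are hypothetical.

```python
import numpy as np


def sample_lines(labels, libraries, n_per_library, rng):
    """Randomly sample n_per_library line indices (without replacement) from each library."""
    return np.concatenate([
        rng.choice(np.flatnonzero(labels == lib), size=n_per_library, replace=False)
        for lib in libraries
    ])


def loocv(idx, y, predict):
    """Leave-one-out CV over the lines in idx (scenarios sL and cLi)."""
    preds = np.empty(len(idx))
    for k in range(len(idx)):
        train = np.delete(idx, k)                 # all other sampled lines form the TS
        preds[k] = predict(train, y[train], idx[k:k + 1])[0]
    return preds


def cle_split(labels, libraries, target, n_per_library, rng):
    """Scenario cLe: TS sampled from all libraries except `target`; PS = all lines of `target`."""
    others = [lib for lib in libraries if lib != target]
    train = sample_lines(labels, others, n_per_library, rng)
    test = np.flatnonzero(labels == target)
    return train, test


def mean_predict(train, y_train, test):
    """Toy stand-in predictor: every PS line gets the TS mean."""
    return np.full(len(test), y_train.mean())


# Hypothetical toy data: three libraries with 6, 4, and 5 lines.
rng = np.random.default_rng(3)
labels = np.array(["A"] * 6 + ["B"] * 4 + ["C"] * 5)
y = np.arange(15.0)
idx = sample_lines(labels, ["A", "B"], 3, rng)
preds = loocv(idx, y, mean_predict)
train, test = cle_split(labels, ["A", "B", "C"], "C", 3, rng)
```

Repeating the sampling (100 times in the study) would wrap these calls in an outer loop with fresh draws from `rng`.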

In each scenario described above, we calculated the prediction accuracy (r) as the correlation between the BLUEs obtained from the second step of the statistical analysis and the genomic estimated breeding values obtained from Equation 2, divided by the square root of $h^2_i$ in the PS (Dekkers 2007). The sampling of the genotypes used in the various scenarios from the entire number (n) of genotypes available from each DHL was repeated 100 times, and r averaged over all repetitions was reported. In addition, for every repetition, we estimated the SE of r with 500 bootstrap samples drawn with replacement (Kadam et al. 2016) and averaged these values over the 100 repetitions.
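The accuracy measure and its bootstrap SE can be sketched as below for one prediction set. The sketch assumes the BLUEs, the GEBVs, and $h^2_i$ of the PS are already available. Function names and the simulated values are hypothetical. Because of the division by $\sqrt{h^2_i}$, r is not bounded by 1.

```python
import numpy as np


def prediction_accuracy(blue, gebv, h2):
    """r = cor(BLUE, GEBV) / sqrt(h2), computed in the prediction set."""
    return np.corrcoef(blue, gebv)[0, 1] / np.sqrt(h2)


def bootstrap_se(blue, gebv, h2, n_boot=500, rng=None):
    """SE of r from n_boot resamples of PS lines drawn with replacement."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(blue)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        i = rng.integers(0, n, size=n)          # resample lines with replacement
        stats[b] = prediction_accuracy(blue[i], gebv[i], h2)
    return np.nanstd(stats, ddof=1)             # ignore degenerate resamples (NaN r)


# Hypothetical prediction set of 50 lines.
rng = np.random.default_rng(4)
blue = rng.normal(size=50)
gebv = 0.5 * blue + rng.normal(size=50)
r = prediction_accuracy(blue, gebv, h2=0.8)
se = bootstrap_se(blue, gebv, h2=0.8, rng=rng)
```

To report values averaged over the 100 sampling repetitions, these two calls would run once per repetition and both results would be averaged.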

Table 2 Prediction accuracy (r ± SE) of seven agronomic traits from GP for scenario sL, obtained with GBLUP by LOOCV using $N_i = 50$ lines sampled from the EF lines or DH lines from the landraces GB, SF, SM, and WA, as well as the means of the forecasted prediction accuracies $r_D$ and $r_W$ across traits

Trait              EF (P)         EF             GB             SF             SM             WA             Mean
Early vigor        0.48 ± 0.11    0.42 ± 0.13    0.15 ± 0.22    −0.26 ± 0.12   0.37 ± 0.12    0.38 ± 0.12    0.16
Female flowering   0.21 ± 0.13    0.33 ± 0.11    0.01 ± 0.18    0.40 ± 0.12    0.34 ± 0.11    0.41 ± 0.12    0.29
Fusarium ear rot   −0.11 ± 0.10   0.47 ± 0.10    0.26 ± 0.13    0.14 ± 0.15    0.10 ± 0.13    −0.01 ± 0.11   0.12
Plant height       0.68 ± 0.09    0.56 ± 0.11    −0.03 ± 0.16   0.57 ± 0.11    0.36 ± 0.11    0.14 ± 0.12    0.26
Grain yield        0.40 ± 0.14    0.53 ± 0.12    −0.27 ± 0.14   −0.53 ± 0.10   −0.06 ± 0.14   0.14 ± 0.12    −0.18
Oil content        0.54 ± 0.12    0.71 ± 0.07    −0.24 ± 0.14   0.03 ± 0.15    0.48 ± 0.11    0.31 ± 0.12    0.14
Protein content    0.56 ± 0.11    0.74 ± 0.07    −0.24 ± 0.12   0.05 ± 0.13    0.30 ± 0.12    0.52 ± 0.11    0.16
$r_D$                             0.50           0.16           0.28           0.35           0.46
$r_W$                             0.47           0.21           0.33           0.36           0.41

The results obtained for the EF lines with pedigree-based Best Linear Unbiased Prediction (P) are also given. Mean, mean across the four DHL. EF, elite flint; GB, Gelber Badischer; SF, Strenzfelder; SM, Satu Mare; WA, Walliser.

Forecast of the prediction accuracy

The prediction accuracies for the scenarios sL and LwL were forecasted with deterministic equations originally devised by Daetwyler et al. (2008, 2010), denoted as $r_D$, and Wientjes et al. (2013, 2015), denoted as $r_W$. However, we used a modified version of these equations proposed by Schopp et al. (2017b) that accounts for (i) inbreeding of the genotypes, because all our genotypes were pure-breeding lines, and (ii) the different proportion of polymorphic markers in the TS and PS. A further modification (multiplication by $r^2_{MM/2}$) proposed by Lian et al. (2014) was implemented, which accounts for incomplete linkage between quantitative trait loci (QTL) and markers by assuming that each QTL is located close to the midpoint between markers. The deterministic equation $r_D$ is based on population parameters and was calculated as:

$$r_D = r^2_{MM/2}\sqrt{u_{ii^*}\,\frac{N_{TS}\,h^2_i}{r^2_{MM/2}\,N_{TS}\,h^2_i + M_e}}, \qquad (3)$$

where $r^2_{MM/2}$ is the mean of the square root of the $r^2$ between pairs of adjacent markers; $N_{TS}$ is the number of individuals in the TS; $h^2_i$ is the heritability in the PS; $M_e$ is the effective number of chromosome segments, calculated as $M_e = 4/\mathrm{var}(\mathbf{G}_{ii^*})$ (Wientjes et al. 2015), where $\mathrm{var}(\mathbf{G}_{ii^*})$ refers, for $i \neq i^*$, to the variance of the elements in matrix $\mathbf{G}_{ii^*}$ and, for $i = i^*$, to the variance of the off-diagonal elements in matrix $\mathbf{G}_{ii}$; and $u_{ii^*} = |L_{i \cap i^*}|/|L_i|$, where $|L_i|$ is the number of loci polymorphic in population i and $|L_{i \cap i^*}|$ is the number of loci polymorphic in both i and i*, which equals 1 in scenario sL. To calculate $r_W$, the reliability of the GBLUP value for genotype j within population i, serving as the PS, was computed as:

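$$r^2_{ij} = \mathbf{g}_{ij,i^*}^{T}\left[\mathbf{G}_{i^*i^*} + \mathbf{I}\,\frac{\sigma^2_{e_{i^*}}}{\sigma^2_{g_{i^*}}}\right]^{-1}\mathbf{g}_{ij,i^*}\Big/2, \qquad (4)$$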

where $\mathbf{g}_{ij,i^*}$ is a vector of genomic relationship values between this genotype and all genotypes in population i*, serving as the TS. The forecasted prediction accuracy $r_W$ was calculated as the average over genotypes j in population i as

$$r_W = r^2_{MM/2}\sqrt{\frac{1}{N_i}\sum_{j=1}^{N_i} r^2_{ij}}. \qquad (5)$$
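Equations 3 to 5 and the definition of $M_e$ can be sketched as plain functions. This is an illustration under assumptions:

- `r2_mm2` stands for $r^2_{MM/2}$ and `u_ik` for $u_{ii^*}$; both are supplied by the user.
- The relationship blocks are scaled as in the Prediction model section.
- The division by 2 in the reliability is our reading of the garbled Equation 4 for fully inbred lines, whose self-relationship is about 2.

All names and numbers are hypothetical.

```python
import numpy as np


def effective_segments(G_block, same_population):
    """M_e = 4 / var(G): all elements if i != i*, off-diagonal elements if i == i*."""
    if same_population:
        values = G_block[~np.eye(G_block.shape[0], dtype=bool)]
    else:
        values = G_block.ravel()
    return 4.0 / np.var(values)


def forecast_rd(n_ts, h2, m_e, r2_mm2, u_ik=1.0):
    """Equation 3: r_D = r2_mm2 * sqrt(u_ik * N_TS h2 / (r2_mm2 N_TS h2 + M_e))."""
    return r2_mm2 * np.sqrt(u_ik * n_ts * h2 / (r2_mm2 * n_ts * h2 + m_e))


def forecast_rw(G_ts, G_ps_ts, s2e, s2g, r2_mm2):
    """Equations 4-5: reliability of each PS genotype, then r_W.

    G_ts: TS x TS relationship block; G_ps_ts: PS x TS block (row j = g_ij,i*).
    """
    A = G_ts + np.eye(G_ts.shape[0]) * (s2e / s2g)
    X = np.linalg.solve(A, G_ps_ts.T)                       # A^{-1} g for every PS genotype
    reliability = np.einsum("jt,tj->j", G_ps_ts, X) / 2.0   # assumed /2 for inbred lines
    return r2_mm2 * np.sqrt(np.mean(reliability))


# Hypothetical inputs: forecast r_D for growing TS sizes.
curve = [forecast_rd(n, h2=0.75, m_e=200.0, r2_mm2=0.6) for n in (20, 50, 90)]
```

The form of Equation 3 implies that $r_D$ rises with $N_{TS}$ but with diminishing increments, since $\sqrt{N/(N + c)}$ is concave in N.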

Data availability

All statistical analyses were carried out in the R environment (R Core Team 2017). Data for agronomic traits of the DH lines and the EF lines are available in supplemental file “FileS1.txt.” For the same genotypes, the genomic data are available in the supplemental file “FileS2.txt.” Supplemental material available at Figshare: https://doi.org/10.25386/genetics.6667481.

Results

Heritability ($h^2_i$) was generally high, with moderate variation among traits and populations (Table 1). The mean $h^2_i$ for each trait across the six DHL ranged from 0.59 for Fusarium ear rot to 0.85 for female flowering. The range in $h^2_i$ for Fusarium ear rot was larger than for the other traits, with a maximum of 0.76 for SF and a minimum of 0.22 for the EF lines. The latter were highly selected for this trait and, consequently, displayed a small genetic variance. High $h^2_i$ values were found in all DHL for female flowering and oil content except for SF and SM.

For the sL scenario, the highest r values were observed for the EF lines, which generally exceeded r of the DHL, except for female flowering in SF, SM, and WA, and plant height in SF (Table 2). Among the DHL, r was higher in WA and SM than in GB and SF for most traits, except Fusarium ear rot and plant height. Averaged across DHL, r showed the highest value for female flowering (0.29) and the lowest for grain yield (−0.18). The forecasted prediction accuracies $r_D$ and $r_W$ were averaged for each population because we observed only small differences across values (Table S3). In addition, the mean values for $r_D$ were similar to $r_W$, with differences ranging between 0.01 and 0.05. The forecasted prediction accuracies were generally close to r for EF and SM, but exceeded r for GB, SF, and WA.

Figure 1 Prediction accuracy (r) of seven agronomic traits from genomic prediction for the scenario sL obtained with Genomic Best Linear Unbiased Prediction by leave-one-out cross-validation using increasing sample size ($N_i$) from doubled-haploid lines sampled from the landraces Satu Mare or Walliser.

Increasing $N_i$ in scenario sL resulted in a concave curve of r (rising with diminishing increments) for all traits in the two largest DHL analyzed, SM and WA (Figure 1). As expected, the increments were biggest from $N_i = 20$ to $N_i = 50$, but increasing $N_i$ to 90 still raised r by around 0.11 on average for both DHL compared with $N_i = 50$. Although the trend was similar across traits, r differed for each trait and for each DHL. For example, the prediction of protein content using individuals from SM had the fifth highest r (0.41 with $N_i = 90$), whereas the same trait predicted with individuals from WA had the highest r (0.59 with $N_i = 90$). The only trait that had similar r between the two DHL was grain yield, which was poorly predicted in both cases.

For scenario LwL using WA and SM as the TS (Ni = 90), predictions showed poor results with high variation (Table 3). Among the different TS/PS combinations, SM/CG and SM/GB showed higher r values for the majority of traits. Averaged across DHL, when SM served as the TS, r ranged between 0.25 for Fusarium ear rot and −0.19 for female flowering, whereas r ranged between 0.13 for oil content and −0.08 for plant height when WA served as the TS. Estimates of r varied strongly across traits for specific combinations of TS and PS, with the highest and lowest r estimates observed for the combinations SM/CG (0.81) and WA/EF (−0.66), respectively, for Fusarium ear rot. Forecasting by rD and rW yielded values between 0.05 and 0.10. The LwL scenario with the four largest DHL (GB, SF, SM, and WA) and Ni = 50 also showed r values close to zero in all cases (Figure S2). For combinations including SM or WA as the TS, the values hardly differed from those in Table 3.

By combining genotypes into a larger TS in the cLi scenario, r was higher than for sL for all combinations of traits and DHL, except protein content for WA (Table 4). Scenario cLe, in which three of the four largest DHL (GB, SF, SM, and WA) were combined with equal numbers (Ni = 50) to form the TS, yielded generally low r values with smaller variation than observed for scenario LwL. Likewise, no improvement in r values was attained by combining five of the six DHL (GB, SF, SM, WA, CG, and RT, each with Ni = 19; Table S4), indicating that r in scenario cLe is as low as in LwL.
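The four validation scenarios differ only in which lines enter the TS and which form the PS. The sketch below is a hypothetical reading of those definitions: sL and cLi are run as leave-one-out cross-validation, LwL and cLe as a single separate validation, matching the methods named in the table titles. Which libraries (and how many lines of each) appear in `labels` is left to the caller; the function name and signature are illustrative only.

```python
from typing import Iterator, Tuple

import numpy as np

Split = Tuple[np.ndarray, np.ndarray]  # (training-set indices, prediction-set indices)


def scenario_splits(labels: np.ndarray, target: str, scenario: str,
                    source: str = "") -> Iterator[Split]:
    """Yield TS/PS index pairs for one scenario, with lines of `target` as PS.

    labels: library label of every line, e.g. "SM", "WA", "GB".
    source: the single TS library used in scenario "LwL" (ignored otherwise).
    """
    idx = np.arange(len(labels))
    in_target = labels == target
    if scenario == "sL":      # leave-one-out within the target library only
        for i in idx[in_target]:
            yield idx[in_target & (idx != i)], np.array([i])
    elif scenario == "LwL":   # TS = one other library, PS = whole target library
        yield idx[labels == source], idx[in_target]
    elif scenario == "cLi":   # leave-one-out on a pooled TS that includes the target
        for i in idx[in_target]:
            yield idx[idx != i], np.array([i])
    elif scenario == "cLe":   # pooled TS that excludes the target library entirely
        yield idx[~in_target], idx[in_target]
    else:
        raise ValueError(f"unknown scenario: {scenario!r}")


labels = np.array(["SM"] * 3 + ["WA"] * 2 + ["GB"] * 2)
for ts, ps in scenario_splits(labels, "SM", "cLe"):
    print(ts, ps)   # [3 4 5 6] [0 1 2]
```

Keeping the split logic separate from the prediction model makes it easy to feed identical TS/PS definitions to GBLUP, pedigree-BLUP, or any other predictor.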

Discussion

GP with DHL from landraces of allogamous crops

The fundamental idea behind GBLUP, namely replacement of the pedigree-based relationship matrix in the Best Linear Unbiased Prediction (BLUP) approach of Henderson (1985) by a marker-based relationship matrix, was initially put forward by Bernardo (1994) for the prediction of hybrid performance. GP in its present form was originally proposed for cattle breeding (Meuwissen et al. 2001) and was rapidly adopted by plant breeders (Jannink et al. 2010; Albrecht et al. 2011; de los Campos et al. 2013; Crossa et al. 2017). However, constraints in TS size and specific population structure often pose difficulties for GP in plant breeding compared to animal breeding (Hickey et al. 2017). Apart from Ni and hᵢ², the LD and actual relationships between genotypes in the TS and PS have a profound influence on the magnitude and variation of the prediction accuracy (Habier et al. 2007; Schopp et al. 2017a).

Table 3 Prediction accuracy (r ± SE) of seven agronomic traits from GP for scenario LwL obtained with GBLUP by SV, using the DH lines sampled from the landraces SM or WA as the TS (Ni = 90), and the EF lines or DH lines from other landraces as the PS, as well as the means of the forecasted prediction accuracies rD and rW across traits

TS | Trait | EF | GB | SF | SM | WA | CG | RT | Mean
SM | Early vigor | 0.00 ± 0.12 | 0.15 ± 0.16 | −0.02 ± 0.13 | — | 0.01 ± 0.10 | −0.30 ± 0.19 | −0.04 ± 0.18 | −0.04
SM | Female flowering | 0.24 ± 0.14 | −0.14 ± 0.13 | −0.36 ± 0.11 | — | −0.08 ± 0.11 | −0.28 ± 0.19 | −0.09 ± 0.18 | −0.19
SM | Fusarium ear rot | 0.00 ± 0.12 | 0.16 ± 0.19 | −0.14 ± 0.15 | — | 0.15 ± 0.11 | 0.81 ± 0.24 | 0.25 ± 0.11 | 0.25
SM | Plant height | −0.16 ± 0.17 | 0.27 ± 0.12 | 0.33 ± 0.13 | — | 0.06 ± 0.10 | 0.22 ± 0.22 | 0.21 ± 0.25 | 0.22
SM | Grain yield | −0.26 ± 0.13 | 0.08 ± 0.17 | 0.16 ± 0.12 | — | −0.09 ± 0.10 | 0.23 ± 0.18 | −0.50 ± 0.14 | −0.03
SM | Oil content | −0.05 ± 0.11 | 0.13 ± 0.14 | 0.10 ± 0.14 | — | 0.12 ± 0.10 | 0.48 ± 0.26 | −0.44 ± 0.15 | 0.08
SM | Protein content | 0.08 ± 0.13 | −0.25 ± 0.15 | 0.21 ± 0.13 | — | −0.27 ± 0.10 | −0.18 ± 0.22 | −0.05 ± 0.19 | −0.11
SM | rD | 0.09 | 0.07 | 0.06 | — | 0.06 | 0.06 | 0.04 |
SM | rW | 0.09 | 0.06 | 0.05 | — | 0.06 | 0.06 | 0.05 |
WA | Early vigor | 0.41 ± 0.10 | −0.13 ± 0.14 | 0.24 ± 0.12 | −0.06 ± 0.09 | — | 0.10 ± 0.24 | −0.24 ± 0.17 | −0.02
WA | Female flowering | −0.16 ± 0.13 | 0.03 ± 0.16 | 0.12 ± 0.15 | −0.03 ± 0.09 | — | 0.18 ± 0.25 | 0.23 ± 0.15 | 0.11
WA | Fusarium ear rot | −0.67 ± 0.09 | 0.19 ± 0.10 | −0.14 ± 0.13 | 0.11 ± 0.08 | — | −0.11 ± 0.20 | 0.01 ± 0.14 | 0.01
WA | Plant height | 0.04 ± 0.15 | 0.03 ± 0.15 | −0.07 ± 0.11 | −0.14 ± 0.11 | — | −0.05 ± 0.35 | −0.14 ± 0.16 | −0.08
WA | Grain yield | 0.02 ± 0.13 | 0.11 ± 0.14 | 0.01 ± 0.12 | −0.17 ± 0.11 | — | 0.02 ± 0.25 | 0.11 ± 0.14 | 0.02
WA | Oil content | −0.10 ± 0.16 | −0.07 ± 0.15 | 0.29 ± 0.14 | 0.26 ± 0.10 | — | 0.06 ± 0.22 | 0.09 ± 0.21 | 0.13
WA | Protein content | 0.19 ± 0.14 | 0.08 ± 0.14 | 0.03 ± 0.15 | −0.16 ± 0.09 | — | −0.18 ± 0.27 | 0.31 ± 0.17 | 0.02
WA | rD | 0.05 | 0.07 | 0.07 | 0.10 | — | 0.06 | 0.06 |
WA | rW | 0.06 | 0.07 | 0.07 | 0.06 | — | 0.05 | 0.07 |

EF, elite flint; GB, Gelber Badischer; SF, Strenzfelder; SM, Satu Mare; WA, Walliser; CG, Campan Galade; RT, Rheintaler.

GP in plant breeding has been found to be most promising within biparental families because all genotypes are related, and GP can exploit the Mendelian sampling variance by cosegregation of markers and linked QTL (Habier et al. 2007, 2013; Riedelsheimer et al. 2013; Crossa et al. 2014; Schopp et al. 2017a,b). For the same values of Ni and hᵢ², estimates of r for the DHL in our study were considerably lower than r reported in the literature for biparental families (Riedelsheimer et al. 2013; Lehermeier et al. 2014; Lian et al. 2014). This discrepancy arises because DHL of landraces of allogamous crops differ fundamentally from these types of populations in two ways. First, landraces are expected to display a much lower level of LD due to the hundreds of panmictic generations during their evolution (Mayer et al. 2017). Random mating is expected to reduce LD in each generation (Falconer and Mackay 1996), provided the population size is sufficiently large (Hill and Robertson 1968). Second, the genotypes in a DHL are only very distantly related by pedigree if a sufficiently large effective population size was employed in the collection and afterward in the maintenance of the accessions. Thus, the probability that two genotypes have a common ancestor in recent generations is extremely small unless two DH lines originated from the same S0 plant used for in vivo haploid induction. Given the low success rate of DH production from landraces (Melchinger et al. 2017), this is expected to occur rarely. We found no evidence for such an event in any of the six landraces based on the boxplots of modified Rogers' distances between pairs of DH lines (Figure S3), because all observations exceeded half of the mean value.
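The relatedness check described above can be sketched as follows. The sketch assumes fully homozygous DH lines coded as 0/1 allele indicators at biallelic SNPs; for such lines the modified Rogers' distance reduces to the square root of the proportion of loci at which two lines differ. The flagging rule mirrors the half-of-the-mean criterion in the text, but the function names and the toy genotypes are hypothetical.

```python
import numpy as np


def modified_rogers_distance(M: np.ndarray) -> np.ndarray:
    """Pairwise modified Rogers' distance (MRD) between homozygous lines.

    M: (n_lines, n_markers) array of 0/1 allele indicators. For biallelic,
    fully homozygous lines, each differing locus contributes 2 to the sum over
    both alleles, so MRD = sqrt(2 * d / (2 * m)) = sqrt(d / m).
    """
    m = M.shape[1]
    d = (M[:, None, :] != M[None, :, :]).sum(axis=2)  # differing loci per pair
    return np.sqrt(d / m)


def flag_close_pairs(D: np.ndarray) -> list:
    """Return pairs (i, j) whose distance does not exceed half the mean distance."""
    i_up, j_up = np.triu_indices_from(D, k=1)
    dist = D[i_up, j_up]
    threshold = 0.5 * dist.mean()
    return [(int(i), int(j)) for i, j, v in zip(i_up, j_up, dist) if v <= threshold]


M = np.array([[0, 0, 1, 1],
              [0, 0, 1, 1],   # identical to line 0 (toy example)
              [1, 1, 0, 0]])
D = modified_rogers_distance(M)
print(flag_close_pairs(D))   # [(0, 1)]
```

The broadcasting comparison needs memory proportional to n² × m; for large SNP panels, computing the mismatch counts via matrix products would scale better.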

There is also a fundamental difference between GP in landraces of allogamous and autogamous species due to their population structure. Accessions of autogamous crops generally consist of a single line, or a bulk of closely related lines, with little genetic variation within and much variation among landraces (Dreisigacker et al. 2005); for this reason, prediction accuracy across landraces is of primary interest. In addition, autogamous species show high LD due to limited recombination during selfing generations and moderately high relatedness between landraces (Cavanagh et al. 2013; Daetwyler et al. 2014). This explains the high prediction accuracy observed for GP across landraces in wheat (Crossa et al. 2016) and sorghum (Yu et al. 2016). In contrast, most of the genetic variation of allogamous crops is within and not among accessions, as applies to the six DHL in our study (Böhm et al. 2017; Melchinger et al. 2017). For this reason, GP is mainly concerned with prediction within landraces.

Prediction accuracy within DHL from landraces and EF lines

Among the DHL, r values obtained for scenario sL with Ni = 50 were moderate for SM and WA, and generally low for GB and SF, with both positive and negative estimates of r for individual traits (Table 2). This corresponds to the ranking of the four DHL for LD, which showed a high level in WA and SM and a rapid decay of LD in GB and SF (Figure 2A). A low level and steep decay of LD go along with a high effective number of chromosome segments, which, according to theoretical expectations, reduces r (Daetwyler et al. 2008, 2010). This suggests that LD is a major factor influencing the prediction accuracy within landraces, besides hᵢ², the genetic architecture of the trait, and the size of the TS.
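The theoretical link between LD and accuracy invoked here can be illustrated with the deterministic expression of Daetwyler et al. (2008), r = sqrt(N h² / (N h² + Me)), where Me is the number of independent loci, interpreted as the effective number of chromosome segments. This sketch uses the original expression, not the modified rD and rW forecasts applied in this study, and the Me values are hypothetical stand-ins for extensive versus rapidly decaying LD.

```python
import math


def daetwyler_accuracy(n_train: int, h2: float, m_e: float) -> float:
    """Expected accuracy r = sqrt(N*h2 / (N*h2 + Me)) (Daetwyler et al. 2008)."""
    nh2 = n_train * h2
    return math.sqrt(nh2 / (nh2 + m_e))


# Hypothetical Me values: few segments (extensive LD) vs. many (rapid LD decay).
for m_e in (100, 1000):
    accs = [round(daetwyler_accuracy(n, 0.5, m_e), 2) for n in (50, 90, 200)]
    print(f"Me={m_e}: {accs}")
```

With h² = 0.5 and N = 50, a tenfold increase in Me lowers the expected r from about 0.45 to about 0.16, which is the direction of the contrast between SM/WA and GB/SF.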

The prediction accuracy for most traits was higher in the EF than in the DHL GB, SF, SM, and WA (Table 2). Although EF and WA display similar levels of LD and LD decay (Figure 2A), prediction within EF yielded larger r than within WA. This difference is likely due to closer pedigree relationships between some EF lines (Figure S3 and Figure S5). This hypothesis is supported by the only slightly higher r values for GBLUP than for pedigree-BLUP obtained in EF (Table 2). Thus, in addition to LD, additive genetic relationships contributed to GP for EF.

Table 4 Prediction accuracy (r ± SE) of seven agronomic traits from GP for scenarios cLi and cLe obtained with GBLUP by LOOCV and by SV, respectively

Scenario | Trait | GB | SF | SM | WA | Mean
cLi | Early vigor | 0.39 ± 0.18 | 0.10 ± 0.13 | 0.45 ± 0.12 | 0.56 ± 0.11 | 0.38
cLi | Female flowering | 0.13 ± 0.17 | 0.40 ± 0.12 | 0.39 ± 0.12 | 0.48 ± 0.11 | 0.35
cLi | Fusarium ear rot | 0.79 ± 0.11 | 0.28 ± 0.13 | 0.24 ± 0.12 | 0.08 ± 0.12 | 0.35
cLi | Plant height | 0.32 ± 0.14 | 0.86 ± 0.11 | 0.49 ± 0.10 | 0.26 ± 0.12 | 0.48
cLi | Grain yield | −0.01 ± 0.15 | −0.45 ± 0.11 | 0.10 ± 0.14 | 0.26 ± 0.12 | −0.02
cLi | Oil content | 0.06 ± 0.15 | 0.27 ± 0.15 | 0.59 ± 0.11 | 0.36 ± 0.12 | 0.32
cLi | Protein content | 0.06 ± 0.13 | 0.21 ± 0.13 | 0.36 ± 0.13 | 0.47 ± 0.12 | 0.27
cLe | Early vigor | 0.04 ± 0.14 | 0.29 ± 0.13 | 0.09 ± 0.10 | −0.08 ± 0.11 | 0.08
cLe | Female flowering | −0.24 ± 0.15 | −0.27 ± 0.12 | −0.29 ± 0.08 | −0.17 ± 0.09 | −0.24
cLe | Fusarium ear rot | 0.27 ± 0.16 | 0.00 ± 0.15 | 0.14 ± 0.07 | 0.08 ± 0.11 | 0.12
cLe | Plant height | 0.20 ± 0.13 | 0.23 ± 0.12 | 0.23 ± 0.10 | −0.04 ± 0.10 | 0.15
cLe | Grain yield | 0.05 ± 0.15 | 0.05 ± 0.12 | 0.04 ± 0.11 | 0.15 ± 0.10 | 0.07
cLe | Oil content | 0.04 ± 0.15 | 0.24 ± 0.13 | 0.27 ± 0.10 | 0.22 ± 0.10 | 0.19
cLe | Protein content | 0.03 ± 0.13 | 0.13 ± 0.13 | −0.15 ± 0.09 | −0.01 ± 0.11 | 0.00

The combined libraries comprised Ni = 50 doubled-haploid lines from each of four landraces (cLi) or three landraces (cLe); r was calculated separately for each doubled-haploid library. GB, Gelber Badischer; SF, Strenzfelder; SM, Satu Mare; WA, Walliser.

The results for GB and SF suggest that GP is not promising for DHL with low levels of LD and a sample size of Ni = 50 genotypes (Table 2). However, even with the higher LD in WA and SM, the r values for Ni = 50 were close to zero for several traits. Increasing Ni in these two DHL from 50 to 90 improved the r values by 51% on average (Figure 1), but for grain yield and Fusarium ear rot, r was still far too low to be of practical benefit. Further research is required to assess r with larger sample sizes, because with Ni = 90 the prediction accuracy curve had not yet reached a plateau with increasing TS size.

The population structure of landraces closely resembles that of synthetics produced by intermating a large number of parental components, especially if several generations of recombination were applied prior to selection. In an empirical study with synthetic populations of alfalfa (Medicago sativa L.), r values were ≈0.30 for Ni = 125 (Annicchiarico et al. 2015). In a simulation study by Müller et al. (2017), with synthetics from 16 parental lines and five cycles of recombination prior to construction of the TS, r values ranged between 0.20 and 0.40, depending on the ancestral LD. Moreover, the contribution of pedigree relationships to r was drastically reduced with additional recombination cycles, whereas the contribution of LD remained constant. Altogether, these results further corroborate that LD is the driving force for GP in DHL from landraces.

Variation in the prediction accuracy among traits

Averaged across the four DHL, there was considerable variation in r among traits (Table 2). Such variation was also observed among segregating biparental populations in experiments with maize (Lehermeier et al. 2014; Lian et al. 2014) and in simulations (Schopp et al. 2017b). A possible explanation is that each trait has a different genetic architecture, so that the covariances among individuals estimated from genome-wide markers reflect the covariances at the underlying QTL to different degrees. We also observed a large SE of r under small sample size (Ni = 50) for scenario sL (Table 2). This seems to be a shortcoming of LOOCV in comparison with ordinary K-fold cross-validation, as demonstrated for the DHL SM (Figure S6).
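To see how strongly the sampling error of a correlation depends on sample size, the sketch below attaches an SE to r with a nonparametric bootstrap over lines. This is one generic approach chosen for illustration; it is an assumption, not necessarily how the SE values in Table 2 were obtained, and the predicted and observed values are simulated.

```python
import numpy as np


def bootstrap_se_r(pred: np.ndarray, obs: np.ndarray, n_boot: int = 2000,
                   seed: int = 0) -> float:
    """Bootstrap standard error of the Pearson correlation between pred and obs."""
    rng = np.random.default_rng(seed)
    n = len(obs)
    rs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)      # resample lines with replacement
        p, o = pred[idx], obs[idx]
        if p.std() == 0 or o.std() == 0:      # degenerate resample: r undefined
            continue
        rs.append(np.corrcoef(p, o)[0, 1])
    return float(np.std(rs, ddof=1))


rng = np.random.default_rng(42)
for n in (50, 500):
    obs = rng.normal(size=n)
    pred = 0.3 * obs + rng.normal(size=n)     # simulated, weakly predictive
    print(n, round(bootstrap_se_r(pred, obs), 3))
```

For a true correlation near 0.3, the SE at n = 50 is roughly three times that at n = 500, consistent with the familiar (1 − r²)/√n scaling.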

The large number of low, and sometimes even negative, r values for grain yield and other traits in GB and SF could also be attributed to the high genetic load in these DHL, as suggested by the low success rate of DH production in these landraces compared with SM and WA. While DH production has a positive effect on purging landraces of their genetic load (Strigens et al. 2013; Melchinger et al. 2017), it seems likely that many detrimental alleles still escaped complete elimination, as reflected by the poor seed set in these materials (Böhm et al. 2017). Such alleles are largely missed by GBLUP because they occur at low frequency and are unlikely to be tagged by SNP arrays designed for elite material. Detrimental alleles can also interact negatively with the genetic background, and the GBLUP model is not efficient in accounting for epistatic effects (Jiang and Reif 2015; Martini et al. 2017).

GP between DHL from landraces

GP between populations was first investigated in animal breeding, where it showed much lower accuracy compared to GP within populations (Hayes et al. 2009; Toosi et al. 2010; Kachman et al. 2013). As shown by simulations (Schopp et al. 2017a) and empirical results in maize (Riedelsheimer et al. 2013), r values for GP between biparental families were ≈60% lower for unrelated families compared with the estimates of r for GP within full-sib families. Although GBLUP benefits mainly from the relatedness among genotypes, the LD between QTL and markers can also contribute to prediction accuracy for scenario LwL (Habier et al. 2013; Schopp et al. 2017a). However, similar to the studies in animal breeding, we obtained r values close to zero in this scenario, using either SM or WA with Ni = 90 (Table 3), or GB, SF, SM, or WA with Ni = 50 (Figure S2) as the TS and a different DHL as the PS.

Figure 2 (A) Linkage disequilibrium (LD) and (B) cluster analysis based on modified Rogers' distance for each doubled-haploid line from landraces GB (Gelber Badischer), SF (Strenzfelder), SM (Satu Mare), WA (Walliser), CG (Campan Galade), and RT (Rheintaler), as well as the elite flint (EF) lines.


A possible explanation for the low r values in scenario LwL is that many QTL that segregate in the PS do not segregate in the DHL serving as the TS. Thus, the effects of those genomic regions of the PS cannot be properly predicted from the TS, which results in poor r, as demonstrated for biparental families by Schopp et al. (2017b). A further reason is the difference in allele substitution effects of markers between two DHL (Lehermeier et al. 2015; Han et al. 2018). The low persistence of LD patterns between landraces, as reflected by the low level of LPS (Figure 3, A and B), particularly outside the centromeric regions (data not shown), is an indication of such differences in estimated marker effects. However, no pattern could be recognized between the LPS of pairwise combinations of DHL and the observed r, because the means of r were quite low and r varied widely across combinations of TS/PS and traits (Table 3). In predictions across unrelated biparental families (Schopp et al. 2017b), the LPS was only loosely correlated with the prediction accuracy, whereas the parameter uii*, referring to the proportion of polymorphic markers between two DHL, was much more important. However, in our study, we observed no association between uii* (Figure S4) and r.
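Linkage phase similarity is commonly computed as the correlation, across adjacent marker pairs, of the signed LD coefficient r estimated separately in two populations. The sketch below implements that common definition; it is an assumption that it matches the exact LPS computation used here, the parameter uii* is not implemented, and the genotypes in the usage example are simulated.

```python
import numpy as np


def adjacent_ld_r(M: np.ndarray) -> np.ndarray:
    """Signed LD correlation r for each pair of adjacent markers.

    M: (n_lines, n_markers) allele dosages, markers sorted by map position.
    Returns NaN for pairs involving a monomorphic marker.
    """
    X = M - M.mean(axis=0)
    num = (X[:, :-1] * X[:, 1:]).sum(axis=0)
    den = np.sqrt((X[:, :-1] ** 2).sum(axis=0) * (X[:, 1:] ** 2).sum(axis=0))
    with np.errstate(invalid="ignore", divide="ignore"):
        return num / den


def linkage_phase_similarity(M_a: np.ndarray, M_b: np.ndarray) -> float:
    """Correlation of adjacent-marker r values between two libraries."""
    r_a, r_b = adjacent_ld_r(M_a), adjacent_ld_r(M_b)
    ok = np.isfinite(r_a) & np.isfinite(r_b)   # keep pairs polymorphic in both
    return float(np.corrcoef(r_a[ok], r_b[ok])[0, 1])


rng = np.random.default_rng(0)
lib_a = 2 * rng.integers(0, 2, size=(60, 200))   # hypothetical DH genotypes
lib_b = 2 * rng.integers(0, 2, size=(60, 200))
print(round(linkage_phase_similarity(lib_a, lib_a), 3))  # 1.0 for identical data
print(round(linkage_phase_similarity(lib_a, lib_b), 3))  # unrelated: near zero
```

Because the signs of r are kept, LPS captures whether the same marker alleles are in coupling phase in both libraries, which is what matters when marker effects are transferred from one TS to another PS.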

GP combining DHL of landraces

By combining multiple landraces in scenario cLi, r was on average 0.17 higher than for scenario sL (Table 4). This is in agreement with the literature, in which the combination of multiple populations yielded higher accuracies (Hayes et al. 2009; Schulz-Streeck et al. 2012; Chen et al. 2014; Iheshiulor et al. 2016). Therefore, predictions in scenario cLi benefited from the larger TS compared to scenario sL. Interestingly, accounting for population structure (cf. Figure 2B) by including fixed effects for the different DHL in Equation 2 had a negative impact on r for scenario cLi (data not shown), as also reported in other studies with multiple populations (Daetwyler et al. 2012; Crossa et al. 2016). Hence, differences between populations, which may lead to false positives in genome-wide association studies, can be exploited beneficially in GP.
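The contrast above (GBLUP with and without fixed effects for the DHL in a combined TS) can be sketched with the generalized-least-squares form of the mixed model, which avoids inverting a marker relationship matrix that is often singular. This is a hypothetical illustration: it fixes lam = σe²/σg² instead of estimating variance components, and it does not reproduce the study's Equation 2. With library indicators in X, mean differences between libraries are absorbed by the fixed effects instead of being predicted through G.

```python
import numpy as np


def gblup_predict(G, y, X, train, test, lam):
    """GBLUP predictions for `test` lines given training records and fixed effects X.

    Model: y = X b + g + e with Var(g) proportional to G and Var(e) to lam * I.
    b is estimated by GLS; g of the test lines is its BLUP from the training data.
    """
    V_inv = np.linalg.inv(G[np.ix_(train, train)] + lam * np.eye(len(train)))
    X_t = X[train]
    b = np.linalg.solve(X_t.T @ V_inv @ X_t, X_t.T @ V_inv @ y[train])
    resid = y[train] - X_t @ b
    return X[test] @ b + G[np.ix_(test, train)] @ V_inv @ resid


def design(labels, library_effects):
    """Intercept-only design, or one indicator column per library."""
    if not library_effects:
        return np.ones((len(labels), 1))
    _, codes = np.unique(labels, return_inverse=True)
    return np.eye(codes.max() + 1)[codes]


# Toy check with G = 0 (no genomic information): predictions reduce to
# training means, per library with fixed effects, pooled without them.
labels = np.array(["A", "A", "A", "B", "B", "B"])
y = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
G0 = np.zeros((6, 6))
train, test = np.array([0, 1, 3, 4]), np.array([2, 5])
print(gblup_predict(G0, y, design(labels, True), train, test, 1.0))
print(gblup_predict(G0, y, design(labels, False), train, test, 1.0))
```

Running both designs through the same cross-validation loop and comparing r is the kind of comparison reported in the text; the toy data only check the mechanics.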

Scenario cLe (N = 150), in which the TS did not include the DHL serving as the PS, showed similarly low r values to scenario LwL (Table 4). This result is similar to GP when combining unrelated biparental families, which yielded r values close to zero or even negative values although the TS size was increased (Riedelsheimer et al. 2013; Würschum et al. 2017). Further research is warranted to investigate under which circumstances inclusion of DH lines from another landrace will improve the prediction accuracy in a combined TS.

Forecasting prediction accuracy with deterministic formulas

Forecasting r by modifications of the methods of Daetwyler et al. (2008) and Wientjes et al. (2015) yielded, for scenario sL, a ranking of the four DHL for rD and rW similar to that for the empirical r values averaged over traits (Table 1). However, the forecasted values were much higher than the empirical ones, particularly for GB and SF. This most likely reflects a violation of the assumptions used in the derivation of the formulas, such as (i) high LD between QTL and markers (Wientjes et al. 2015) and (ii) additive gene action at the QTL. Assumption (i) was relaxed by multiplication with r²_MM/2 (Lian et al. 2014), which accounts for incomplete linkage between QTL and markers. Assumption (ii) would not hold if detrimental alleles have negative epistatic effects in the DHL, as discussed above. Further, if allele frequencies of SNPs and QTL differ, this can lead to a substantial bias in rD and rW (Schopp et al. 2017b). The latter problem most likely applies to our study, because we used a SNP array optimized for temperate and tropical dent germplasm (Ganal et al. 2011) that is prone to ascertainment bias in flint materials (Frascaroli et al. 2013). The consequences are more severe if GP relies predominantly on exploiting LD, but only minor if additive genetic relationships are the driving force of GP, as was the case in the EF. Calculating rD and rW as well as LD might also provide clues about the minimum TS size required to achieve the desired level of r.

Figure 3 Linkage phase similarity (LPS) of pairwise combinations of doubled-haploid (DH) lines (A) from SM (Satu Mare) with the elite flint (EF) lines, as well as all other DH lines from landraces GB (Gelber Badischer), SF (Strenzfelder), SM, WA (Walliser), CG (Campan Galade), and RT (Rheintaler); (B) from WA with the EF lines as well as DH lines from landraces CG, GB, RT, and SF, and from GB with DH lines from landrace SF.
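Inverting the Daetwyler et al. (2008) expression r² = N h² / (N h² + Me) gives N = r² Me / (h² (1 − r²)), a rough way to turn such forecasts into a minimum TS size. This sketch uses the unmodified expression with hypothetical inputs; it ignores the r²_MM/2 adjustment and the ascertainment-bias issues discussed in this section, so for DHL like GB and SF it would likely be optimistic.

```python
import math


def required_training_size(r_target: float, h2: float, m_e: float) -> int:
    """Smallest integer N with sqrt(N*h2 / (N*h2 + Me)) >= r_target.

    Inverts the deterministic formula of Daetwyler et al. (2008):
    N = r^2 * Me / (h2 * (1 - r^2)).
    """
    if not 0.0 < r_target < 1.0:
        raise ValueError("r_target must lie strictly between 0 and 1")
    return math.ceil(r_target ** 2 * m_e / (h2 * (1.0 - r_target ** 2)))


# Hypothetical inputs: target r = 0.5, h2 = 0.5, Me = 300 segments.
print(required_training_size(0.5, 0.5, 300))   # 200
```

Because the required N grows with Me, a library with rapid LD decay needs a disproportionately larger TS to reach the same target r.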

Use of landraces in breeding programs

With DHL, the genetic diversity among gametes within landraces can be conserved in the form of “immortalized” genotypes. Nevertheless, DH production from landraces is still laborious and expensive, and the success rate is unpredictable (Melchinger et al. 2017). Therefore, it is crucial to screen a large number of accessions for their agronomic performance in line per se and testcross trials (Böhm et al. 2014, 2017), and to test their suitability for DH production before embarking on the production of DHL from landraces and the application of GP.

To balance the expenditures for GP in DHL with the gains expected from genomic selection, a sufficiently large DHL must be constructed, which is justified only in combination with an equally large PS to compensate for the high investments (Riedelsheimer and Melchinger 2013). A high selection intensity on the line per se performance of the DH lines would then be possible, while retaining a sufficient number of genotypes for subsequent evaluation of their testcross performance, before channeling the most promising lines into prebreeding programs. Alternatively, a few (≈10) of the highest-performing DH lines could be selected and intercrossed to conduct a genomic recurrent selection program. However, choosing a small number of lines is expected to generate new sample LD (Schopp et al. 2017a), which may invalidate the prediction model established with the DHL.

In contrast to the development of DHL, which harbor the genome of gametes from a landrace in pure form, one might apply “gamete selection” (Stadler 1944). In this method, the landrace serves as pollinator of an elite line, and selection is carried out in selfing or backcross generations derived from the cross. GP could benefit this method, as shown in simulations (Gorjanc et al. 2016). In particular, it might be of interest to evaluate application of the prediction model established by our approach to crosses of the original landrace with an elite inbred.

Conclusions

DH lines from landraces of allogamous crops are highly diverse and virtually unrelated by pedigree. Consequently, LD is the main source of quantitative genetic information exploitable for prediction. Owing to the rapid decay of LD, high-density genome-wide markers and large TS sizes are required for the successful implementation of GP in such populations. Based on the trends for r in the two largest DHL (SM and WA), we speculate that a minimum TS size of more than 100 DH lines per landrace is required to reach a decent prediction accuracy, but this warrants further research. GP across DHL failed if the DHL to be predicted was not represented in the TS (scenarios LwL and cLe). However, if several smaller DHL plus genotypes from the DHL to be predicted were included in the TS (scenario cLi), improved and more stable results were obtained than for GP within the DHL alone (scenario sL). Altogether, the DH technology combined with GP offers a powerful approach to exploit the idle genetic diversity within landraces, but substantial investments are needed to mine this “gold reserve” for future breeding.

Acknowledgments

We thank Willem Molenaar and Tobias Schrag for valuable suggestions to improve the manuscript; the technical staff from the University of Hohenheim for excellence in conducting the field experiments; and KWS SAAT SE in Einbeck for the additional field experiment, as well as T. Presterl and T. Bolduan for conducting it. This research was funded by the German Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung) within the scope of the funding initiatives AgroClustEr “Synbreed-Synergistic plant and animal breeding” (project number: 0315528D) and MAZE “Plant Breeding Research for the Bioeconomy” (funding identifier: 031B0195).

Literature Cited

Albrecht, T., V. Wimmer, H. J. Auinger, M. Erbe, C. Knaak et al., 2011 Genome-based prediction of testcross values in maize. Theor. Appl. Genet. 123: 339–350. https://doi.org/10.1007/s00122-011-1587-7

Annicchiarico, P., N. Nazzicari, X. Li, Y. Wei, L. Pecetti et al., 2015 Accuracy of genomic selection for alfalfa biomass yield in different reference populations. BMC Genomics 16: 1020. https://doi.org/10.1186/s12864-015-2212-y

Bernardo, R., 1994 Prediction of maize single-cross performance using RFLPs and information from related hybrids. Crop Sci. 34: 20–25. https://doi.org/10.2135/cropsci1994.0011183X003400010003x

Böhm, J., W. Schipprack, V. Mirdita, H. F. Utz, and A. E. Melchinger, 2014 Breeding potential of European flint maize landraces evaluated by their testcross performance. Crop Sci. 54: 1665. https://doi.org/10.2135/cropsci2013.12.0837

Böhm, J., W. Schipprack, H. F. Utz, and A. E. Melchinger, 2017 Tapping the genetic diversity of landraces in allogamous crops with doubled haploid lines: a case study from European flint maize. Theor. Appl. Genet. 130: 861–873. https://doi.org/10.1007/s00122-017-2856-x

Browning, S. R., and B. L. Browning, 2007 Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am. J. Hum. Genet. 81: 1084–1097. https://doi.org/10.1086/521987

Butler, D. G., B. R. Cullis, A. R. Gilmour, and B. J. Gogel, 2009 Mixed models for S language environments. ASReml-R reference manual: release 3.0. Technical report. ASReml estimates variance components under a general linear mixed model by residual maximum likelihood (REML).

Cavanagh, C. R., S. Chao, S. Wang, B. E. Huang, S. Stephen et al., 2013 Genome-wide comparative diversity uncovers multiple targets of selection for improvement in hexaploid wheat landraces and cultivars. Proc. Natl. Acad. Sci. USA 110: 8057–8062. https://doi.org/10.1073/pnas.1217133110

Chen, L., M. Vinsky, and C. Li, 2014 Accuracy of predicting genomic breeding values for carcass merit traits in Angus and Charolais beef cattle. Anim. Genet. 46: 55–59. https://doi.org/10.1111/age.12238

Chen, W.-C., 2011 Overlapping codon model, phylogenetic clustering, and alternative partial expectation conditional maximization algorithm. Ph.D. Thesis, Iowa State University, Ames, IA.


Cochran, W. G., and G. M. Cox, 1957 Experimental Designs, Ed. 2.Wiley, London.

Crossa, J., P. Pérez, J. Hickey, J. Burgueño, L. Ornella et al.,2014 Genomic prediction in CIMMYT maize and wheat breed-ing programs. Heredity (Edinb) 112: 48–60. https://doi.org/10.1038/hdy.2013.16

Crossa, J., D. Jarquín, J. Franco, P. Pérez-Rodríguez, J. Burgueñoet al., 2016 Genomic prediction of gene bank wheat landraces.G3 (Bethesda) 6: 1819–1834. https://doi.org/10.1534/g3.116.029637

Crossa, J., P. Pérez-Rodríguez, J. Cuevas, O. Montesinos-López, D.Jarquín et al., 2017 Genomic selection in plant breeding:methods, models, and perspectives. Trends Plant Sci. 22: 961–975. https://doi.org/10.1016/j.tplants.2017.08.011

Daetwyler, H. D., B. Villanueva, and J. A. Woolliams, 2008 Accuracyof predicting the genetic risk of disease using a genome-wideapproach. PLoS One 3: e3395. https://doi.org/10.1371/journal.pone.0003395

Daetwyler, H. D., R. Pong-Wong, B. Villanueva, and J. A. Woolliams,2010 The impact of genetic architecture on genome-wideevaluation methods. Genetics 185: 1021–1031. https://doi.org/10.1534/genetics.110.116855

Daetwyler, H. D., K. E. Kemper, J. H. J. van der Werf, and B. J.Hayes, 2012 Components of the accuracy of genomic predic-tion in a multi-breed sheep population1. J. Anim. Sci. 90: 3375–3384. https://doi.org/10.2527/jas.2011-4557

Daetwyler, H. D., U. K. Bansal, H. S. Bariana, M. J. Hayden, and B.J. Hayes, 2014 Genomic prediction for rust resistance in di-verse wheat landraces. Theor. Appl. Genet. 127: 1795–1803.https://doi.org/10.1007/s00122-014-2341-8

de los Campos, G., A. I. Vazquez, R. Fernando, Y. C. Klimentidis,and D. Sorensen, 2013 Prediction of complex human traitsusing the genomic best linear unbiased predictor. PLoSGenet. 9: e1003608. https://doi.org/10.1371/journal.pgen.1003608

Dekkers, J. C. M., 2007 Marker-assisted selection for commercialcrossbred performance. J. Anim. Sci. 85: 2104–2114. https://doi.org/10.2527/jas.2006-683

Dreisigacker, S., P. Zhang, M. L. Warburton, B. Skovmand, D.Hoisington et al., 2005 Genetic diversity among and withinCIMMYT wheat landrace accessions investigated with SSRsand implications for plant genetic resources management. CropSci. 45: 653–661. https://doi.org/10.2135/cropsci2005.0653

Falconer, D. S., and T. F. C. Mackay, 1996 Introduction to Quan-titative Genetics, Ed. 4. Pearson, London.

Frascaroli, E., T. A. Schrag, and A. E. Melchinger, 2013 Geneticdiversity analysis of elite European maize (Zea mays L.) inbredlines using AFLP, SSR, and SNP markers reveals ascertainmentbias for a subset of SNPs. Theor. Appl. Genet. 126: 133–141.https://doi.org/10.1007/s00122-012-1968-6

Ganal, M. W., G. Durstewitz, A. Polley, A. Bérard, E. S. Buckleret al., 2011 A large maize (Zea mays L.) SNP genotyping ar-ray: development and germplasm genotyping, and genetic map-ping to compare with the B73 reference genome. PLoS One 6:e28334. https://doi.org/10.1371/journal.pone.0028334

Gorjanc, G., J. Jenko, S. J. Hearne, and J. M. Hickey, 2016 Initiatingmaize pre-breeding programs using genomic selection to harnesspolygenic variation from landrace populations. BMC Genomics17: 30. https://doi.org/10.1186/s12864-015-2345-z

Greene, S. L., T. J. Kisha, L.-X. Yu, and M. Parra-Quijano,2014 Conserving plants in gene banks and nature: investigat-ing complementarity with Trifolium thompsonii Morton.PLoS One 9: e105145. https://doi.org/10.1371/journal.pone.0105145

Habier, D., R. L. Fernando, and J. C. M. Dekkers, 2007 The impactof genetic relationship information on genome-assisted breedingvalues. Genetics 177: 2389–2397.

Habier, D., R. L. Fernando, and D. J. Garrick, 2013 Genomic BLUPdecoded: a look into the black box of genomic prediction. Ge-netics 194: 597–607. https://doi.org/10.1534/genetics.113.152207

Han, S., T. Miedaner, H. F. Utz, W. Schipprack, T. A. Schrag et al.,2018 Genomic prediction and GWAS of Gibberella ear rot re-sistance traits in dent and flint lines of a public maize breedingprogram. Euphytica 214: 6. https://doi.org/10.1007/s10681-017-2090-2

Hayes, B. J., P. J. Bowman, A. C. Chamberlain, K. Verbyla, and M.E. Goddard, 2009 Accuracy of genomic breeding values inmulti-breed dairy cattle populations. Genet. Sel. Evol. 41: 51.https://doi.org/10.1186/1297-9686-41-51

Henderson, C. R., 1985 Best linear unbiased prediction of non-additive genetic merits in noninbred populations. J. Anim. Sci.60: 111–117. https://doi.org/10.2527/jas1985.601111x

Hickey, J. M., T. Chiurugwi, I. Mackay, W. Powell, J. M. Hickeyet al., 2017 Genomic prediction unifies animal and plantbreeding programs to form platforms for biological discovery.Nat. Genet. 49: 1297–1303. https://doi.org/10.1038/ng.3920

Hill, W. G., and A. Robertson, 1968 Linkage disequilibrium infinite populations. Theor. Appl. Genet. 38: 226–231. https://doi.org/10.1007/BF01245622

Iheshiulor, O. O. M., J. A. Woolliams, X. Yu, R. Wellmann, and T.H. E. Meuwissen, 2016 Within- and across-breed genomic pre-diction using whole-genome sequence and single nucleotidepolymorphism panels. Genet. Sel. Evol. 48: 15. https://doi.org/10.1186/s12711-016-0193-1

Jannink, J.-L., A. J. Lorenz, and H. Iwata, 2010 Genomic selectionin plant breeding: from theory to practice. Brief. Funct. Geno-mics 9: 166–177. https://doi.org/10.1093/bfgp/elq001

Jiang, Y., and J. C. Reif, 2015 Modeling epistasis in genomic se-lection. Genetics 201: 759–768. https://doi.org/10.1534/ge-netics.115.177907

Kachman, S. D., M. L. Spangler, G. L. Bennett, K. J. Hanford, L. A.Kuehn et al., 2013 Comparison of molecular breeding valuesbased on within- and across-breed training in beef cattle. Genet.Sel. Evol. 45: 30. https://doi.org/10.1186/1297-9686-45-30

Kadam, D. C., S. M. Potts, M. O. Bohn, A. E. Lipka, and A. J. Lorenz,2016 Genomic prediction of single crosses in the early stagesof a maize hybrid breeding pipeline. G3(Bethesda) 6: 3443–3453 [corrigenda: G3 (Bethesda) 7: 3557–3558 (2017)].

Lehermeier, C., N. Krämer, E. Bauer, C. Bauland, C. Camisan et al., 2014 Usefulness of multiparental populations of maize (Zea mays L.) for genome-based prediction. Genetics 198: 3–16. https://doi.org/10.1534/genetics.114.161943

Lehermeier, C., C.-C. Schön, and G. de los Campos, 2015 Assessment of genetic heterogeneity in structured plant populations using multivariate whole-genome regression models. Genetics 201: 323–337. https://doi.org/10.1534/genetics.115.177394

Lian, L., A. Jacobson, S. Zhong, and R. Bernardo, 2014 Genomewide prediction accuracy within 969 maize biparental populations. Crop Sci. 54: 1514. https://doi.org/10.2135/cropsci2013.12.0856

Martini, J. W. R., N. Gao, D. F. Cardoso, V. Wimmer, M. Erbe et al., 2017 Genomic prediction with epistasis models: on the marker-coding-dependent performance of the extended GBLUP and properties of the categorical epistasis model (CE). BMC Bioinformatics 18: 3. https://doi.org/10.1186/s12859-016-1439-1

Mayer, M., S. Unterseer, E. Bauer, N. de Leon, B. Ordas et al., 2017 Is there an optimum level of diversity in utilization of genetic resources? Theor. Appl. Genet. 130: 2283–2295. https://doi.org/10.1007/s00122-017-2959-4

Melchinger, A. E., P. Schopp, D. Müller, T. A. Schrag, E. Bauer et al., 2017 Safeguarding our genetic resources with libraries of doubled-haploid lines. Genetics 206: 1611–1619. https://doi.org/10.1534/genetics.115.186205

Messmer, M. M., A. E. Melchinger, J. Boppenmaier, E. Brunklaus-Jung, and R. G. Herrmann, 1992 Relationships among early European maize inbreds: I. genetic diversity among flint and dent lines revealed by RFLPs. Crop Sci. 32: 1301. https://doi.org/10.2135/cropsci1992.0011183X003200060001x

Meuwissen, T. H. E., B. J. Hayes, and M. E. Goddard, 2001 Prediction of total genetic value using genome-wide dense marker maps. Genetics 157: 1819–1829.

Monteiro, F., P. Vidigal, A. B. Barros, A. Monteiro, H. R. Oliveira et al., 2016 Genetic distinctiveness of rye in situ accessions from Portugal unveils a new hotspot of unexplored genetic resources. Front. Plant Sci. 7: 1–17. https://doi.org/10.3389/fpls.2016.01334

Müller, D., P. Schopp, and A. E. Melchinger, 2017 Persistency of prediction accuracy and genetic gain in synthetic populations under recurrent genomic selection. G3 (Bethesda) 7: 801–811. https://doi.org/10.1534/g3.116.036582

Paradis, E., J. Claude, and K. Strimmer, 2004 APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20: 289–290. https://doi.org/10.1093/bioinformatics/btg412

Poehlman, J. M., 1987 Breeding Field Crops. AVI Publishing Co., Westport, CT. https://doi.org/10.1007/978-94-015-7271-2

Pryce, J. E., B. Gredler, S. Bolormaa, P. J. Bowman, C. Egger-Danner et al., 2011 Short communication: genomic selection using a multi-breed, across-country reference population. J. Dairy Sci. 94: 2625–2630. https://doi.org/10.3168/jds.2010-3719

R Core Team, 2017 R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna.

Reif, J. C., S. Hamrit, M. Heckenberger, W. Schipprack, H. P. Maurer et al., 2005a Genetic structure and diversity of European flint maize populations determined with SSR analyses of individuals and bulks. Theor. Appl. Genet. 111: 906–913. https://doi.org/10.1007/s00122-005-0016-1

Reif, J. C., A. E. Melchinger, and M. Frisch, 2005b Genetical and mathematical properties of similarity and dissimilarity coefficients applied in plant breeding and seed bank management. Crop Sci. 45: 1. https://doi.org/10.2135/cropsci2005.0001

Riedelsheimer, C., and A. E. Melchinger, 2013 Optimizing the allocation of resources for genomic selection in one breeding cycle. Theor. Appl. Genet. 126: 2835–2848. https://doi.org/10.1007/s00122-013-2175-9

Riedelsheimer, C., F. Technow, and A. E. Melchinger, 2012 Comparison of whole-genome prediction models for traits with contrasting genetic architecture in a diversity panel of maize inbred lines. BMC Genomics 13: 452. https://doi.org/10.1186/1471-2164-13-452

Riedelsheimer, C., J. B. Endelman, M. Stange, M. E. Sorrells, J.-L. Jannink et al., 2013 Genomic predictability of interconnected biparental maize populations. Genetics 194: 493–503. https://doi.org/10.1534/genetics.113.150227

Salhuana, W., and L. Pollak, 2006 Latin American maize project (LAMP) and germplasm enhancement of maize (GEM) project: generating useful breeding germplasm. Maydica 51: 339–355.

Schopp, P., D. Müller, F. Technow, and A. E. Melchinger, 2017a Accuracy of genomic prediction in synthetic populations depending on the number of parents, relatedness, and ancestral linkage disequilibrium. Genetics 205: 441–454. https://doi.org/10.1534/genetics.116.193243

Schopp, P., D. Müller, Y. C. J. Wientjes, and A. E. Melchinger, 2017b Genomic prediction within and across biparental families: means and variances of prediction accuracy and usefulness of deterministic equations. G3 (Bethesda) 7: 3571–3586. https://doi.org/10.1534/g3.117.300076

Schulz-Streeck, T., J. O. Ogutu, Z. Karaman, C. Knaak, and H. P. Piepho, 2012 Genomic selection using multiple populations. Crop Sci. 52: 2453–2461. https://doi.org/10.2135/cropsci2012.03.0160

Stadler, L. J., 1944 Gamete selection in corn breeding. J. Am. Soc. Agron. 36: 988–989.

Strigens, A., W. Schipprack, J. C. Reif, and A. E. Melchinger, 2013 Unlocking the genetic diversity of maize landraces with doubled haploids opens new avenues for breeding. PLoS One 8: e57234. https://doi.org/10.1371/journal.pone.0057234

Technow, F., A. Bürger, and A. E. Melchinger, 2013 Genomic prediction of northern corn leaf blight resistance in maize with combined or separated training sets for heterotic groups. G3 (Bethesda) 3: 197–203.

Toosi, A., R. L. Fernando, and J. C. M. Dekkers, 2010 Genomic selection in admixed and crossbred populations. J. Anim. Sci. 88: 32–46. https://doi.org/10.2527/jas.2009-1975

VanRaden, P. M., 2008 Efficient methods to compute genomic predictions. J. Dairy Sci. 91: 4414–4423. https://doi.org/10.3168/jds.2007-0980

Warburton, M. L., J. C. Reif, M. Frisch, M. Bohn, C. Bedoya et al., 2008 Genetic diversity in CIMMYT nontemperate maize germplasm: landraces, open pollinated varieties, and inbred lines. Crop Sci. 48: 617. https://doi.org/10.2135/cropsci2007.02.0103

Westhues, M., T. A. Schrag, C. Heuer, G. Thaller, H. F. Utz et al., 2017 Omics-based hybrid prediction in maize. Theor. Appl. Genet. 130: 1927–1939. https://doi.org/10.1007/s00122-017-2934-0

Wientjes, Y., R. F. Veerkamp, P. Bijma, H. Bovenhuis, C. Schrooten et al., 2015 Empirical and deterministic accuracies of across-population genomic prediction. Genet. Sel. Evol. 47: 5. https://doi.org/10.1186/s12711-014-0086-0

Wientjes, Y. C. J., R. F. Veerkamp, and M. P. L. Calus, 2013 The effect of linkage disequilibrium and family relationships on the reliability of genomic prediction. Genetics 193: 621–631. https://doi.org/10.1534/genetics.112.146290

Wilde, K., H. Burger, V. Prigge, T. Presterl, W. Schmidt et al., 2010 Testcross performance of doubled-haploid lines developed from European flint maize landraces. Plant Breed. 129: 181–185. https://doi.org/10.1111/j.1439-0523.2009.01677.x

Würschum, T., H. P. Maurer, S. Weissmann, V. Hahn, and W. L. Leiser, 2017 Accuracy of within- and among-family genomic prediction in triticale. Plant Breed. 136: 230–236. https://doi.org/10.1111/pbr.12465

Yu, X., X. Li, T. Guo, C. Zhu, Y. Wu et al., 2016 Genomic prediction contributing to a promising global strategy to turbocharge gene banks. Nat. Plants 2: 16150. https://doi.org/10.1038/nplants.2016.150

Communicating editor: F. van Eeuwijk
