Early generation selection in a recurrent selection breeding program within a synthetic population
– Using genomewide markers to speed up the process
Seminar on genomic selection 17/10/2014 Tuong-Vi Cao, UMR AGAP, CIRAD-BIOS
Genomic selection, based on genome-wide genotype-phenotype relations, is a promising approach for breeding:
1. to access more selection candidates (higher intensity of selection) and
2. to reduce the duration of selection cycles (maximize genetic gain per unit time)
This is all the more attractive because molecular information is becoming more accessible while phenotypic information is becoming the limiting factor in terms of resource allocation.
The upland rice breeding program of CIAT initiated this approach, and first results (based on cross-validation within the calibration population data) showed that such an approach is feasible but that its overall accuracy is rather low. Some reasons have already been pointed out (evaluation in only one year × location, only additive effects modelled).
My present contribution is about:
1. the way the phenotypic predictor may be defined and modelled to take dominance and epistatic interactions into account, and
2. the way to integrate markers to further reduce the duration of selection cycles.
What has been done and what is the question ?
[Breeding scheme diagram, first part]
• EELL 2008: four synthetic populations (PCT-4A, PCT-4B, PCT-4C, PCT-11) segregating for the ms gene: ½ [ms ms] (MS) + ½ [ms Ms] (MF)
– Extraction of 100 S0:1 progenies per population on MF plants
• EEP 2010: S0:1 progenies segregating ¼ [Ms Ms] + ½ [Ms ms] + ¼ [ms ms] (plants pl1 to pl4)
• EEP 2010: seed increase through SSD, giving S1:2
• EEP 2011 A: DNA extraction of 8 S1:2 plants and genotyping for the ms locus
• 392 S1:2 progenies segregating for the ms gene (¼ [Ms Ms], ½ [Ms ms], ¼ [ms ms])
What has been done and what is the question ?
[Breeding scheme diagram, second part]
• 392 S1:2 progenies segregating for the ms gene (¼ [Ms Ms], ½ [Ms ms], ¼ [ms ms])
• Choice of one [Ms Ms] plant per S1:2 progeny to constitute the calibration population
• S2:3: DNA extraction of 15 S2:3 plants per progeny, then GBS genotyping to infer the genotype of the S2 plants
• Bulk seed increase
• S2:4: phenotyping of the S2:4 progenies to calibrate the model
What has been done and what is the question ?
• Using the S2 population as the base population structure for calibration is an option, because partially fixed material:
– is more homogeneous and easier to phenotype (minimum within-progeny variation and maximum between-progeny variation),
– minimizes the bias due to dominance effects.
• However, it is time- and resource-consuming:
– to produce the material needed to calibrate the prediction model (S2 population to be sampled, S2:3 bulks to be genotyped, S2:4 progenies to be phenotyped),
– to advance the breeding material to the S2 generation before it can be predicted in each cycle.
• Hence, is it possible to save time and resources through:
– early phenotyping for calibrating the model?
– early prediction of breeding candidates?
Genetic model
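• For simplicity, let us suppose two biallelic loci M and N.
• Let $M_iN_k / M_jN_l$ be a genotype in the S0 generation.
• The genotypic value is:
\[
\begin{aligned}
G^{S_0}_{ijkl} ={} & A_i + A_j + A_k + A_l + D_{ij} + D_{kl} \\
& + AA_{ik} + AA_{jl} + AA_{il} + AA_{jk} \\
& + AD_{ijk} + AD_{ijl} + AD_{ikl} + AD_{jkl} + DD_{ijkl}
\end{aligned}
\]
where:
– $A_i, A_j, A_k, A_l$: additive effects associated with alleles i or j of the M locus and alleles k or l of the N locus
– $D_{ij}, D_{kl}$: dominance effects associated with the M and N loci respectively
– $AA_{ik}, AA_{jl}, AA_{il}, AA_{jk}$: additive × additive epistasis associated with one allele of the M locus and one allele of the N locus
– $AD_{ijk}, AD_{ijl}, AD_{ikl}, AD_{jkl}$: additive × dominance epistasis associated with two alleles of one locus and one allele of the other locus
– $DD_{ijkl}$: dominance × dominance epistasis associated with all alleles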
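To make the 15-term decomposition concrete, here is a minimal Python sketch that sums the effects for any two-locus genotype. The function name, the allele labels ("i", "j" for M and "k", "l" for N) and the dictionary layout are illustrative assumptions, not part of the original model specification.

```python
from collections import defaultdict


def genotypic_value(i, j, k, l, A, D, AA, AD, DD):
    """Sum the additive, dominance and epistatic effects of genotype (i, j) at M, (k, l) at N.

    Hypothetical storage convention: unordered allele pairs are stored as sorted tuples.
    """
    pair = lambda a, b: tuple(sorted((a, b)))
    m, n = pair(i, j), pair(k, l)
    value = A[i] + A[j] + A[k] + A[l]                      # additive effects
    value += D[m] + D[n]                                   # dominance at M and at N
    value += AA[(i, k)] + AA[(j, l)] + AA[(i, l)] + AA[(j, k)]  # additive x additive
    value += AD[(m, k)] + AD[(m, l)]                       # two M alleles x one N allele
    value += AD[(i, n)] + AD[(j, n)]                       # one M allele x two N alleles
    value += DD[(m, n)]                                    # dominance x dominance
    return value


# Illustrative effects only (made-up numbers); missing effects default to 0.
A = {"i": 1.0, "j": 2.0, "k": 3.0, "l": 4.0}
D = defaultdict(float, {("i", "j"): 0.5})
AA, AD, DD = defaultdict(float), defaultdict(float), defaultdict(float)
g_s0 = genotypic_value("i", "j", "k", "l", A, D, AA, AD, DD)
```

Calling the same function with repeated alleles, for example `genotypic_value("i", "i", "k", "k", ...)`, gives the value of a homozygous genotype, in which each additive effect is counted twice.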
Genetic model
• At meiosis, the genotype produces four gametes with frequencies depending on the recombination rate r.
• If selfed, the genotype produces ten genotypes in the S1 generation.

Gametes and their respective frequencies (row and column headers), and genotypic value of each resulting genotype:

                     M_iN_k     M_iN_l     M_jN_k     M_jN_l
                     (1-r)/2    r/2        r/2        (1-r)/2
M_iN_k   (1-r)/2     Giikk      Giikl      Gijkk      Gijkl
M_iN_l   r/2         Giikl      Giill      Gijkl      Gijll
M_jN_k   r/2         Gijkk      Gijkl      Gjjkk      Gjjkl
M_jN_l   (1-r)/2     Gijkl      Gijll      Gjjkl      Gjjll
Genetic model
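• With respective frequencies shown below:

Genotype                   Frequency      Class
Giikk                      ¼ (1-r)²       Non-recombinant double homozygote
Gjjll                      ¼ (1-r)²       Non-recombinant double homozygote
Giill                      ¼ r²           Recombinant double homozygote
Gjjkk                      ¼ r²           Recombinant double homozygote
Gijkl (M_iN_k/M_jN_l)      ½ (1-r)²       Non-recombinant double heterozygote
Gijkl (M_iN_l/M_jN_k)      ½ r²           Recombinant double heterozygote
Giikl                      ½ r (1-r)      Partially recombinant, homozygote for one locus and heterozygote for the other
Gijkk                      ½ r (1-r)      Partially recombinant, homozygote for one locus and heterozygote for the other
Gijll                      ½ r (1-r)      Partially recombinant, homozygote for one locus and heterozygote for the other
Gjjkl                      ½ r (1-r)      Partially recombinant, homozygote for one locus and heterozygote for the other

• The frequencies form a vector, V1, associated with the S1 generation:
\[
V_1 = \left( \tfrac14(1-r)^2,\ \tfrac14(1-r)^2,\ \tfrac14 r^2,\ \tfrac14 r^2,\ \tfrac12(1-r)^2,\ \tfrac12 r^2,\ \tfrac12 r(1-r),\ \tfrac12 r(1-r),\ \tfrac12 r(1-r),\ \tfrac12 r(1-r) \right)^{T}
\]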
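The S1 frequencies can be recovered by crossing the four gametes with themselves. The sketch below is one way to do this in Python with exact fractions; the function name and the representation of a genotype as an unordered pair of gametes are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product


def s1_frequencies(r):
    """Genotype frequencies after selfing the double heterozygote M_iN_k / M_jN_l."""
    half = Fraction(1, 2)
    gametes = {
        ("i", "k"): (1 - r) * half,  # non-recombinant gamete
        ("i", "l"): r * half,        # recombinant gamete
        ("j", "k"): r * half,        # recombinant gamete
        ("j", "l"): (1 - r) * half,  # non-recombinant gamete
    }
    freqs = {}
    for (g1, f1), (g2, f2) in product(gametes.items(), repeat=2):
        # An unordered pair of gametes identifies a genotype and keeps its linkage phase.
        genotype = frozenset((g1, g2))
        freqs[genotype] = freqs.get(genotype, 0) + f1 * f2
    return freqs


freqs = s1_frequencies(Fraction(1, 4))
coupling = frozenset((("i", "k"), ("j", "l")))   # non-recombinant double heterozygote
repulsion = frozenset((("i", "l"), ("j", "k")))  # recombinant double heterozygote
```

Keeping the two gametes of each genotype separate is what yields ten classes rather than nine: the two double heterozygotes carry the same alleles but differ in phase.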
If V2 is the vector of frequencies of the S2 generation, then one can find the relationship between V1 and V2 …
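Genetic components of generation means
• This relation is V2 = M*V1.
• It holds for any pair of successive generations (Vn+1 = M*Vn).
• The M matrix is used to estimate genotypic values and genetic covariances between successive generations.

With rows and columns in the same genotype order as V1 (Giikk, Gjjll, Giill, Gjjkk, non-recombinant Gijkl, recombinant Gijkl, Giikl, Gijkk, Gijll, Gjjkl):
\[
M = \begin{pmatrix}
1 & 0 & 0 & 0 & \tfrac14(1-r)^2 & \tfrac14 r^2 & \tfrac14 & \tfrac14 & 0 & 0 \\
0 & 1 & 0 & 0 & \tfrac14(1-r)^2 & \tfrac14 r^2 & 0 & 0 & \tfrac14 & \tfrac14 \\
0 & 0 & 1 & 0 & \tfrac14 r^2 & \tfrac14(1-r)^2 & \tfrac14 & 0 & \tfrac14 & 0 \\
0 & 0 & 0 & 1 & \tfrac14 r^2 & \tfrac14(1-r)^2 & 0 & \tfrac14 & 0 & \tfrac14 \\
0 & 0 & 0 & 0 & \tfrac12(1-r)^2 & \tfrac12 r^2 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \tfrac12 r^2 & \tfrac12(1-r)^2 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \tfrac12 r(1-r) & \tfrac12 r(1-r) & \tfrac12 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & \tfrac12 r(1-r) & \tfrac12 r(1-r) & 0 & \tfrac12 & 0 & 0 \\
0 & 0 & 0 & 0 & \tfrac12 r(1-r) & \tfrac12 r(1-r) & 0 & 0 & \tfrac12 & 0 \\
0 & 0 & 0 & 0 & \tfrac12 r(1-r) & \tfrac12 r(1-r) & 0 & 0 & 0 & \tfrac12
\end{pmatrix}
\]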
Ongoing questions:
• Is it possible to relate the genotype frequencies of any generation (including RILs) directly to those of the first generation (i.e. the S0 plant or F1 cross)?
• If so, is it also possible to relate any generation mean and genetic covariance to those of the unselfed S0 plant or F1 cross?
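As a numerical illustration of the recurrence Vn+1 = M*Vn, the following Python sketch builds a selfing transition matrix for the ten two-locus genotypes and applies it repeatedly, starting from a non-recombinant double heterozygote S0 plant. The genotype order (iikk, jjll, iill, jjkk, non-recombinant ijkl, recombinant ijkl, iikl, ijkk, ijll, jjkl) and all function names are assumptions chosen for this example; it does not answer the questions above in closed form, it only iterates the relation.

```python
from fractions import Fraction


def transition_matrix(r):
    """Selfing transition matrix: column = parent genotype, row = offspring genotype."""
    q, h = Fraction(1, 4), Fraction(1, 2)
    a, b, c = (1 - r) ** 2, r ** 2, r * (1 - r)
    return [
        [1, 0, 0, 0, q * a, q * b, q, q, 0, 0],
        [0, 1, 0, 0, q * a, q * b, 0, 0, q, q],
        [0, 0, 1, 0, q * b, q * a, q, 0, q, 0],
        [0, 0, 0, 1, q * b, q * a, 0, q, 0, q],
        [0, 0, 0, 0, h * a, h * b, 0, 0, 0, 0],
        [0, 0, 0, 0, h * b, h * a, 0, 0, 0, 0],
        [0, 0, 0, 0, h * c, h * c, h, 0, 0, 0],
        [0, 0, 0, 0, h * c, h * c, 0, h, 0, 0],
        [0, 0, 0, 0, h * c, h * c, 0, 0, h, 0],
        [0, 0, 0, 0, h * c, h * c, 0, 0, 0, h],
    ]


def advance(M, v):
    """One generation of selfing: matrix-vector product M * v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def frequencies(r, n):
    """Genotype frequencies after n selfings of a non-recombinant double heterozygote."""
    v = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]  # the S0 plant itself
    M = transition_matrix(r)
    for _ in range(n):
        v = advance(M, v)
    return v


v1 = frequencies(Fraction(1, 4), 1)   # S1 frequencies, exact
v_late = frequencies(0.25, 60)        # close to complete fixation, floating point
```

After many generations only the four double homozygotes remain. For r = ¼ the recombinant classes together approach 1/3, which agrees with the classical proportion 2r/(1+2r) of recombinant RILs under selfing.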
Genetic components of generation means
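• Thus, if r = ½ (for simplicity), the genotypic mean value of the S1 progeny of an S0 plant/cross $M_iN_k / M_jN_l$ is:
\[
\begin{aligned}
G^{S_1}_{ijkl} ={} & A_i + A_j + A_k + A_l + \tfrac12\,(D_{ij} + D_{kl}) + \tfrac14\,(D_{ii} + D_{jj} + D_{kk} + D_{ll}) \\
& + AA_{ik} + AA_{jl} + AA_{il} + AA_{jk} \\
& + \tfrac12\,(AD_{ijk} + AD_{ijl} + AD_{ikl} + AD_{jkl}) \\
& + \tfrac14\,(AD_{iik} + AD_{ikk} + AD_{jjl} + AD_{jll} + AD_{iil} + AD_{ill} + AD_{jjk} + AD_{jkk}) \\
& + \tfrac14\,DD_{ijkl} + \tfrac18\,(DD_{iikl} + DD_{ijkk} + DD_{ijll} + DD_{jjkl}) \\
& + \tfrac1{16}\,(DD_{iikk} + DD_{jjll} + DD_{iill} + DD_{jjkk})
\end{aligned}
\]
• If successive generations are allowed to segregate and recombine until complete fixation (i.e. with neither selection nor drift), the expected mean value of the RILs will be:
\[
\begin{aligned}
G^{S_\infty}_{ijkl} ={} & A_i + A_j + A_k + A_l + \tfrac12\,(D_{ii} + D_{jj} + D_{kk} + D_{ll}) \\
& + AA_{ik} + AA_{jl} + AA_{il} + AA_{jk} \\
& + \tfrac12\,(AD_{iik} + AD_{ikk} + AD_{jjl} + AD_{jll} + AD_{iil} + AD_{ill} + AD_{jjk} + AD_{jkk}) \\
& + \tfrac14\,(DD_{iikk} + DD_{jjll} + DD_{iill} + DD_{jjkk})
\end{aligned}
\]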
Line value concept : definition and prediction
• Line value (LV) is the mean value of all RILs that a plant or a cross can produce through successive selfings (or haplo-diploidisation).
• LV may be predicted from any pair of successive generations:
\[
2\,G^{S_{n+1}}_{ijkl} - G^{S_n}_{ijkl}
\]
• If an F1 and its selfed F2 are both phenotyped, then [2*GF2 - GF1] predicts the mean value of the RILs derivable from the cross. The genetic components may be written as follows:
\[
\begin{aligned}
2\,G^{F_2} - G^{F_1} ={} & A_i + A_j + A_k + A_l + \tfrac12\,(D_{ii} + D_{jj} + D_{kk} + D_{ll}) \\
& + AA_{ik} + AA_{jl} + AA_{il} + AA_{jk} \\
& + \tfrac12\,(AD_{iik} + AD_{ikk} + AD_{jjl} + AD_{jll} + AD_{iil} + AD_{ill} + AD_{jjk} + AD_{jkk}) \\
& - \tfrac12\,DD_{ijkl} + \tfrac14\,(DD_{iikl} + DD_{ijkk} + DD_{ijll} + DD_{jjkl}) \\
& + \tfrac18\,(DD_{iikk} + DD_{jjll} + DD_{iill} + DD_{jjkk})
\end{aligned}
\]
• This predictor equals the expected LV ($G^{S_\infty}_{ijkl}$) except for the DD terms.
Line value concept : definition and prediction
• The difference in DD terms between the expected line value ($G^{S_\infty}_{ijkl}$) and its prediction ($2\,G^{F_2} - G^{F_1}$):
– The prediction includes the quantity
\[
DD = -\tfrac12\,DD_{ijkl} + \tfrac14\,(DD_{iikl} + DD_{ijkk} + DD_{ijll} + DD_{jjkl})
\]
which is associated with heterozygote structures.
– The line value, in turn, includes the additional quantity
\[
DD' = \tfrac18\,(DD_{iikk} + DD_{jjll} + DD_{iill} + DD_{jjkk})
\]
associated with homozygote structures.
In other words, $G^{S_\infty}_{ijkl} = (2\,G^{F_2} - G^{F_1}) - DD + DD'$. This means that if DD = DD' = 0, then the prediction of LV obtained from early generations will be exactly equal to the expected LV ($G^{S_\infty}_{ijkl}$).
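The relation between the line value and its early-generation predictor can be checked numerically. The Python sketch below draws arbitrary random effects, computes the F1, F2 and RIL means for r = ½ (so that the two loci segregate independently), and compares the line value with 2*GF2 - GF1. All names, the random effects and the dictionary layout are assumptions made for this illustration only.

```python
import math
import random
from collections import defaultdict


def genotypic_value(i, j, k, l, eff):
    """Two-locus genotypic value with additive, dominance and epistatic effects."""
    pair = lambda a, b: tuple(sorted((a, b)))
    m, n = pair(i, j), pair(k, l)
    return (
        sum(eff["A"][a] for a in (i, j, k, l))
        + eff["D"][m] + eff["D"][n]
        + sum(eff["AA"][(a, b)] for a in (i, j) for b in (k, l))
        + sum(eff["AD"][(m, b)] for b in (k, l))
        + sum(eff["AD"][(a, n)] for a in (i, j))
        + eff["DD"][(m, n)]
    )


def mean_value(m_freqs, n_freqs, eff):
    """Generation mean when the two loci segregate independently (r = 1/2)."""
    return sum(fm * fn * genotypic_value(*mg, *ng, eff)
               for mg, fm in m_freqs.items() for ng, fn in n_freqs.items())


def line_value_check(seed=0, with_dd=True):
    """Return (line value, predictor, DD, DD') for one random draw of effects."""
    rng = random.Random(seed)
    draw = lambda: rng.gauss(0.0, 1.0)  # each effect is drawn once, on first access
    eff = {name: defaultdict(draw) for name in ("A", "D", "AA", "AD")}
    eff["DD"] = defaultdict(draw if with_dd else float)
    ii, ij, jj = ("i", "i"), ("i", "j"), ("j", "j")
    kk, kl, ll = ("k", "k"), ("k", "l"), ("l", "l")
    f1 = genotypic_value("i", "j", "k", "l", eff)
    f2 = mean_value({ii: 0.25, ij: 0.5, jj: 0.25}, {kk: 0.25, kl: 0.5, ll: 0.25}, eff)
    lv = mean_value({ii: 0.5, jj: 0.5}, {kk: 0.5, ll: 0.5}, eff)
    dd = -0.5 * eff["DD"][(ij, kl)] + 0.25 * sum(
        eff["DD"][key] for key in ((ii, kl), (ij, kk), (ij, ll), (jj, kl)))
    dd_prime = 0.125 * sum(
        eff["DD"][key] for key in ((ii, kk), (jj, ll), (ii, ll), (jj, kk)))
    return lv, 2 * f2 - f1, dd, dd_prime


lv, pred, dd, dd_prime = line_value_check(seed=1)
lv0, pred0, _, _ = line_value_check(seed=1, with_dd=False)
```

With dominance × dominance effects present, `lv` and `pred` differ by `dd_prime - dd`; with those effects set to zero (`with_dd=False`), the two values coincide even though additive, dominance, AA and AD effects are all non-zero.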
Applying LV concept to RS breeding scheme : advantages & specific aspects
• Efficient and early prediction of the potential of plants or crosses to produce high-performing inbred lines, even for traits with dominance and epistatic interactions.
• In the context of the CIAT rice breeding scheme, single S0 plants cannot be phenotyped properly, so successive selfed generations can be used to construct the predictor of interest, which is [2 * S2Gijkl - S1Gijkl] or [2 * S3Gijkl - S2Gijkl], depending on the quantity of seed needed for phenotyping (i.e. single-location versus multi-location experimentation).
Applying LV concept to RS breeding scheme : advantages & specific aspects
• Advantages of the LV predictor compared with the S2:4 predictor:
– Gain in the duration of the calibration process (1 or 2 generations)
– Gain in the duration of a selection cycle (prediction of S0:2 progenies instead of S2:4 progenies)
– No bias due to dominance (as there is in single-generation phenotyping)
• Specific aspects to focus on:
– Bulk multiplication of seeds is mandatory (to maintain the allelic frequencies on which the equations rely)
– The ms locus controlling male sterility is difficult to manage if genotyping for the locus is not available to differentiate S0 plants
– The number of progenies that can be phenotyped is halved for equal resources (as two generations need to be phenotyped)
Accelerating further the process using genomewide markers
• Line value may be used as the phenotype in a genomic model, instead of the value of a single selfed progeny. The procedure consists of:
– GBS genotyping of S0 plants,
– Phenotyping of S1 and S2 (or S2 and S3) progenies.
• Gain at two levels compared with S2 genotyping and S2:4 phenotyping:
– Calibration takes 2 generations (S1 and S2) or 3 generations (S2 and S3) instead of 4 generations
– Prediction takes place directly on S0 plants, without multiplying up to the S2 generation
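As a small illustration of how the line value could be assembled as a phenotype before fitting a genomic model, the Python sketch below applies the predictor 2 × (later generation mean) minus (earlier generation mean) to each S0 plant that has progeny means in both generations. The function name, plant identifiers and trait values are hypothetical, and the genomic model itself is not shown.

```python
def line_value_phenotypes(earlier, later):
    """Line value predictor 2*later - earlier for each S0 plant present in both inputs.

    earlier, later: dicts mapping an S0 plant identifier to the phenotypic mean of its
    progeny in two successive selfed generations (e.g. S1 and S2, or S2 and S3).
    """
    shared = earlier.keys() & later.keys()  # plants phenotyped in both generations
    return {plant: 2 * later[plant] - earlier[plant] for plant in shared}


# Hypothetical progeny means (arbitrary units), for illustration only.
s1_means = {"P1": 4.0, "P2": 5.0}
s2_means = {"P1": 3.5, "P2": 5.5, "P3": 6.0}
lv_phenotypes = line_value_phenotypes(s1_means, s2_means)
```

Plants with a mean in only one generation (here "P3") are left out, since the predictor needs both generations.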
Accelerating further the process using genomewide markers
Procedure when genotyping of the ms locus is available:
– Genotyping of S0 plants for the ms locus
– GBS genotyping of only those S0 plants that carry the [Ms Ms] genotype at the ms locus
– Seed increase of [Ms Ms] S0 plants until the S2 or S3 generation
– Phenotyping of S1 + S2 (or S2 + S3)
Conclusion
This procedure optimises the GS scheme in several respects:
• Calibration of the model is based on very early generations.
• Early prediction of the breeding population (S0). This maximizes the genetic gain per unit time.
• Line value predictors are less biased by complex effects, in particular dominance, even though these may be important in early generations.
Thank you !