
Mark E. Sorrells & Elliot Heffner

Department of Plant Breeding & Genetics

Association Breeding Strategies for Crop Improvement

Presentation Overview

Molecular Plant Breeding Strategies

• Marker Assisted Selection

• Association Breeding

• Genomic (Genome-Wide) Selection

Methods, examples, applications

Historical Improvement of Breeding Methods

• Mass Selection

• Family Selection Methods

• Progeny Testing

• Marker Assisted Selection

• Genomic Selection

Molecular Breeding Goals

• Allele discovery

• Allele characterization & validation

• Parental & progeny selection for superior alleles and transgressive segregation

Strategies for Molecular Breeding

• Genomic Selection (Meuwissen, Hayes & Goddard 2001)

• Requires genome-wide markers that are used to develop a prediction model for estimating a breeding value for each individual

• Breeding values of individuals in a breeding population are predicted from the estimated marker/QTL effects, without phenotyping those individuals

• Marker Assisted Selection

• Only significant markers are used for selection, usually for qualitative traits

• Association Breeding (Breseghello & Sorrells 2006)

• Uses conventional hybridization/MAS/Testing for significant markers

• Allows for updating breeding values for new and existing alleles

• Phenotyping and association analysis are used as often as necessary for allele discovery and validation

Marker Assisted Selection

Successes:

• Significant impacts in backcrossing

• Simple, monogenic trait improvement

• e.g., backcrossing (BC) major genes into elite varieties

Limitations:

• Best suited for major genes

• BC is the most conservative breeding method

• Pyramiding limited to a few target genes

Genes with small effects, which underlie most important traits, determine the success of new varieties

Association Mapping versus Bi-Parental QTL Mapping

Association mapping can be conducted in relevant, adapted groups of accessions:

• Direct inference to a breeding population is possible

• Relevant genetic background effects are sampled

• Phenotypic variation is observed for most traits of interest

• Marker polymorphism is higher than in biparental populations

• Routine variety trial evaluations provide high quality phenotypic data

• Characterize the structure of genetic variation in relevant populations

Novel alleles can be identified and their relative value can be assessed as often as necessary

Type I error (false positives) can be higher because of:

• Low heritability & small-effect QTL (heterogeneity of genetic background)

• Population structure

• Estimates of population structure or kinship are used in a linear mixed effects model to reduce the frequency of false positive associations

• High sampling variance of rare alleles

• Rare alleles are usually excluded from the analysis
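The two corrections above (a kinship-aware mixed model and excluding rare alleles) can be sketched as a single-marker scan. This is a minimal illustration, assuming the variance ratio λ = σe²/σg² is already known (in practice variance components are estimated, e.g., by REML), markers coded as allele counts 0/1/2 (0/2 for inbred lines), and a hypothetical minor allele frequency cutoff of 0.05; the function names are hypothetical. With Var(y) = σg²(K + λI), each marker is tested by generalized least squares.

```python
import numpy as np


def minor_allele_frequency(genotypes):
    """MAF of one marker coded as allele counts (0/1/2; inbred lines 0 or 2)."""
    p = np.mean(genotypes) / 2.0
    return min(p, 1.0 - p)


def gls_marker_test(y, marker, K, lam, covariates=None):
    """GLS estimate and t-statistic for one marker in y = Xb + g + e.

    g ~ N(0, sigma_g^2 K), e ~ N(0, sigma_e^2 I), lam = sigma_e^2 / sigma_g^2
    (assumed known here). Optional covariates, e.g. subpopulation
    membership (Q matrix), enter as additional fixed effects.
    """
    n = len(y)
    columns = [np.ones(n)]
    if covariates is not None:
        columns.append(covariates)
    columns.append(marker)
    X = np.column_stack(columns)
    V_inv = np.linalg.inv(K + lam * np.eye(n))   # Var(y) / sigma_g^2, inverted
    XtVi = X.T @ V_inv
    A = XtVi @ X
    beta = np.linalg.solve(A, XtVi @ y)
    resid = y - X @ beta
    sigma_g2 = (resid @ V_inv @ resid) / (n - X.shape[1])
    cov_beta = sigma_g2 * np.linalg.inv(A)
    effect = beta[-1]
    return effect, effect / np.sqrt(cov_beta[-1, -1])


def marker_scan(y, G, K, lam, min_maf=0.05):
    """Test each marker (column of G) that passes the rare-allele filter."""
    results = {}
    for j in range(G.shape[1]):
        if minor_allele_frequency(G[:, j]) < min_maf:
            continue  # rare allele: high sampling variance, excluded
        results[j] = gls_marker_test(y, G[:, j], K, lam)
    return results
```

A common choice for K is a centred genotype cross-product, ZZ'/m, which is what the quick check below uses; the study itself estimated kinship with SPAGeDi.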

Association Mapping versus QTL Mapping

Association Analysis: An Example

Breseghello & Sorrells 2006, Genetics; Field Crops Research 2007

• Association Panel of Elite Soft Winter Wheat Varieties:

• 149 adapted soft wheat varieties; milling quality, seed size

• Markers:

• Preliminary screen: 18 unlinked SSRs

• 93 markers saturating two QTL regions

• Population Structure: TASSEL software - www.maizegenetics.net

• Structure without admixture

• Kinship - SPAGeDi (Hardy & Vekemans)

• Association Analysis:

• Linear mixed-effects model

• Markers were fixed effects from selected QTL regions

• Subpopulations or Kinship were random effects

Linkage Disequilibrium: Germplasm Selection

Breseghello & Sorrells 2006, Genetics

• 149 lines genotyped with 18 unlinked SSR markers; 95 selected

• Most similar lines were excluded

[Figure: R² probability for unlinked SSR markers, 149 lines vs. 95 lines; significance levels p < 0.01, p < 0.001, p < 0.0001]

Elite Soft Winter Wheat Varieties: Milling quality, seed size

"Normalizing" the sample reduced:

• population substructure

• frequency of rare alleles

• long-range LD
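The "normalizing" step (149 lines reduced to 95 by excluding the most similar lines) can be illustrated with a simple greedy rule. This is a hypothetical sketch: the slides do not state the exact procedure, and the similarity measure used here (proportion of SSR loci at which two inbred lines carry the same allele) and the function names are assumptions.

```python
import numpy as np


def allele_sharing(genotypes):
    """Pairwise proportion of loci at which two lines carry the same allele.

    genotypes: (lines x loci) float array of allele codes for inbred lines
    (e.g. SSR fragment sizes); np.nan marks missing data.
    """
    n = genotypes.shape[0]
    sim = np.zeros((n, n))
    for a in range(n):
        for b in range(a + 1, n):
            both = ~np.isnan(genotypes[a]) & ~np.isnan(genotypes[b])
            same = genotypes[a, both] == genotypes[b, both]
            sim[a, b] = sim[b, a] = same.mean() if both.any() else 0.0
    return sim


def drop_most_similar(genotypes, target_size):
    """Greedily remove lines until target_size remain.

    At each step, find the most similar remaining pair and drop the member
    that is, on average, more similar to all other remaining lines.
    Returns the sorted indices of the retained lines.
    """
    if target_size < 2:
        raise ValueError("target_size must be at least 2")
    sim = allele_sharing(genotypes)
    keep = list(range(genotypes.shape[0]))
    while len(keep) > target_size:
        sub = sim[np.ix_(keep, keep)]          # copy of the remaining block
        np.fill_diagonal(sub, -1.0)            # ignore self-similarity
        i, j = np.unravel_index(np.argmax(sub), sub.shape)
        mean_i = np.delete(sub[i], i).mean()
        mean_j = np.delete(sub[j], j).mean()
        keep.remove(keep[i] if mean_i >= mean_j else keep[j])
    return sorted(keep)
```

Dropping near-duplicates first is what flattens population substructure; a real selection would also weigh pedigree and breeding relevance.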

Previous QTL Information: Kernel Size and Shape

• Recombinant Inbred Wheat Population: Synthetic W7984 x Opata (ITMI population)

• QTL for kernel size on 5A

[Figure: QTL for kernel size on 5A and kernel width on 2D]

Breseghello & Sorrells 2006

• Doubled-Haploid Wheat Population: AC Reed x Grandin

• QTL for kernel size (width) near Xwmc18-2D

Chromosome 2D: Associations & LD Estimate

Significant LD was below 1 cM

Association analysis confirmed the kernel width QTL & identified other QTL

Significant LD extended 3-5 cM

Chromosome 5A: Associations & LD Estimate

Association analysis confirmed the kernel weight QTL
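The LD estimates on these slides are expressed as r². For two biallelic markers scored on homozygous lines (one haplotype per line), r² follows from allele and haplotype frequencies, D = p_AB − p_A·p_B and r² = D² / [p_A(1 − p_A)·p_B(1 − p_B)]. The sketch below assumes that biallelic, fully homozygous case; SSR markers are multiallelic and need a generalized measure. The function name is hypothetical.

```python
import numpy as np


def ld_r2(marker_a, marker_b):
    """r^2 between two biallelic markers coded 0/1 in homozygous lines."""
    a = np.asarray(marker_a, dtype=float)
    b = np.asarray(marker_b, dtype=float)
    p_a, p_b = a.mean(), b.mean()      # frequency of allele "1" at each locus
    p_ab = (a * b).mean()              # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b               # coefficient of linkage disequilibrium
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("monomorphic marker: r^2 is undefined")
    return d * d / denom
```

For 0/1 codes this r² equals the squared Pearson correlation between the two marker vectors, which makes a convenient cross-check.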

Estimated Allele Effects: Kernel Weight

No. of cultivars: 41, 45, 43, 49

Best linear unbiased estimates (BLUEs) of allele effects (REML) were all compared with the mean of the null alleles (missing & rare alleles)

Estimated Allele Effects: Kernel Width

No. of cultivars: 41, 14, 8, 15, 18, 24, 5, 10, 19

Best linear unbiased estimates (BLUEs) of allele effects (REML) were all compared with the mean of the null alleles (missing & rare alleles)
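The comparison with null alleles amounts to a reference-coded model: common alleles get their own effect, while missing and rare alleles are pooled into a reference class whose mean is the intercept. This is a minimal fixed-effects sketch, assuming one multiallelic marker, ordinary least squares instead of the REML mixed model used in the study, and a hypothetical rare-allele cutoff of 5 cultivars; the function name is hypothetical.

```python
import numpy as np


def allele_effects_vs_null(phenotype, alleles, min_count=5):
    """Estimate each common allele's effect relative to the null-class mean.

    alleles: allele label per cultivar (None = missing data). Alleles carried
    by fewer than min_count cultivars are pooled with missing data into the
    null (reference) class. Returns {allele: effect}.
    """
    y = np.asarray(phenotype, dtype=float)
    labels = list(alleles)
    counts = {}
    for a in labels:
        if a is not None:
            counts[a] = counts.get(a, 0) + 1
    common = sorted(a for a, c in counts.items() if c >= min_count)
    if all(lab in common for lab in labels):
        raise ValueError("no null-class cultivars to use as the reference")
    # Design matrix: intercept (null class) + one indicator per common allele
    X = np.ones((len(y), 1 + len(common)))
    for k, a in enumerate(common):
        X[:, k + 1] = [1.0 if lab == a else 0.0 for lab in labels]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(common, beta[1:]))
```

With indicator coding and no other terms, each effect is simply the allele-group mean minus the null-class mean; the mixed model in the study additionally adjusts for structure or kinship.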

Application of Association Analysis in a Breeding Program

[Diagram: Germplasm; Parental Selection; Hybridization; New Populations; Marker Assisted Selection; Evaluation Trials (Evaluation of Elite Synthetics, Lines, Varieties); Genotypic & Phenotypic Data; Association Mapping: Characterize QTL/Marker Allele Associations; elite germplasm feeds back into the hybridization nursery]

• MAS identifies desired segregants up front, so phenotypic selection intensity can be increased for other traits

• Association mapping facilitates allele discovery and validation

Association Analysis as a Breeding Strategy

Issues:

• Breeding programs are dynamic, complex genetic entities that require frequent evaluation of marker/QTL relationships

• Accurate detection and estimation of QTL effects is required

• In new germplasm, pre-existing marker alleles may be linked to undesirable QTL alleles instead of the target allele

• Population structure can cause a high frequency of false positive associations between markers and QTL

Genomic Selection Methodology

Meuwissen et al. 2001 Genetics 157:1819-1829; Goddard & Hayes 2007

• In a Breeding Population individuals are genotyped but not phenotyped

• A genomic estimated breeding value (GEBV) for each individual is obtained by summing the marker effects for that genotype

• Prediction model is used to impose multiple generations of selection

• A Training Population is genotyped with a large number of markers and phenotyped for important traits

• Genome-wide markers are used to estimate all genetic effects simultaneously

• One or more markers are assumed to be in LD with each QTL affecting the trait

• The prediction model attempts to capture the total additive genetic variance
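The training/prediction logic above can be sketched in a few lines: estimate all marker effects at once in the phenotyped training population, then sum each breeding-population individual's marker effects to get its GEBV. This assumes ridge regression (RR-BLUP-style, with a fixed, user-chosen shrinkage parameter λ rather than one derived from estimated variance components), numerically coded markers, and hypothetical function names; it is not the authors' pipeline.

```python
import numpy as np


def train_rr(X_train, y_train, lam):
    """Estimate all marker effects simultaneously by ridge regression.

    Solves (Z'Z + lam * I) beta = Z'(y - mean(y)), where Z is X_train
    centred by the training-population marker means.
    """
    mu = y_train.mean()
    centre = X_train.mean(axis=0)
    Z = X_train - centre
    p = Z.shape[1]
    beta = np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ (y_train - mu))
    return mu, centre, beta


def gebv(X, model):
    """GEBV: the mean plus the sum of the individual's marker effects."""
    mu, centre, beta = model
    return mu + (X - centre) @ beta
```

Only the training population needs phenotypes; the breeding population enters the prediction through its genotypes alone.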

GS in a Plant Breeding Program

Heffner, Sorrells & Jannink. Crop Science 49:1-12

[Diagram: Line Development Cycle and Model Training Cycle, linking the Training Population and the Breeding Population. Steps shown: New Germplasm; Make crosses and advance generations; Genotype; Genomic Selection; Advance lines with highest GEBV; Advance lines informative for model improvement; Phenotype (lines have already been genotyped); Train prediction model; Test varieties and release]

Genomic selection reduces cycle time & cost by reducing frequency of phenotyping

Choosing a Statistical Model for GS

• Model performance is based on correlation between GEBV and TBV

• Must estimate many QTL effects from a limited number of phenotypes

• Least squares regression sets an arbitrary threshold for significance, resulting in overestimation of significant effects and loss of small effects

• Variable selection or shrinkage estimation can be used to deal with oversaturated regression models

• Many QTL effects can be estimated simultaneously in linear mixed models for the prediction of random effects

Choosing a Statistical Model for GS

• Shrinkage Analysis

• Ridge Regression BLUP

• All effects are estimated simultaneously

• Assumes equal variance for all QTL effects

• Shrinks large QTL effects towards zero

• Bayesian Shrinkage Regression - a.k.a. BayesA, B (Meuwissen et al)

• Scaled inverse chi-square prior for the marker variances

• Variance is estimated for each marker

• BayesB assigns an effect of zero to a proportion of the markers

• Bayesian Variable Selection:

• Stochastic Search of Variable Selection

• Variance is estimated for each marker

• Both Shrinkage & Variable Selection

• Least Absolute Shrinkage and Selection Operator (LASSO) (Tibshirani; Xu)

• Minimizes the residual sum of squares subject to a constraint on the sum of the absolute values of the regression coefficients

• Model-free methods

• Kernel regression & Reproducing Kernel Hilbert Spaces regression (Gianola et al)
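To contrast pure shrinkage with combined shrinkage and selection: ridge regression (RR-BLUP) shrinks every marker effect but keeps all of them, whereas the LASSO penalty λΣ|βj| can set effects exactly to zero. Below is a minimal cyclic coordinate-descent LASSO with soft-thresholding, assuming a fixed λ, a fixed number of sweeps and internal centring (so the intercept is not penalized); it is a teaching sketch, not a production solver, and the function names are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Shrink z toward zero by t; values with |z| <= t become exactly 0."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_sweeps=200):
    """Minimize 0.5 * ||y - X b||^2 + lam * sum(|b_j|) by coordinate descent."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    p = Xc.shape[1]
    b = np.zeros(p)
    col_ss = (Xc ** 2).sum(axis=0)
    resid = yc.copy()                      # residual for b = 0
    for _ in range(n_sweeps):
        for j in range(p):
            if col_ss[j] == 0:
                continue                   # monomorphic marker: no information
            resid += Xc[:, j] * b[j]       # remove marker j from the fit
            rho = Xc[:, j] @ resid
            b[j] = soft_threshold(rho, lam) / col_ss[j]
            resid -= Xc[:, j] * b[j]       # add it back with its new effect
    return b
```

Larger λ zeroes more markers; with many small-effect QTL this selection can discard real signal, which is one reason BLUP-type shrinkage is often preferred in that case.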

Factors Affecting the Accuracy of GEBVs

• Level and distribution of LD between markers and QTL

• r² > 0.2 desirable; haplotypes may increase LD but reduce power

• Size of training population

• Larger is better, but models may need to be retrained over time

• Heritability of the trait

• More records are required for low heritability traits

• Distribution of QTL effects

• Many small-effect QTL or low LD favor BLUP, which captures small-effect QTL that may not be in LD with a marker
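How training population size and heritability interact can be illustrated with a deterministic approximation from Daetwyler et al. (2008), which is not part of this presentation: r ≈ √(N·h² / (N·h² + Me)), where N is the number of phenotyped training records and Me is the number of independent chromosome segments (effectively, the number of effects to estimate; it grows as LD declines). The Me value below is a hypothetical placeholder and the function names are illustrative.

```python
import math


def expected_accuracy(n_records, h2, m_e):
    """Approximate GEBV accuracy, r = sqrt(N h2 / (N h2 + Me))."""
    nh2 = n_records * h2
    return math.sqrt(nh2 / (nh2 + m_e))


def records_needed(target_r, h2, m_e):
    """Invert the approximation: N = r^2 Me / (h2 (1 - r^2))."""
    r2 = target_r ** 2
    return r2 * m_e / (h2 * (1.0 - r2))


# Hypothetical Me = 1000 segments: records needed for r = 0.7 at several h2
for h2 in (0.2, 0.5, 0.8):
    print(h2, round(records_needed(0.7, h2, 1000)))
```

Because N enters only through the product N·h², halving the heritability doubles the number of records needed for the same accuracy, which matches the point above about low-heritability traits.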

Genomic Selection in Dairy Cattle

Hayes et al. 2009

Comparisons of GEBV reliabilities (reliability = square of the correlation between GEBV and TBV)

All studies included a polygenic effect (parental average BV) in calculating GEBV

• Australia: 798 Holstein-Friesian bulls

• Reliability of the Australian Selection Index: sire 38% < BLUP 44% < BayesA 48%

• New Zealand: 4,500 bulls

• GEBV reliabilities were 50-67% for milk vs. 34% for parental average

• United States: 3,576 Holstein bulls

• GEBV reliability was 50% for the selection index vs. 27% for parental average

• The Netherlands: 1,583 Holstein bulls

• GEBV reliabilities were 9 to 33% higher than parental average

Adapting Genomic Selection to Plant Breeding

• For most crop species, large populations can be generated

• For animals, many daughters are tested for each bull

• Plant breeders use more diverse mating schemes

• In animals, parental values are mainly based on half-sib families

• Inbred lines, testcross hybrids and clonally propagated crops can be replicated in time and space

• Each animal is a unique, heterozygous genotype

• GxE is a major issue in plant breeding

• LD in self-pollinated crops tends to be quite high: 5-20 cM for r² of 0.1 to 0.2

Genomic Selection & Marker Assisted Recurrent Selection Schemes for Maize Inbred Development - An Example

Bernardo & Yu 2008

Computer simulation to compare Genomic Selection to Marker Assisted Recurrent Selection

Genomic Selection:

• A large number of markers are used to estimate breeding values

• An individual's estimated breeding value is the sum of its marker effects across all markers

MARS: Only significant markers for target traits are used for selection

Simulations:

• Number of QTL: 20, 40 & 100

• Heritabilities: 0.2, 0.5, 0.8

[Diagram: off-season nurseries; training population to develop prediction equations]

• Genomic selection: DH testcrosses are the training population, phenotyped and genotyped to train the model

• MARS uses only significant markers

• Two cycles of selection

Bernardo & Yu 2008

Genomic Selection & Marker Assisted Recurrent Selection Schemes for Maize Inbred Development

Bernardo & Yu 2008

Results of simulations:

Response to genomic selection was 18-43% higher than MARS across different population sizes, numbers of QTL and heritabilities.

Advantage of GS over MARS was greatest for low h² and many QTL.

Response to GS as % of response to MARS (MARS = 100)

#QTL    h² = 0.2    h² = 0.4    h² = 0.8
20      130         121         118
40      136         132         135
100     143         128         130

Self-Pollinated Crop Genomic Selection vs. Phenotypic/MAS Selection Timeline

Shared stages:

• Parent recombination

• F1 greenhouse (GH) advance

• Advanced regional testing

Phenotypic + MAS Selection (selection on pedigree + phenotype):

• F2 GH advance by single seed descent (SSD); genotype; MAS 1

• F3 GH advance SSD; genotype; MAS 2

• F4 GH advance SSD

• F5 GH advance SSD

• F5DL field single row seed increase; phenotypic selection (PS)

• F5DL field single plot yield trial; PS

• F5DL field yield trials, 3 locations; PS

• 7 years to parent selection and advanced testing

Genomic Selection (selection on GEBV + phenotype):

• F2 GH advance SSD

• F3 GH advance SSD

• F4 GH advance SSD

• F5 GH advance SSD; genotype; GS

• F5DL field single row seed increase; GS + PS

• F5DL field trials, 3 locations; GS + PS

• 3 years to parent selection

• 5 years to advanced testing

Genomic Selection Experiments in the Cornell Wheat Breeding Program

• Within families: Cayuga x Caledonia DH population

• Pre-harvest sprouting

• Heritability = 0.44

• 209 lines across 16 environments (6 years)

• 15 QTL explaining < 40% of the phenotypic variance

• Across families: Master Nursery

• 400 advanced breeding lines (F7+)

• Augmented field design

• Three locations over 3 years

• DArT markers: ~1500 polymorphisms

Cornell Winter Wheat Breeding Program

• 2- or 3-way cross of parent material: F1

• F3-F4: early generation bulk (mass selection for height & seed size)

• F5: space plants; select individual plants

• F6: head rows (1 m) and single row selection

• F7: Screening Nursery (3 m plots); preliminary line selection

• F8-F10: Master Nursery; 400 lines; 4 m plots; advanced line selection

• Regional trials (1-4 years) and variety release

Additional stages shown on the slide (MAS/GS scheme):

• F2 (MAS for 1-5 loci)

• F3-F4: early generation genomic selection

• F5: space plants; select individual plants

• Seed increase

• Final screening (3 m plots)

Preliminary Evaluation of GS for Preharvest Sprouting (PHS) in Cayuga x Caledonia

Collaboration with Hiroyoshi Iwata

• Population size = 209; 16 environments; heritability = 0.44

• Leave-one-out (LOO) cross-validation: leave a line out of the analysis, then predict it using marker data; repeat for all lines. This provides an estimate of selection based only on markers

• Models: RR-BLUP, Bayes A & B; GEBVs with and without the phenotype of the predicted line in the model training step

• Average correlation between phenotype in the training population and true breeding value = 0.67
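The leave-one-out procedure described above can be sketched with a ridge-regression (RR-BLUP-style) predictor. This assumes a fixed shrinkage λ, numerically coded markers, and an array of true-value estimates supplied as input; the function names are hypothetical. The key property is that the left-out line's own phenotype never enters the model that predicts it.

```python
import numpy as np


def ridge_fit_predict(X_tr, y_tr, X_new, lam):
    """Ridge-regression marker model trained on (X_tr, y_tr); predicts X_new."""
    centre = X_tr.mean(axis=0)
    Z = X_tr - centre
    beta = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]),
                           Z.T @ (y_tr - y_tr.mean()))
    return y_tr.mean() + (X_new - centre) @ beta


def loo_predictions(X, y, lam):
    """Leave each line out, train on the rest, predict it from markers only."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        preds[i] = ridge_fit_predict(X[mask], y[mask], X[i:i + 1], lam)[0]
    return preds


def loo_accuracy(X, y, true_value, lam):
    """Correlation between LOO GEBVs and the (estimated) true values."""
    return np.corrcoef(loo_predictions(X, y, lam), true_value)[0, 1]
```

Refitting n times is cheap for RR-BLUP at this population size; for MCMC-based Bayes A/B the same loop is far more expensive.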

GS without Phenotype vs. True Breeding Value (TV)

[Figure: predicted vs. observed values (axes 0.5 to 0.8) for RR, BayesA and BayesB]

Correlation of prediction to TV (R²): RR 0.629 (0.40); BayesA 0.628 (0.39); BayesB 0.587 (0.35); Ave 0.634 (0.40)

r (GEBV accuracy) = 0.67; h² = 0.44

Response to selection: $R = i\,r\,\sigma_A$
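The response equation R = i·r·σA (selection intensity × accuracy × additive genetic standard deviation) can be combined with cycle length to compare gain per year. The sketch below plugs in the correlations reported here (about 0.63 for RR GEBVs without phenotype and 0.67 for phenotype) and takes the 3- vs. 7-year figures from the timeline slide as cycle lengths; the 10% selected proportion and σA = 1 are hypothetical placeholders, so only the ratio is meaningful. Function names are illustrative.

```python
from statistics import NormalDist


def selection_intensity(proportion_selected):
    """Standardized selection differential i for truncation selection."""
    z = NormalDist()
    x = z.inv_cdf(1.0 - proportion_selected)   # truncation point
    return z.pdf(x) / proportion_selected


def response(i, r, sigma_a):
    """Breeder's equation per cycle: R = i * r * sigma_A."""
    return i * r * sigma_a


def gain_per_year(i, r, sigma_a, years_per_cycle):
    return response(i, r, sigma_a) / years_per_cycle


i = selection_intensity(0.10)          # hypothetical: keep the top 10%
gs = gain_per_year(i, 0.63, 1.0, 3)    # GS without phenotype, 3-year cycle
ps = gain_per_year(i, 0.67, 1.0, 7)    # phenotypic selection, 7-year cycle
print(f"i = {i:.3f}; GS/PS gain per year = {gs / ps:.2f}")
```

Under these assumptions GS gives roughly 2.2 times the gain per year, even though its per-cycle accuracy is slightly lower: the shorter cycle dominates.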

GS with Phenotype vs. True Breeding Value

[Figure: GS_ALL predictions vs. observed TV (axes 0.5 to 0.8)]

Correlation to TBV = 0.73; R² = 0.53

Preliminary Conclusions

GS with markers + phenotype > phenotype > GS with markers

RR was computationally more than 40 times faster than Bayes A and B

With only 209 genotypes, GEBVs were comparable to phenotypic selection

GEBVs with phenotype have better precision than phenotype alone, with implications for advanced testing

Summary: Association Breeding and Genomic Selection

Association Breeding:

• New alleles can be identified and characterized to determine their value

• Allelic values of previously identified alleles can be dynamically updated based on advanced trial data as desired

Genomic Selection:

• Captures small-effect QTL and genetic relationships

• Can increase gain from selection & reduce advanced testing

• Requires a large number of markers and accurate prediction models

Both Association Breeding & Genomic Selection:

• Genome saturation is not required (but does improve prediction), and supplemental markers can focus on specific QTL regions and candidate genes

• The most important advantages are reductions in the length of the selection cycle and in the associated phenotyping cost, resulting in greater gain per year.

Future Opportunities

• Develop economic assessment of GS in breeding strategies

• Prediction of Epistatic effects

• Develop prediction models for different target environments & high value traits

• Predict utility of new germplasm from other sources

Acknowledgements

• USDA Soft Wheat Quality Lab, Wooster, OH

• Embrapa

• USDA Cooperative State Research, Education and Extension Service, Coordinated Agricultural Project

• USDA National Needs Fellowship Grant 2005-38420-15785: Provided Fellowship for Elliot Heffner

Provided assistantship for Flavio Breseghello

