Applying High-Throughput Genomics to Crops for the Developing World
Jason Wallace, Cornell University
The big picture: Global food security
Photo credit: NASA
• Food security means reliable access to food of sufficient quality and quantity to lead an active and healthy life [1]
• 842 million people worldwide are food insecure [2]
• Increasing food security is one of the surest ways to improve health, educational attainment, and political stability
[1] Paraphrased from FAO, Declaration of the World Summit on Food Security, 2009
[2] FAO, The State of Food Insecurity in the World, 2013
Major constraints on food security
Increasing population
[Figure: World population (billions), 2010 to 2050, with "Today" marked; ~9 billion by 2050 [1]]
Changing consumption habits
[Figure: Consumption (billion tonnes/year) of cereals, vegetables, fruits, meat, dairy, fish, and fat & oil [2]]
Environmental variability
[Figure: Projected surface temperature change [3]]
Negative side-effects
[Photos: erosion, pollution, deforestation. Credits: NOAA, Rhett Butler]
[1] UN Department of Economic and Social Affairs, World Population Prospects: The 2012 Revision.
[2] Kearney 2010, Phil Trans Roy Soc B 365
[3] NOAA GFDL Climate Research Highlights Image Gallery
Reaching the goal
Improved crops
Government Policies
Agronomic Practices
Infrastructure development
Technology Development
Agroecology
Consumer habits
Market Incentives
The golden age of crop genetics
• Modern sequencing is opening the floodgates to genetic analysis
[Figure: Sequencing trends over time, 2000 to 2015. Cost of sequencing (cost/megabase, log scale from $0.1 to $10K) compared with Moore's Law [1]; total plant genomes sequenced (0 to 60) [2]]
[1] Wetterstrand KA. DNA Sequencing Costs, available at: www.genome.gov
[2] Michael & Jackson 2013, The Plant Genome 6
Case studies outline
• Barnyard Millet: Diversity Analysis
• Pearl Millet: Genetic Map Creation
• Maize: Trait Mapping
Shramajeevi Agri Films
Case Study 1: Barnyard millet diversity
Shramajeevi Agri Films
Barnyard Millet (Echinochloa spp.)
• Barnyard millet (Echinochloa spp.) is an important alternative crop in southern and eastern Asia
• Two species: E. colona (India) and E. crus-galli (Japan)
• Also grown as a forage crop in the US and Japan (“billion-dollar grass”)
• Goal: Characterize the newly created core collection at ICRISAT using genome-wide marker data
Genotyping-by-sequencing (GBS)
• Created for high-throughput, semi-automated genotyping
• Workflow: Sample plants -> Isolate DNA -> Restriction digest -> Ligate adaptors -> Pool & amplify -> Sequence
[Figure: GBS library fragment; labels: sequencing adaptor, barcode, sticky ends, genomic DNA]
Images: Qiagen, Illumina, Elshire et al. 2011, PLoS ONE
• Advantages
  • One-step SNP discovery + genotyping
  • Simple protocol; no reference required
  • Large numbers of SNPs found cheaply
  • Broadly applicable
• Drawbacks
  • False SNPs from sequencing errors
  • Missing data from stochastic sampling
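Because samples are pooled before sequencing, each read has to be assigned back to its plant by the barcode ligated next to the genomic DNA. A minimal sketch of that demultiplexing step follows; the barcodes, sample names, and reads are made up for illustration, matching is exact (no mismatch tolerance), and this is not the pipeline used in the talk.

```python
def demultiplex(reads, barcodes):
    """Assign each read to a sample by its leading barcode and trim the barcode.

    reads: iterable of sequence strings
    barcodes: dict mapping barcode sequence -> sample name (hypothetical values)
    Returns (per_sample, unassigned).
    """
    # Try longer barcodes first so a short barcode that happens to be a
    # prefix of a longer one cannot claim its reads.
    ordered = sorted(barcodes, key=len, reverse=True)
    per_sample = {name: [] for name in barcodes.values()}
    unassigned = []
    for read in reads:
        for bc in ordered:
            if read.startswith(bc):
                per_sample[barcodes[bc]].append(read[len(bc):])
                break
        else:
            # No barcode matched (e.g. a sequencing error in the barcode).
            unassigned.append(read)
    return per_sample, unassigned


# Hypothetical barcodes and reads
barcodes = {"ACGT": "plant_A", "TTGCA": "plant_B"}
reads = ["ACGTCAGCTTGA", "TTGCACAGCAAT", "GGGGCAGCTTGA", "ACGTCTGCAAAC"]
per_sample, unassigned = demultiplex(reads, barcodes)
```

Counting the reads that land in each sample at each site is also where the missing-data drawback shows up: a plant with no reads at a site simply has no genotype call there.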
Cleaning up the data
• Have ~20,000 SNPs after basic filtering
• Problem: Both barnyard millet species are hexaploid -> false SNPs due to paralogs
• Filter by "heterozygosity"
[Figure: Site Frequency Spectrum (raw) and Site Frequency Spectrum (filtered); x-axis: minor allele frequency, y-axis: relative abundance; series: combined pop., E. colona, E. crus-galli; annotations: ideal, paralogs, differentially segregating alleles]
Wallace et al. 2015, Plant Genome (in press)
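One way to picture the heterozygosity filter: in mostly inbred material a real SNP should be homozygous in most plants, whereas a fixed difference between paralogous copies looks "heterozygous" in nearly every plant. The sketch below assumes that reasoning and uses hypothetical thresholds (`max_het`, `min_maf`) and genotype coding; the actual cutoffs used in the study are not given on the slide.

```python
def site_stats(calls):
    """Return (heterozygous fraction, minor allele frequency) for one site.

    calls: list of genotypes coded 0 = homozygous reference, 1 = heterozygous,
    2 = homozygous alternate, None = missing (coding assumed for illustration).
    """
    called = [g for g in calls if g is not None]
    if not called:
        return None
    n = len(called)
    het = sum(1 for g in called if g == 1) / n
    alt_freq = sum(called) / (2 * n)
    return het, min(alt_freq, 1 - alt_freq)


def filter_sites(sites, max_het=0.2, min_maf=0.01):
    """Keep sites that are polymorphic but not suspiciously heterozygous."""
    kept = []
    for name, calls in sites.items():
        stats = site_stats(calls)
        if stats is None:
            continue  # no calls at all: nothing to evaluate
        het, maf = stats
        if het <= max_het and maf >= min_maf:
            kept.append(name)
    return kept


# Hypothetical genotypes for eight plants
sites = {
    "snp_true": [0, 0, 2, 2, 0, 2, None, 0],     # segregates, no heterozygotes
    "snp_paralog": [1, 1, 1, 1, 1, 1, 1, None],  # "heterozygous" everywhere
}
kept = filter_sites(sites)
```

Plotting a histogram of the minor allele frequencies before and after such a filter gives the raw and filtered site frequency spectra.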
Phylogenetics
• Phylogeny splits the two species as expected
• Population structure within species closely matches phylogeny and geography
E. colona E. crus-galli
Potential hybrids
Wallace et al. 2015, Plant Genome (in press)
Genetic Maps for Pearl Millet
Pearl Millet (Pennisetum glaucum)
• Staple crop for India and Sub-Saharan Africa
• Large (2.3 Gb), diverse genome
• Reference genome in progress
• Goal: Assemble genetic maps to anchor scaffolds into pseudochromosomes
Mapping Populations
• 3 biparental populations used for genetic mapping:
  • 841 x 863 ("Patancheru"): ~100 RILs from ICRISAT-Patancheru
  • Tift 99B x Tift 454 ("Tifton"): ~180 RILs from Som Punnuri, Ft. Valley State University, USA
  • Wild x Domestic F2s ("Sadore"): ~300 F2 plants from Boubacar Kountche, ICRISAT-Niamey
Summary statistics
[Figure: Comparison of genotyping depths for Tifton, Patancheru, and Sadore; x-axis: call depth (= # reads), y-axis: # genotypes (log scale, 10^0 to 10^8); annotation: best read depth]
[Figure: SNP counts for Tifton, Patancheru, and Sadore; y-axis 0 to 80k; labeled values include 48k, 75k, and 76k]
• Fewer SNPs = less diversity
Making individual maps
1. Call SNPs
2. Group via hierarchical clustering
3. Merge linkage groups
4. Order markers
5. Cleanup
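Step 2 can be illustrated with a small sketch: markers are grouped when their pairwise recombination fraction is low. The version below assumes RIL genotypes coded by parent ("A", "B", "-" for missing), a hypothetical threshold `max_rf`, and single-linkage grouping, which gives the same groups as cutting a single-linkage hierarchical clustering tree at that threshold. The linkage criterion actually used for these maps is not stated on the slide.

```python
from itertools import combinations


def recomb_fraction(a, b):
    """Fraction of lines whose parental alleles differ between two markers."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.5  # no shared data: treat as unlinked
    return sum(x != y for x, y in pairs) / len(pairs)


def linkage_groups(markers, max_rf=0.2):
    """Group markers connected by recombination fraction <= max_rf."""
    names = list(markers)
    parent = {n: n for n in names}

    def find(n):
        # Union-find lookup with path halving
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for m1, m2 in combinations(names, 2):
        if recomb_fraction(markers[m1], markers[m2]) <= max_rf:
            parent[find(m1)] = find(m2)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


# Hypothetical genotypes for 10 RILs at four markers
markers = {
    "m1": "AAAABBBBAB",
    "m2": "AAAABBBBAA",
    "m3": "ABABABABAB",
    "m4": "ABABABAB-B",
}
groups = linkage_groups(markers)
```

Single linkage is forgiving of gaps in marker coverage but can chain unrelated groups together through a few bad markers, which is one reason merging and cleanup steps follow.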
Merge maps for final assembly
• 4824 contigs assembled into 1.68 Gb reference
• 92.8% of sequence data
• 60% have putative orientations
• Not perfect, but pretty good
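Anchoring with a genetic map can be sketched roughly as follows: each contig goes to the linkage group where most of its markers map, contigs are ordered by genetic position, and orientation is only inferred when a contig carries markers at different genetic positions. The data layout, names, and tie-breaking below are assumptions for illustration, not the method used for the pearl millet assembly.

```python
from collections import Counter, defaultdict
from statistics import median


def anchor_contigs(markers):
    """Place contigs on linkage groups.

    markers: list of (contig, bp_position, linkage_group, cM) tuples.
    Returns a list of (linkage_group, cM, contig, orientation), sorted by
    linkage group and position; orientation is "+", "-", or "?" (unknown).
    """
    by_contig = defaultdict(list)
    for contig, bp, lg, cm in markers:
        by_contig[contig].append((bp, lg, cm))

    placed = []
    for contig, hits in by_contig.items():
        # Majority vote on the linkage group; minority markers are ignored.
        lg = Counter(h[1] for h in hits).most_common(1)[0][0]
        on_lg = [(bp, cm) for bp, group, cm in hits if group == lg]
        position = median(cm for _, cm in on_lg)
        # Orientation from the sign of the covariance between physical and
        # genetic position; zero covariance means it cannot be determined.
        mean_bp = sum(bp for bp, _ in on_lg) / len(on_lg)
        mean_cm = sum(cm for _, cm in on_lg) / len(on_lg)
        cov = sum((bp - mean_bp) * (cm - mean_cm) for bp, cm in on_lg)
        orientation = "+" if cov > 0 else "-" if cov < 0 else "?"
        placed.append((lg, position, contig, orientation))
    return sorted(placed)


# Hypothetical markers
markers = [
    ("c1", 100, "LG1", 5.0), ("c1", 900, "LG1", 7.0),
    ("c2", 50, "LG1", 2.0),
    ("c3", 100, "LG1", 12.0), ("c3", 800, "LG1", 10.0), ("c3", 400, "LG2", 3.0),
]
layout = anchor_contigs(markers)
```

A contig with a single marker, or with several markers at the same genetic position, can be placed but not oriented, which is consistent with only part of the assembly having putative orientations.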
Case Study 3: Trait Mapping in the CIMMYT WEMA Populations
• WEMA = Water-Efficient Maize for Africa
• ~20 biparental families, ~200 lines each
• Goal: Use data from across families to map trait loci with high resolution
[Figure: 3D PCA plot of the WEMA families; axes: PC1, PC2, PC3]
Trait mapping
• Two approaches to mapping traits in WEMA
[Figure: Traditional joint GWAS and Bayesian GWAS; labels: Env 1, Env 2, Env 3, Env 4, merge, unified posterior probabilities]
Both methods get similar results
[Figure: Traditional GWAS (-log10 p-value) compared with Bayesian GWAS (cumulative Bayes factor)]
• Mappings in both methods are roughly equivalent
Preliminary trait-mapping results
[Figure: Association for Days to Anthesis (well-watered) on Chromosome 8; x-axis: position (0 to 150 Mb), y-axis: -log10 p-value; labeled peaks: ZCN8, VGT1, ZmRAP2.7, GIGZ1A?, and several unidentified peaks marked "?"]
Conclusions
Photo credit: NASA
• Genomic technology can rapidly characterize almost any crop
• These genetic tools help breed crops faster and better
• Genotyping is basically solved; the bottlenecks are now phenotyping and selection
Future Need 1: High-throughput phenotyping
• Genotyping is frequently cheaper than dirt (field space)
• Phenotyping is now the limiting factor
[Photos: manual recording, rapid phenotyping, high-throughput phenotyping. Credits: CIMMYT & Michael Gore]
Future Need 2: Data infrastructure
• Both genotyping and phenotyping threaten to drown us in data.
• Data is only useful if it is usable
• Need to develop solutions so genotypes, phenotypes, and germplasm are integrated and linked
[Image: server farm. Credit: Torkild Retvedt]
Future Need 3: Faster breeding methods
[Figure: Genomic Selection scheme. Steps: make crosses, genotype, phenotype, (re)train model, predict via model. Cycles: standard breeding cycle; selection cycle (faster, less expensive); training cycle (slower, expensive)]
Model: $y_i = \mu + \sum_{j=1}^{m} z_{ij} u_j \delta_j + e_i$
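The "predict via model" step can be sketched in a few lines. The interpretation of the symbols here is an assumption, since the slide does not define them: z_ij is the genotype of line i at marker j, u_j is the marker effect, delta_j is a 0/1 indicator for whether marker j is in the model, and mu is the overall mean. Marker effects would come from the training cycle; the numbers below are invented for illustration.

```python
def predict_value(genotype, effects, included, mu):
    """Predicted value for one line: mu + sum_j z_ij * u_j * delta_j."""
    return mu + sum(z * u * d for z, u, d in zip(genotype, effects, included))


def select_top(candidates, effects, included, mu, n_keep):
    """Score genotyped (unphenotyped) candidates and keep the best n_keep."""
    scored = {
        name: predict_value(genotype, effects, included, mu)
        for name, genotype in candidates.items()
    }
    ranked = sorted(scored, key=scored.get, reverse=True)
    return ranked[:n_keep], scored


# Hypothetical trained model: three markers, the second excluded (delta = 0)
mu = 10.0
effects = [0.5, -0.25, 1.0]
included = [1, 0, 1]

# Hypothetical candidates, genotypes coded as allele counts 0/1/2
candidates = {
    "line_1": [2, 1, 0],
    "line_2": [0, 2, 2],
    "line_3": [1, 1, 1],
}
selected, scores = select_top(candidates, effects, included, mu, n_keep=2)
```

Because selection only needs genotypes once the model is trained, the selection cycle avoids field phenotyping, which is what makes it faster and less expensive than the training cycle.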
Acknowledgements
The Buckler Lab
Collaborators
• C. Tom Hash (ICRISAT-Niamey)
• Boubacar Kountche (ICRISAT-Niamey)
• Som Punnuri (Fort Valley State University)
• Hari Upadhyaya (ICRISAT-Patancheru)
• Rajeev Varshney (ICRISAT-Patancheru)
• Xin Liu (BGI)
• Xuecai Zhang (CIMMYT-Mexico)
• The Institute for Genomic Diversity (Cornell)
• The Maize Diversity Project
• The Pearl Millet Genome Sequencing Consortium
Funding
• National Science Foundation (NSF)
• Plant Genome Research Program
• Basic Research to Enable Agricultural Development (BREAD)
• The International Crops Research Institute for the Semi-Arid Tropics (ICRISAT)
• The International Maize and Wheat Improvement Center (CIMMYT)
• The United States Agency for International Development (USAID)
• The United States Department of Agriculture Agricultural Research Service (USDA-ARS)