Post on 30-Mar-2018
Ontological Implications from Sequencing the Global Rice Collections
Kenneth L. McNally, Sr. Scientist II
A Team 4 Computational Genomics
T.T. Chang Genetic Resources Center
International Rice Research Institute
Los Baños, Laguna, Philippines
IRRI
IRRI - Int. Rice Research Institute
• Founded in 1960 by Rockefeller and Ford Foundations
• Flagship center of CGIAR
• Green Revolution in Asia initiated through the release of IR8 with sd1 for semi-dwarf stature
Mission
• To reduce poverty and hunger, improve the health of rice farmers and consumers, and ensure environmental sustainability through collaborative research, partnerships, and the strengthening of national agricultural research and extension systems.
IRGC – the International Rice Genebank Collection
World’s largest collection of rice germplasm held in trust for the world community and source countries
• Over 108,000 registered and incoming accessions from 117 source countries
• Two cultivated species
o Oryza sativa
o Oryza glaberrima
• 22 wild species
• Relatively few accessions have donated alleles to current, high-yielding varieties
• http://www.irri.org/GRC
Allelic series of induced and natural variants
Traditional Germplasm
IRGC – 108,000 accessions
Elite Lines/Varieties
IR64 Mutant Collection
Breeding/Mapping populations
Array of Genetic Resources
Develop a genetic diversity platform
Single genome: Nipponbare (Temp Japonica)
NGS/3GS: >10K lines from Gene Bank
OryzaSNP: 20 varieties, genome-wide SNP
2000+ lines, genome-wide SNP
Association genetics platform
SNP haplotype-phenotype association
QTL prediction
Parental choices
Pedigree/trait tracking
Natural reverse genetics system
Probe deep into available, useful diversity
Selective trait evaluation
Large SNP dataset to query new germplasm
Breeding history
Abundant markers
Benchmark physical map
Enabling -omics technology
“e-cloning”
Timeline: 2005, 2008, 2011, 2012
Use
Conserved Germplasm
Breeding Lines
Specialized Genetic Stocks
Current problems
Drought tolerance
Conservation
Dissemination
Phenotype-genotype association
Durable disease-pest resistance
Problem soils
Future challenges
Public Genetic Diversity Research Platform
C4 Rice
Grain quality
Global Rice Scientific Partnership (GRiSP)
PL 1.2: Characterizing genetic diversity and creating novel gene pools
• 1.2.1 Rice SNP Consortium for high density genotypes
http://www.ricesnp.org
Cornell, USDA, EMBRAPA, …
• 1.2.2 Global phenotyping network for key traits
• 1.2.3 Whole genome sequencing of genebank stocks
• 1.2.4 Specialized populations for genetic studies
Genetic Resources: Genotype/Phenotype information
[Figure: number of lines (y-axis) against knowledge of genotype / phenotype / agronomic value (x-axis). Labels: "Minimal knowledge of most materials"; "Detail required to evaluate agronomic performance"; "GWAS +++"]
Solution: data on the whole collection to predict performance (ecogeographic data; sequence the collection)
10,000 GeneBank accessions [1]
Cultivated + close wild relatives
Rice SNP Consortium: 1M Affymetrix genotyping chip, 2000 lines
Phenotyping network
2000+ lines $20 M (combined funding sources)
Association genetics and QTL mapping
Predict genotype-phenotype relationships at kb resolution
Specialized genetic stocks: MAGIC populations, biparental RILs, CSSL
Genebank as a reverse genetics system
Select accessions based on QTL prediction for targeted phenotyping of specific traits
Discover novel phenotypes
Bioinformatics and database to integrate sequence-phenotype data
BGI de novo sequencing:
200 @ 50X depth
1000 @ 10-20X depth
rest @ 5-10X depth
Use in breeding programs
$15 M
Participating institutions: CIRAD, CAAS, BGI, IRRI, ……
[1] Includes publicly accessible germplasm from IRRI, CIRAD, AfricaRice, CIAT and regional collections
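The step from genotypes to "predict genotype-phenotype relationships" can be illustrated with a toy single-marker scan. This is only a minimal sketch, not the consortium's pipeline: the SNP names, allele calls and trait values below are hypothetical, and Pearson correlation stands in for a proper association model (a real analysis would also need to account for population structure).

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # monomorphic SNP or constant trait: no signal
    return cov / sqrt(var_x * var_y)

# Hypothetical toy data: 6 inbred lines, alleles coded 0 (reference) / 1 (alternate).
genotypes = {
    "snp_A": [0, 0, 0, 1, 1, 1],
    "snp_B": [0, 1, 0, 1, 0, 1],
    "snp_C": [1, 1, 1, 1, 1, 1],  # monomorphic in this toy panel
}
phenotype = [2.0, 2.2, 1.9, 3.1, 3.0, 3.3]  # hypothetical trait values

def rank_snps(genotypes, phenotype):
    """Score each SNP against the trait and sort by absolute correlation."""
    scores = {snp: pearson_r(calls, phenotype) for snp, calls in genotypes.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

ranked = rank_snps(genotypes, phenotype)
for snp, r in ranked:
    print(f"{snp}: r = {r:+.3f}")
```

The top-ranked markers are the kind of signal that could then be used to pick accessions for targeted phenotyping, which is the "genebank as a reverse genetics system" idea on the slide.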
Tapping into the unknown
IRGC Traditional Germplasm
100,000 cultivated accessions
Iterative sampling
200 @ 50X
800 @ 30X
9K @ 6X
90K @ 2X
Apply low-cost sequencing by next generation and 3rd generation technologies
• Obtain WGS by NGS or 3rdGS for all? accessions
• Cost-benefit analysis for return on investment
• Use the association data between 2500 lines and trait phenotypes to select materials for specific evaluation
• Isolate novel genes and rare alleles contributing to these traits
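As a rough check on the iterative sampling plan, the tiers can be added up to see how many accessions they cover and how much sequencing they imply. The tier sizes and depths are taken from the slide; the genome size of about 0.4 Gb is an assumption added here for illustration and is not stated in the talk.

```python
# Tiers as listed on the slide: (number of accessions, sequencing depth in X).
TIERS = [(200, 50), (800, 30), (9_000, 6), (90_000, 2)]

# Assumption (not from the slide): a rice genome of roughly 0.4 Gb.
ASSUMED_GENOME_GB = 0.4

def total_accessions(tiers):
    """Number of accessions covered by all tiers together."""
    return sum(n for n, _ in tiers)

def genome_equivalents(tiers):
    """Total sequencing effort in units of one genome at 1X depth."""
    return sum(n * depth for n, depth in tiers)

def mean_depth(tiers):
    """Average depth per accession across the whole plan."""
    return genome_equivalents(tiers) / total_accessions(tiers)

def raw_gigabases(tiers, genome_gb=ASSUMED_GENOME_GB):
    """Approximate raw sequence output under the assumed genome size."""
    return genome_equivalents(tiers) * genome_gb

print("accessions:", total_accessions(TIERS))
print("genome equivalents (1X):", genome_equivalents(TIERS))
print("mean depth per accession:", mean_depth(TIERS))
print("approx. raw Gb (assumed genome size):", round(raw_gigabases(TIERS)))
```

The four tiers add up to the 100,000 cultivated accessions named on the slide, and the 90K lines at 2X account for most of the total sequencing effort, which is why cost per genome drives the cost-benefit question.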
103 Genomes by Illumina NGS for 1M SNP Affy chip
Cornell, IRRI, USDA, DevGen, Academia Sinica, EMBRAPA, Uni Aberdeen, JBEI/JGI, NIAS, Uni Delaware, …
• 15 indica
• 6 indica/admixed (unique type in some analyses)
• 12 aus
• 17 temperate japonica
• 7 aromatic
• 16 tropical japonica
• 14 O. rufipogon and nivara (AA genome)
• 1 O. meridionalis (AA genome)
• 7 O. glaberrima (African cultivated, AA genome)
• 7 O. barthii (AA genome)
• 1 O. punctata (BB genome, outgroup)
52 genomes from W Wang (Kunming Zoo Institute) & FY Hu (YAAS)
• 5 indica, 4 aus, 2 deep-water, 6 aromatic, 5 trop. japonica, 4 temp. japonica,
• 25 O. rufipogon and nivara, 1 O. longistaminata
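The group counts on this slide can be tallied to confirm that they add up to the stated panel sizes and to see the combined composition. A small sketch follows; the counts are copied from the bullets above, while the dictionary names and the spelled-out group labels are choices made for this illustration.

```python
from collections import Counter

# Counts copied from the "103 Genomes" bullets.
PANEL_103 = Counter({
    "indica": 15,
    "indica/admixed": 6,
    "aus": 12,
    "temperate japonica": 17,
    "aromatic": 7,
    "tropical japonica": 16,
    "O. rufipogon and nivara": 14,
    "O. meridionalis": 1,
    "O. glaberrima": 7,
    "O. barthii": 7,
    "O. punctata": 1,
})

# Counts copied from the "52 genomes" bullets.
PANEL_52 = Counter({
    "indica": 5,
    "aus": 4,
    "deep-water": 2,
    "aromatic": 6,
    "tropical japonica": 5,
    "temperate japonica": 4,
    "O. rufipogon and nivara": 25,
    "O. longistaminata": 1,
})

# Counter addition sums the counts group by group.
combined = PANEL_103 + PANEL_52

print("panel 1 total:", sum(PANEL_103.values()))
print("panel 2 total:", sum(PANEL_52.values()))
for group, n in combined.most_common():
    print(f"{group}: {n}")
```

Both lists are internally consistent with their headings (103 and 52), giving 155 genomes in total when the two sets are pooled.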
Germplasm for Genotyping/WGS
• NSF-TV (500)
• GCP genotyping set (2339)
• GCP drought (800)
• GCP Aus (300)
• Orytage/Eurigen (600)
• Nominations of pure lines
• Others from IRRI Core (13,000)
• Madagascar (50)
• O. rufipogon/nivara (100)
• MAGIC parents (16)
• ACIAR chalk (1300)
• Various donors (5)
• O. glaberrima (300)
• USDA core (1500)
Now have >4100 SSD genetic stocks
~2000 for SNPing on 1M feature Affy arrays
Rest are in line for sequencing by Illumina NGS
Diversity (coverage), utility, trait donors, nominations
Phenotyping consortium for traits with impact (under GRiSP 1.2.3)
• Build consortium of partners with expertise in particular traits
• Rely on existing networks and sites as much as possible
• Identify and prioritize traits where impact is needed
• Sample from the Rice SNP set of 2500 lines for subsets targeted to specific traits and environments
• Phenotype these traits using standardized procedures
• Capture meta-data about experimental design:
o Method ontology, Experimental ontology
• Use controlled vocabulary and trait ontology for data
• Centralized database for data capture
o Pedigrees, germplasm stocks, phenotype studies, SNP data
o new schema “eRice”, Chiangzhi Liang, IRRI
• Link to external genome annotation DBs
o Full use of PO, EO, MO, TO, GO, Crop Ontology (GCP)
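One way to picture "controlled vocabulary and trait ontology for data" is a phenotype record whose trait, method and environment fields must come from fixed term lists. The sketch below is hypothetical throughout: the class, the field names and the term IDs are placeholders invented for illustration (they are not real TO/MO/EO accessions), and it does not describe the actual "eRice" schema.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical controlled vocabularies. The IDs are placeholders,
# NOT real ontology accessions.
TRAIT_VOCAB = {"TO:EXAMPLE_0001": "grain yield", "TO:EXAMPLE_0002": "root depth"}
METHOD_VOCAB = {"MO:EXAMPLE_0001": "lysimeter weighing", "MO:EXAMPLE_0002": "plot harvest"}
ENV_VOCAB = {"EO:EXAMPLE_0001": "field, drought stress", "EO:EXAMPLE_0002": "greenhouse"}

@dataclass(frozen=True)
class Observation:
    """One phenotype measurement with its experimental meta-data."""
    accession_id: str    # germplasm stock identifier (hypothetical format)
    trait_id: str        # must be a key of TRAIT_VOCAB
    method_id: str       # must be a key of METHOD_VOCAB
    environment_id: str  # must be a key of ENV_VOCAB
    value: float
    unit: str

def validate(obs: Observation) -> List[str]:
    """Return a list of problems; an empty list means the record is acceptable."""
    errors = []
    if obs.trait_id not in TRAIT_VOCAB:
        errors.append(f"unknown trait term: {obs.trait_id}")
    if obs.method_id not in METHOD_VOCAB:
        errors.append(f"unknown method term: {obs.method_id}")
    if obs.environment_id not in ENV_VOCAB:
        errors.append(f"unknown environment term: {obs.environment_id}")
    return errors

good = Observation("ACC-0001", "TO:EXAMPLE_0002", "MO:EXAMPLE_0001",
                   "EO:EXAMPLE_0001", 42.5, "cm")
bad = Observation("ACC-0002", "root length (free text)", "MO:EXAMPLE_0001",
                  "EO:EXAMPLE_0002", 17.0, "cm")

print(validate(good))  # no problems expected
print(validate(bad))   # free-text trait name is rejected
```

Rejecting free-text trait names at capture time is what lets records from different sites be pooled later and linked to external annotation databases.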
Phenotyping network: example traits for impact
Focus on traits affected by global climate change
Root properties relevant to drought tolerance
[Figure: TW16 and TN1 plants, RTV-infected and healthy. Gene for virus resistance]

Trait                                   Site
Yield components                        Field
Disease resistance                      GH + disease nursery
Salinity (vegetative and reproductive)  GH + Field
Drought                                 GH + Field
Heat (humid and dry)                    Growth chamber + Field
Grain quality                           Laboratory
Seed physiology                         Laboratory
Phenotyping OryzaSNP set for traits.
Chalk, Grain quality, Disease (BLB, blast, SB, +), salinity, drought at reproductive stage, root traits, … (IRRI & GCP collaborations)
Biomass, nutritional effects on immunostimulation in mice (Jan Leach, Colorado State University)
WS2007 & WS2008 for Morpho-agron/Yield (GRC)
Aus lines 2010DS
220 lines + increasing 36
3 environments
• Early vigor
• Canopy temp
• Yield
GCP G3008.6 “Targeting Drought-Avoidance Root Traits …” A. Henry
Linking root architecture with root function
[Figure: drought treatment. Cumulative water loss (kg) against weeks after draining cylinders, for Dular, IR64 and unplanted cylinders]
IRRI lysimeter facility
GCP G3008.6, A. Henry
Ontological implications of 100K+ rice genomes
• Need for a non-coding “sequence” ontology?
o Cover classes not in GO such as transposons, REs, CNVs, SNVs
• A controlled vocabulary for population structure and haplotypes?
• Extend Trait ontology?
• How does a metabolic network intrinsically differ from an ontology (apart from not necessarily being a DAG)?
• Can they be merged using network computational methods?
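The DAG remark in the question above can be made concrete with a small acyclicity check. The sketch below uses Kahn's topological-sort algorithm on two toy edge lists; both graphs and all node names are hypothetical examples chosen for illustration, not real ontology terms or pathways.

```python
from collections import deque

def is_dag(edges):
    """Return True if the directed graph given as (source, target) pairs has no cycle.

    Uses Kahn's algorithm: repeatedly remove nodes with no incoming edges.
    If every node can be removed, the graph is acyclic.
    """
    successors = {}
    indegree = {}
    for src, dst in edges:
        for node in (src, dst):
            successors.setdefault(node, [])
            indegree.setdefault(node, 0)
        successors[src].append(dst)
        indegree[dst] += 1

    queue = deque(node for node, deg in indegree.items() if deg == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return removed == len(indegree)

# Hypothetical ontology fragment: child -> parent ("is_a") edges.
# A term may have several parents, but there is no way back down.
toy_ontology = [
    ("root depth", "root trait"),
    ("root depth", "drought-related trait"),
    ("root trait", "plant trait"),
    ("drought-related trait", "plant trait"),
]

# Hypothetical metabolic fragment: substrate -> product edges forming a cycle.
toy_pathway = [
    ("metabolite_1", "metabolite_2"),
    ("metabolite_2", "metabolite_3"),
    ("metabolite_3", "metabolite_1"),
    ("metabolite_2", "metabolite_4"),
]

print("ontology is a DAG:", is_dag(toy_ontology))
print("pathway is a DAG:", is_dag(toy_pathway))
```

The check only captures the structural difference the slide sets aside; the edges also mean different things (classification in one case, conversion in the other), which is the harder part of any merge.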
Genotyping/phenotyping strategy
SNP discovery
OryzaSNP resequencing: 20 rice varieties (2008-2009)
Illumina 1536-plex and BeadXpress 384-plex (Cornell & IRRI, ongoing)
Next-gen resequencing: >100 varieties (2010)
Affymetrix 1M: 2,000 varieties (2011)
Trait association studies, allele mining for key genes and functional SNPs
QTL mapping, genetic diversity analysis, MAS, DNA fingerprinting
Sequencing the entire genebank: 10,000 varieties (2011-2012); rest (2012-2015)
Phenotyping Network: large-scale, precise phenotyping and phenomics (2011-2015)
Mapping populations: diverse germplasm (2 MAGIC, 20 RILs NAM) (ongoing)
SNP genotyping
Phenotyping & Population Development