Applied Bioinformatics for Plant Genome Characterisation using Next-Generation Sequence Data
David Edwards
University of Queensland, Australia
Outline
• Sequencing wheat chromosome arms
• Wheat evolution
• Chickpea chromosomal genomics
• Skim GBS based genome assembly
• Skim GBS based trait association
• Assessing gene presence/absence variation
• Extreme non-model species
Chromosome sequencing
• Isolate chromosome arms using flow cytometry
• Generate NGS libraries and paired-end (PE) Illumina data
• Assemble de novo
Wheat genome
• 17 billion bases
http://www.jic.ac.uk/staff/graham-moore/wheat_meiosis.htm
Mapping reads to reference genomes
Sequencing wheat chromosome arms
Figure: Ta 7DS compared with Bd 1 and Bd 3
www.wheatgenome.info
Berkman, et al., Plant Biotechnology Journal (2011)
Wheat genome evolution
AA × BB → AABB (50,000 years ago)
AABB × DD → AABBDD (10,000 years ago)
A little history
http://www.nap.edu/openbook.php?record_id=12692&page=94
Wheat genome evolution
• When two genomes come together, genes are lost because duplicate copies may not be required or may even be harmful
• Can we see differential gene loss between the three wheat genomes?
Figure 1. Wheat genome evolution: the number of conserved genes within the syntenic builds for chromosomes 7A, 7B and 7D
Wheat genome evolution
• Are there differences in the types of genes lost?
• Conservation of highly networked genes under neutral selection
• Strong selection pressure breaks networks and leads to loss of networked genes
7A gene network
7B gene network
7D gene network
Wheat genome evolution
AA × BB → AABB (50,000 years ago)
AABB × DD → AABBDD (10,000 years ago)
Neutral selection
Strong selection
SGSautoSNP
Australian resequencing
4 million SNPs
Chromosome  # SNPs     SNPs/Mb
7A          1,486,040  4077
7B          1,860,295  4737
7D            671,976  1939
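The arithmetic behind the table can be checked with a short sketch. It assumes that SNPs/Mb is simply the SNP count divided by the length (in Mb) over which SNPs were called, so the "implied length" below is a derived quantity and not a figure from the slides.

```python
# SNP counts and densities copied from the table above.
snp_table = {
    "7A": {"snps": 1_486_040, "snps_per_mb": 4077},
    "7B": {"snps": 1_860_295, "snps_per_mb": 4737},
    "7D": {"snps": 671_976, "snps_per_mb": 1939},
}


def implied_length_mb(snps, snps_per_mb):
    """Length in Mb implied by a SNP count and a density (assumes density = count / length)."""
    return snps / snps_per_mb


# Total across the three group 7 chromosomes (the "4 million SNPs" headline).
total_snps = sum(row["snps"] for row in snp_table.values())

# Length each density figure implies, under the assumption stated above.
implied = {chrom: implied_length_mb(**row) for chrom, row in snp_table.items()}

# How much denser SNPs are on 7B than on 7D.
density_ratio_7b_to_7d = snp_table["7B"]["snps_per_mb"] / snp_table["7D"]["snps_per_mb"]

print(f"total SNPs: {total_snps:,}")
for chrom, length in implied.items():
    print(f"{chrom}: about {length:.0f} Mb implied")
print(f"7B/7D density ratio: {density_ratio_7b_to_7d:.2f}")
```

The D genome carries well under half the SNP density of the A and B genomes, which is the contrast the following slides build on.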
Wheat genome evolution
AA × BB → AABB (50,000 years ago)
AABB × DD → AABBDD (10,000 years ago)
Genetic exchange
No genetic exchange
SNP matrix
Pairwise SNP counts between cultivars (columns are numbered as the rows)
Cultivar              1        2        3        4        5        6        7        8        9       10       11       12       13       14       15       16
1 AC Barrie           0
2 Alsen         194,725        0
3 Baxter        328,294  246,218        0
4 Chara         592,193  438,075  146,171        0
5 Drysdale      429,530  319,401  392,632  730,606        0
6 Excalibur     346,557  273,217  324,087  567,179  367,279        0
7 Gladius       529,898  327,659  472,457  906,611  616,253  491,885        0
8 H45           385,753  265,113  339,227  627,589  298,414  280,576  519,690        0
9 Kukri         245,356  208,666  290,506  541,524  428,134  318,029  480,575  345,358        0
10 Pastor       302,731  289,053  340,269  603,323  336,029  284,559  552,119  309,025  302,231        0
11 RAC875       412,818  257,630  390,967  722,089  429,038  368,152  158,973  386,145  418,037  375,137        0
12 VolcaniDDI   508,175  413,676  412,553  808,658  696,467  600,478  813,067  633,916  498,017  586,694  643,205        0
13 Westonia     354,599  276,490  310,192  623,591  500,461  362,800  557,464  405,842  346,683  349,542  403,411  678,631        0
14 Wyalkatchem  525,289  341,043  433,228  800,300  560,759  327,888  386,213  449,614  436,777  442,941  235,924  800,137  505,345        0
15 Xiaoyan 54   458,214  332,986  368,604  761,864  540,264  324,881  696,677  377,053  401,191  413,462  522,021  897,807  622,449  569,223        0
16 Yitpi        544,440  328,216  468,743  968,088  690,017  548,694  233,539  587,310  530,687  580,060  287,648  951,537  654,967  444,084  844,785        0
Phylogenetic tree
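As an illustration of how a pairwise SNP matrix can be turned into a tree, the sketch below runs average-linkage clustering (UPGMA) on four cultivars from the matrix. UPGMA is an assumption chosen for brevity; the slides do not say which method produced the tree, and the function names are hypothetical.

```python
from itertools import combinations

# Pairwise SNP counts for four cultivars, copied from the SNP matrix.
pairwise_snps = {
    ("AC Barrie", "Alsen"): 194_725,
    ("AC Barrie", "Baxter"): 328_294,
    ("AC Barrie", "Chara"): 592_193,
    ("Alsen", "Baxter"): 246_218,
    ("Alsen", "Chara"): 438_075,
    ("Baxter", "Chara"): 146_171,
}


def leaf_distance(a, b):
    """Look up the SNP count between two cultivars in either key order."""
    return pairwise_snps[(a, b)] if (a, b) in pairwise_snps else pairwise_snps[(b, a)]


def cluster_distance(members_1, members_2):
    """Average-linkage distance: mean SNP count over all cross-cluster pairs."""
    pairs = [(a, b) for a in members_1 for b in members_2]
    return sum(leaf_distance(a, b) for a, b in pairs) / len(pairs)


def upgma(names):
    """Repeatedly merge the two closest clusters; return a Newick-like string."""
    # Each cluster is (tuple of member cultivars, subtree written as text).
    clusters = [((name,), name) for name in names]
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]][0], clusters[ij[1]][0]),
        )
        (members_i, tree_i), (members_j, tree_j) = clusters[i], clusters[j]
        merged = (members_i + members_j, f"({tree_i},{tree_j})")
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][1] + ";"


tree = upgma(["AC Barrie", "Alsen", "Baxter", "Chara"])
print(tree)
```

Baxter and Chara, the pair with the fewest differing SNPs in this subset, are joined first. Branch lengths are omitted to keep the sketch short.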
GBrowse http://wheatgenome.info/
Chickpea kabuli reference
Kabuli reference
Desi Kabuli
Chickpea desi vs kabuli
Desi reference
Ruperao et al. Plant Biotechnology Journal (in press)
Desi Kabuli Desi WGS
Skim GBS based genome validation
• Skim GBS SNP calling
• Make metaSNPs
• Merge contigs
• Genetic map
• Compare all blocks against all
• Apply clustering
Skim GBS
• Determine SNPs by sequencing parents and running SGSautoSNP
• Low-coverage skim sequencing of the segregating population
• Map reads to the reference genome
• Call genotypes where reads cover previously defined SNPs
• Impute and clean to define haplotype blocks
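The genotype-calling step in the list above can be sketched as follows. This is a minimal illustration, not SGSautoSNP or the pipeline used in the talk: the function name, the positions and the `min_reads` threshold are hypothetical, and reads are assumed to be already mapped and reduced to the allele observed at each known SNP position.

```python
from collections import Counter


def call_genotypes(known_snps, read_alleles, min_reads=1):
    """Call each known SNP as parent 1 ('P1'), parent 2 ('P2'), mixed ('H') or missing ('-').

    known_snps:   {position: (parent1_allele, parent2_allele)} from the parental SNP calls
    read_alleles: {position: [alleles seen in skim reads covering that position]}
    """
    calls = {}
    for pos, (p1, p2) in known_snps.items():
        # Count only alleles that match one of the two parents; ignore anything else.
        counts = Counter(a for a in read_alleles.get(pos, []) if a in (p1, p2))
        if sum(counts.values()) < min_reads:
            calls[pos] = "-"   # no usable read: leave for imputation
        elif counts[p1] and counts[p2]:
            calls[pos] = "H"   # both parental alleles observed
        elif counts[p1]:
            calls[pos] = "P1"
        else:
            calls[pos] = "P2"
    return calls


# Hypothetical positions; the C/A and T/C alleles mirror the example SNPs on the slides.
known_snps = {100: ("C", "A"), 250: ("T", "C"), 400: ("G", "T"), 520: ("A", "G")}
read_alleles = {100: ["A"], 250: ["T", "T"], 400: ["C"], 520: ["A", "G"]}

calls = call_genotypes(known_snps, read_alleles)
print(calls)
```

Because skim coverage is low, most positions in a real individual would come back as missing, which is why the imputation and cleaning step follows.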
Genotype calling
Call genotype of previously predicted SNPs
Figure: example SNPs C/A and T/C, with read allele A
Haplotype blocks
TN1 A G G T C C A G G A T A A T
TN2 A G G T C C A G G A T A A T
TN3 T C C A G G C G G A T A A T
TN4 A G G T C C A G G A T A A T
TN5 T C C A G G C T C G C G G C
TN6 A G G T C C A G G A T A A T
TN7 T C C A G G C T C G C G G C
T A G G T C C A G G A T A A T
N T C C A G G C T C G C G G C
Pre-imputation
After imputation and cleaning
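The haplotype rows above can be read as parental origin along the chromosome. The sketch below labels each SNP in an individual as coming from parent T or parent N, fills gaps only when both flanking calls agree, and reports where the origin switches. The flank-based fill is a deliberately simple assumption and not the imputation method used in the talk; the sparse example row is hypothetical.

```python
# Parental alleles copied from the T and N rows of the haplotype block slide.
PARENT_T = "AGGTCCAGGATAAT"
PARENT_N = "TCCAGGCTCGCGGC"
MISSING = "-"


def parental_origin(alleles):
    """Label each SNP 'T' or 'N' by the parent it matches; anything else is missing."""
    return "".join(
        "T" if a == t else "N" if a == n else MISSING
        for a, t, n in zip(alleles, PARENT_T, PARENT_N)
    )


def impute(origin):
    """Fill a run of missing calls only when the calls on both sides agree."""
    calls = list(origin)
    i = 0
    while i < len(calls):
        if calls[i] != MISSING:
            i += 1
            continue
        j = i
        while j < len(calls) and calls[j] == MISSING:
            j += 1
        left = calls[i - 1] if i > 0 else None
        right = calls[j] if j < len(calls) else None
        if left is not None and left == right:
            calls[i:j] = [left] * (j - i)
        i = j
    return "".join(calls)


def breakpoints(origin):
    """Indices of the first SNP after each switch of parental origin (gaps are skipped)."""
    points, last = [], None
    for i, call in enumerate(origin):
        if call == MISSING:
            continue
        if last is not None and call != last:
            points.append(i)
        last = call
    return points


tn3 = parental_origin("TCCAGGCGGATAAT")          # TN3 row from the slide
sparse = impute(parental_origin("T-C-GGC-G-T-AT"))  # hypothetical skim-coverage version of TN3
print(tn3, breakpoints(tn3))
print(sparse, breakpoints(sparse))
```

The gap that sits on the recombination breakpoint is left unfilled, since the flanking calls disagree and either parent is possible there.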
Clustering
LG 1 after ordering
Trait association
Disease resistance in canola
Drought tolerance in chickpea
Gene loss
Cabbage
Brussels sprout
Cauliflower
Kale
Kohlrabi
Wild B. oleracea
Brassica pan-genome
• List all Brassica genes
• Essential (conserved)
• Optional (presence/absence variation)
• Associate PAVs with traits
• Abundance of optional genes with fitness
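The split into essential and optional genes can be sketched from a presence/absence table. The gene names and the presence calls below are hypothetical placeholders, and calling a gene "essential" only when it is present in every line is an assumption made for illustration; real pan-genome analyses usually derive presence from read coverage and may use softer thresholds.

```python
def classify_genes(pav):
    """Split genes into essential (present in every line) and optional (absent in at least one).

    pav: {gene: {line: True if the gene is present in that line, else False}}
    """
    essential = sorted(gene for gene, calls in pav.items() if all(calls.values()))
    optional = sorted(gene for gene in pav if gene not in essential)
    return essential, optional


def optional_gene_counts(pav, optional):
    """Number of optional genes carried by each line."""
    lines = list(next(iter(pav.values())))
    return {line: sum(pav[gene][line] for gene in optional) for line in lines}


# Hypothetical presence/absence calls for four of the B. oleracea types on the slides.
pav = {
    "gene1": {"Cabbage": True, "Cauliflower": True, "Kale": True, "Kohlrabi": True},
    "gene2": {"Cabbage": True, "Cauliflower": False, "Kale": True, "Kohlrabi": True},
    "gene3": {"Cabbage": False, "Cauliflower": False, "Kale": True, "Kohlrabi": False},
    "gene4": {"Cabbage": True, "Cauliflower": True, "Kale": True, "Kohlrabi": True},
}

essential, optional = classify_genes(pav)
counts = optional_gene_counts(pav, optional)
print("essential:", essential)
print("optional:", optional)
print("optional genes per line:", counts)
```

The per-line counts are the kind of quantity that could then be compared with trait or fitness measurements.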
Seagrass
Manatee grazing on seagrass (picture by David Peart).
Manacheese?
Seagrass
GO.ID Term
GO:0018871 1-aminocyclopropane-1-carboxylate metabolic process
GO:0042218 1-aminocyclopropane-1-carboxylate biosynthetic process
GO:0009692 ethylene metabolic process
GO:0009693 ethylene biosynthetic process
GO:0043449 cellular alkene metabolic process
GO:0043450 alkene biosynthetic process
GO:1900673 olefin metabolic process
GO:1900674 olefin biosynthetic process
GO:0048447 sepal morphogenesis
GO:0048451 petal formation
GO:0048453 sepal formation
GO:0048442 sepal development
GO:0048464 flower calyx development
GO:0048446 petal morphogenesis
GO:0010044 response to aluminum ion
GO:0071281 cellular response to iron ion
GO:0010039 response to iron ion
GO:0010105 negative regulation of ethylene mediated signalling pathway
GO:0070298 negative regulation of phosphorelay signal transduction system
GO:0048441 petal development
GO:0048465 corolla development
GO:0071248 cellular response to metal ion
GO:0009963 positive regulation of flavonoid biosynthetic process
GO:0010104 regulation of ethylene mediated signalling pathway
GO:0070297 regulation of phosphorelay signal transduction system
GO:1900378 positive regulation of secondary metabolite biosynthetic process
GO:0071241 cellular response to inorganic substance
GO:0009956 radial pattern formation
GO:0010375 stomatal complex patterning
GO:0048729 tissue morphogenesis
GO:2000038 regulation of stomatal complex development
GO terms for genes lost in seagrass
Conclusions
• Build high quality genome assemblies
• Identify variation between genomes
• Associate genome variation with agronomic traits
• Apply diverse genomic knowledge to improve crops
Acknowledgements
Philipp Bayer
Kenneth Chan
Pradeep Ruperao
Michal Lorenc
Agnieszka Golicz
Kaitao Lai
Paul Visendi
Paula Martinez
Jenny Lee
Juan Montenegro
Paul Berkman
Jiri Stiller
Sahana Manoli
Jacqueline Batley
Alice Hayward
Emma Campbell
Jessica Dalton-Morgan
Satomi Hayashi
Reece Tollenaere
Hana Šimková
Marie Kubaláková
Jaroslav Doležel
Tim Sutton
Deepa Jaganathan
Rajeev Varshney
(and colleagues)
Martin Schliep
Rudy Dolferus
Peter Ralph
Contact:
Advisory Board: Jeff Bennetzen, Jose Crossa, Robert Henry, Rodomiro Ortiz, Andrew Paterson, Kadambot Siddique, Mark Sorrells, Mark Tester, Michael Udvardi