Sequencing the large and complex genome of
Aegilops tauschii, one of the progenitors of
common wheat
[Figure: relationships among Ae. tauschii and wheat accessions, scale bar 0.1; individual accession labels omitted. Ae. tauschii ssp. strangulata accession AL8/78 is marked.]
[Figure groups: Strangulata gene pool (Transcaucasia; southwestern Caspian Iran; southeastern Caspian Iran); Tauschii gene pool; T. aestivum]
[Diagram labels: Ae. tauschii (genomes DD), ssp. strangulata (AL8/78) and ssp. tauschii; hexaploid wheat (genomes AABBDD)]
172 Ae. tauschii accessions, 178 wheat accessions, 55 RFLP loci
Choice of Ae. tauschii accession
Computed (red and blue curves) and observed (circles) declines of synteny in intergenic regions in Triticeae genomes
Science 316, 1862 (2007)
[Figure axis: synteny in intergenic spaces]
• Developed SNaPshot HICF BAC fingerprinting method
• Fingerprinted 461,706 BAC clones
• Assembled BACs into 3,578 contigs
• MTP across the contigs: 4,792 Mbp
Physical map development
• AL8/78 (ssp. strangulata) x AS75 (ssp. tauschii)
• Developed Ae. tauschii 10K Infinium SNP assay
• Genotyped 1,102 F2 plants
• Mapped 7,185 SNP markers
Genetic map
Anchoring of BAC contigs on the genetic map with the 10K Ae. tauschii SNP Infinium assay
[Figure labels: F2 population; 2-D BAC pools]
Luo et al., PNAS 110, 2013
Anchoring algorithm
• A BAC is positive for an SNP marker
• At least one neighboring BAC in the contig is positive for the same SNP marker
• If both conditions hold, accept the anchor as true
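The acceptance rule above can be sketched in a few lines of Python. This is only an illustration, not the published implementation: the data layout (`contigs`, `positives`), the function name `anchor_contigs`, and the reading of "neighboring" as "adjacent in contig order" are all assumptions made for the example.

```python
from collections import defaultdict


def anchor_contigs(contigs, positives):
    """Return {contig: set of accepted SNP markers}.

    contigs:   {contig name: [BAC names in contig order]}   (hypothetical layout)
    positives: {BAC name: set of SNP markers the BAC is positive for}
    """
    anchors = defaultdict(set)
    for contig, bacs in contigs.items():
        for i, bac in enumerate(bacs):
            # Assumption: "neighboring" means the BACs directly before and after.
            neighbors = bacs[max(i - 1, 0):i] + bacs[i + 1:i + 2]
            for marker in positives.get(bac, set()):
                # Accept only if a neighbor is positive for the same marker.
                if any(marker in positives.get(n, set()) for n in neighbors):
                    anchors[contig].add(marker)
    return dict(anchors)


# Toy data: snp1 is confirmed by two adjacent BACs; snp2 and snp3 are not.
contigs = {"ctgA": ["bac1", "bac2", "bac3"], "ctgB": ["bac4", "bac5"]}
positives = {"bac1": {"snp1"}, "bac2": {"snp1", "snp2"}, "bac4": {"snp3"}}
result = anchor_contigs(contigs, positives)
```

Requiring a second, neighboring positive BAC filters out isolated false positives from the pooled screening, at the cost of leaving single-hit markers unanchored.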
Aegilops tauschii physical map
Luo et al., PNAS 110, 2013
[Figure tracks: recombination rate; physical map; gene density; density of collinear genes]
• Validate, pool eight overlapping BAC clones, isolate DNA, index each pool, and sequence with MiSeq
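As a rough illustration of the pooling step, the sketch below splits an ordered list of MTP clones into consecutive groups of eight and gives each group a label. The clone names, the `make_pools` helper, and the `poolNNNN` labels are hypothetical; in practice the index would be a sequencing barcode, which is not specified here.

```python
def make_pools(mtp_clones, pool_size=8):
    """Group consecutive MTP clones into pools and label each pool.

    Consecutive clones on a minimal tiling path overlap, so taking them
    in order keeps overlapping clones in the same pool (assumption).
    """
    pools = {}
    for start in range(0, len(mtp_clones), pool_size):
        label = f"pool{start // pool_size + 1:04d}"  # placeholder for a real index
        pools[label] = mtp_clones[start:start + pool_size]
    return pools


# Twenty hypothetical clones give two full pools and one partial pool.
clones = [f"BAC{i:03d}" for i in range(1, 21)]
pools = make_pools(clones)
```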
[Workflow figure labels: MTP; chromosome groups; pool; DNA isolation; BAC-end sequence; re-fingerprinting; NGS]
• Assemble paired-end reads together with the BGI long paired-end reads into scaffolds
• Merge scaffolds within a pool
• Merge scaffolds among pools
• Validate assembly and scaffold merging
• Sequence assembly
Advanced optical mapping: BioNano technology
o High throughput
o Uniform DNA stretching facilitating precise DNA length measurements
o Low error rates in assembly
Restriction-nicked (Nt.BspQI enzyme) and labeled HMW DNA
[Figure: Whole-genome nanomap scaffold length. Y-axis: contig length in kb (0 to 1,800); x-axis: genome equivalents assembled (20X to 100X); series: N50 and average.]
Distribution of restriction sites in a Nanomap contig
Distribution of restriction sites in DNA scaffold
Whole-genome nanomap of Ae. tauschii
Error rate 11/3000 contigs examined (0.4%)
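The quoted percentage follows directly from the two counts on the slide. The short check below uses only those numbers; the variable names are illustrative.

```python
contigs_with_errors = 11
contigs_examined = 3000

# 11 / 3000 is about 0.37%, which rounds to the 0.4% quoted above.
error_rate_percent = 100 * contigs_with_errors / contigs_examined
print(f"{error_rate_percent:.2f}%")
```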
Ordering and orienting sequence scaffolds on the WG nanomap
[Figure: a nanomap contig aligned with sequence scaffolds]
ctg1715: 6D (76.097 cM)
ctg12344: 6D (76.097 cM)
ctg195: 6D (75.686 cM)
ctg6115: 6D (76.097 cM)
Sequence scaffold assembly v. 1.0
Total scaffold length: 5.7 Gb
Average scaffold N50 length: 203 kb
Sequence scaffold assembly v. 1.1
Total scaffold length: 4.4 Gb
Average scaffold N50 length: 405 kb
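For readers unfamiliar with the N50 statistic, here is a minimal sketch of the usual definition: the length of the scaffold at which the running total, taken from longest to shortest, first reaches half of the total assembly length. The function name and the toy lengths are made up for illustration and are not the project's data.

```python
def n50(lengths):
    """Return the N50 of a list of scaffold lengths (0 for an empty list)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first scaffold that brings the running sum to half the total.
        if running * 2 >= total:
            return length
    return 0


# Toy example: total is 300, and 80 + 70 reaches 150, so N50 is 70.
example_lengths = [80, 70, 50, 40, 30, 20, 10]
example_n50 = n50(example_lengths)
```

Because N50 is weighted toward long scaffolds, it can double between releases (203 kb to 405 kb here) even while the total assembled length shrinks.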
Dynamics of gene content in grass lineages
Species          Genes     Change
S. bicolor       27,640
O. sativa        28,236
B. distachyon    25,532    (-3,026)
Ae. tauschii     36,371    (+7,813)
Other values shown on the figure: 28,289; 28,200; 28,965
Massa et al. Mol Biol Evol 28:2537, 2011
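A quick consistency check on the numbers above, under two assumptions that the slide text does not state outright: that (-3,026) belongs to B. distachyon and (+7,813) to Ae. tauschii, and that each is a net change relative to a shared ancestral gene count. Under that reading, both lineages point back to the same baseline.

```python
# Gene counts and parenthesized changes as given on the slide.
b_distachyon_genes, b_distachyon_change = 25_532, -3_026
ae_tauschii_genes, ae_tauschii_change = 36_371, +7_813

# Assumption: present count = ancestral count + net change.
baseline_from_bd = b_distachyon_genes - b_distachyon_change
baseline_from_at = ae_tauschii_genes - ae_tauschii_change

print(baseline_from_bd, baseline_from_at)
```

Both subtractions give 28,558, which supports the assignment of the two changes to these two species.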
Aegilops tauschii physical map
Luo et al., PNAS 110, 2013
[Figure tracks: recombination rate; physical map; gene density; only collinear genes]
Prolamin gene region
[Figure: gene order (positions 1 to 22) in the ancestral genome, sorghum, rice, B. distachyon and Ae. tauschii. Legend: ancestral genes; prolamin genes; inserted genes; deleted genes.]
Prolamin gene region in distal region of 1D
3.1 Mb in Ae. tauschii
PacBio P6-C4 chemistry
56 SMRT cells
Total 40 Gbp
Mean read length 9.1 kb
N50 read length 13.0 kb
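Some back-of-the-envelope figures follow from these run statistics. The sketch below uses only numbers from the slides; using the 4.4 Gb total scaffold length of assembly v. 1.1 as the genome size is an assumption made for the coverage estimate.

```python
total_bp = 40_000_000_000      # total PacBio yield (40 Gbp)
smrt_cells = 56
mean_read_bp = 9_100           # mean read length (9.1 kb)
assembly_bp = 4_400_000_000    # assumption: v. 1.1 scaffold length as genome size

yield_per_cell_gbp = total_bp / smrt_cells / 1e9   # about 0.71 Gbp per SMRT cell
approx_reads = total_bp / mean_read_bp             # about 4.4 million reads
approx_coverage = total_bp / assembly_bp           # about 9x

print(f"{yield_per_cell_gbp:.2f} Gbp/cell, "
      f"{approx_reads / 1e6:.1f} M reads, {approx_coverage:.1f}x")
```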
[Figure: read length distribution. X-axis: read length (ticks at 10 kb and 20 kb); y-axis: number of reads.]
Whole-genome shotgun sequence to close gaps
Where can I access data?
BLAST: http://aegilops.wheat.ucdavis.edu/ATGSP/data.php
Batch download of scaffolds: ftp://ftp.ccb.jhu.edu/pub/data/Aegilops_tauschii/
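For scripted access to the FTP location, a minimal sketch with Python's standard library might look like the following. It assumes the server is still reachable and accepts anonymous logins, neither of which is confirmed here. The helper names are illustrative, and the network call is defined but not executed.

```python
from ftplib import FTP
from urllib.parse import urlparse

FTP_URL = "ftp://ftp.ccb.jhu.edu/pub/data/Aegilops_tauschii/"


def split_ftp_url(url):
    """Split an ftp:// URL into (host, directory path)."""
    parsed = urlparse(url)
    return parsed.hostname, parsed.path


def list_remote_files(url, timeout=30):
    """List file names in the remote directory (assumes anonymous access)."""
    host, path = split_ftp_url(url)
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()      # anonymous login
        ftp.cwd(path)
        return ftp.nlst()


host, path = split_ftp_url(FTP_URL)
# Call list_remote_files(FTP_URL) to see which files are offered.
```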
Karin Deal
Pat McGuire
Ming-Cheng Luo
Naxin Huo, Yi Wang
Chad Jorgensen
Tingting Zhu, Sonny Van
Lichan Xiao, Luxia Yuan
Luis Curiel, Scott Liu
JC Rodriguez, Thanh Ngo
Armond Murray
Olin Anderson
Yong Gu
Katrien Devos
Hao Wang
Jeffrey Bennetzen
Acknowledgements
Richard McCombie
National Science Foundation Plant Genome
Klaus Mayer
Matthias Pfeifer, Karl Kugler
Steven Salzberg
Aleksey Zimin
Daniela Puiu
Geo Pertea
Thomas Wicker
Jaroslav Doležel
Shuhong Ouyang
Yong Liang
Zhenzhong Wang
Zhiyong Liu
Qixin Sun
Zhengqiang Ma
Alex Hastie
Andrew Anfora
Palak Sheth
Long Mao
Eric Lyons
Frank You
Philippe Leroy
Cari Soderlund