The Genome Analysis Centre
Building Excellence in Genomics and Computational Bioscience
Rice Resequencing Project
Aim: To bring genomics capability to rice breeding in Vietnam in light of changing climates
Approach: Sequence varieties with interesting phenotypes and provide training in bioinformatics
Training at TGAC
Two scientists from AGI visited TGAC in September 2012 for bioinformatics training
Training topics:
• NGS assembly and alignment tools
• Phylogenetics
• Browser training
• Variant calling
Varieties
Stress: Varieties (indica, japonica and javanica)
Bacterial Blight Resistance: Hom rau; Khau dien lu; Nep meo nuong; Tep Thai Binh; Toc lun
Blast Resistance: Ble te lo; Khau mac Buoc; Chiem nho Bac Ninh 2; Nep lun; OM 6377
Brown Planthopper Resistance: Chan thom; Coi ba dat; Khau giang; OM5629; Xuong ga
Drought tolerance: Ba cho K’te; Blao sinh sai; Tan ngan; Nang quot bien; Nep bo hong Hai Duong
Quality potential: Nang thom cho Dao; Tam xoan Bac Ninh; Tam xoan Hai Hau; Te Nuong; OM 3536; Thom lai
Salt tolerance: Lua Ngoi; Mot bui do; Chiem do; Nang co do 2; Nep man
Unclassified: Nep ong tao; Khau Lien; Lua goc do; Chiem da; IS1.2
Sequencing
Illumina HiSeq 2000
• DNA sheared into fragments 300-500 bp in length
• 100 bp sequenced from both ends of each fragment (paired-end reads)
• 18 lanes of sequencing (2.25 flowcells)
• 1.3 Tb of sequence data
Sequencing depth
[Figure: bar chart of coverage per variety, assuming a 430 Mb genome size. x-axis: Varieties; y-axis: Coverage (0 to 70).]
Average coverage Detect homozygous base p(0.9975)
30x Detect heterozygous base p(0.9975)
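Coverage on this slide follows the usual definition: total sequenced bases divided by genome size. A minimal sketch of that arithmetic, assuming 100 bp paired-end reads and the 430 Mb genome size quoted on the plot. The function name and the read-pair count in the example are hypothetical and are not taken from the project.

```python
def mean_coverage(read_pairs: int, read_length: int = 100,
                  genome_size: int = 430_000_000) -> float:
    """Return mean fold coverage for paired-end sequencing.

    Each read pair contributes two reads of `read_length` bases.
    """
    total_bases = read_pairs * 2 * read_length
    return total_bases / genome_size


# Hypothetical example: 64.5 million read pairs for one variety.
depth = mean_coverage(64_500_000)
print(f"{depth:.1f}x")  # prints 30.0x
```

The genome size is a parameter so the same calculation can be repeated under a different size assumption.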
Reference genomes
Indica: 93-11 (Yu et al., Science 2002)
Temperate japonica: Nipponbare (Goff et al., Science 2002)
Align reads to references (BWA)
[Figure: stacked bar chart of the percentage of reads per variety (0% to 100%) that are unaligned, japonica-specific, indica-specific, or aligned to both references. x-axis: Varieties, grouped as indicas, japonicas and javanica; y-axis: Percentage of reads.]
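The four read categories in the alignment plot (unaligned, japonica-specific, indica-specific, aligned to both) can be derived by comparing which reads aligned to each reference. A minimal sketch, assuming the read IDs that aligned to each reference have already been collected into sets; the function name and the toy read IDs are hypothetical.

```python
from collections import Counter


def classify_reads(all_reads, aligned_indica, aligned_japonica):
    """Return the percentage of reads falling in each alignment category."""
    counts = Counter({
        "aligned to both": 0,
        "indica-specific": 0,
        "japonica-specific": 0,
        "unaligned": 0,
    })
    for read_id in all_reads:
        in_indica = read_id in aligned_indica
        in_japonica = read_id in aligned_japonica
        if in_indica and in_japonica:
            counts["aligned to both"] += 1
        elif in_indica:
            counts["indica-specific"] += 1
        elif in_japonica:
            counts["japonica-specific"] += 1
        else:
            counts["unaligned"] += 1
    total = len(all_reads)
    if total == 0:
        return {category: 0.0 for category in counts}
    return {category: 100.0 * n / total for category, n in counts.items()}


# Toy example with four reads, one per category.
reads = ["r1", "r2", "r3", "r4"]
percentages = classify_reads(reads, {"r1", "r2"}, {"r2", "r3"})
print(percentages)
```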
SNP discovery
• Align reads to references using BWA v0.6.1
• Detect variants using GATK v1.6
    • Insertion/deletion realignment
    • SNP calling and filtering
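The steps above can be sketched as a list of command lines. This is an illustration under assumptions, not the project's actual invocation. It assumes the `bwa aln` / `bwa sampe` route for paired-end reads and the GATK walkers RealignerTargetCreator, IndelRealigner and UnifiedGenotyper. File names, the read-group string and the function name are hypothetical. The slides give no filtering thresholds, so the filtering step is left out, and so is the tool used to sort and index the alignments.

```python
def snp_pipeline_commands(ref, fq1, fq2, sample,
                          gatk_jar="GenomeAnalysisTK.jar"):
    """Build (but do not run) the command lines for one sample."""
    sai1, sai2 = f"{sample}_1.sai", f"{sample}_2.sai"
    sam = f"{sample}.sam"
    # Assumed: the SAM file is converted to a coordinate-sorted, indexed
    # BAM with this name by a tool the slides do not specify.
    bam = f"{sample}.sorted.bam"
    intervals = f"{sample}.intervals"
    realigned = f"{sample}.realigned.bam"
    vcf = f"{sample}.snps.vcf"
    # bwa expects a literal backslash-t between read-group fields.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}\\tPL:ILLUMINA"
    gatk = ["java", "-jar", gatk_jar]
    return [
        ["bwa", "index", ref],
        ["bwa", "aln", "-f", sai1, ref, fq1],
        ["bwa", "aln", "-f", sai2, ref, fq2],
        ["bwa", "sampe", "-r", read_group, "-f", sam,
         ref, sai1, sai2, fq1, fq2],
        # Insertion/deletion realignment
        gatk + ["-T", "RealignerTargetCreator", "-R", ref,
                "-I", bam, "-o", intervals],
        gatk + ["-T", "IndelRealigner", "-R", ref, "-I", bam,
                "-targetIntervals", intervals, "-o", realigned],
        # SNP calling (filtering omitted: thresholds not stated)
        gatk + ["-T", "UnifiedGenotyper", "-R", ref, "-I", realigned,
                "-glm", "SNP", "-o", vcf],
    ]


for cmd in snp_pipeline_commands("ref.fa", "s1_1.fq", "s1_2.fq", "s1"):
    print(" ".join(cmd))
```

Building the commands as lists keeps the sketch free of side effects; each list could be passed to `subprocess.run` once the reference has also been indexed for GATK (`.fai` and `.dict` files).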
SNPs by reference
[Figure: scatter plot of SNP counts per variety. x-axis: Number of SNPs on indica reference (0 to 1,600,000); y-axis: Number of SNPs on japonica reference (0 to 1,600,000); series: Indicas, Japonicas, javanica.]
Heterozygosity
Average % heterozygous SNPs: 0.14%
Range % heterozygous SNPs: 0.05-0.38%
Average Ts/Tv: 2.39
Range Ts/Tv: 1.95-2.55
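Both summary statistics can be computed directly from a list of SNP calls. A minimal sketch, assuming each SNP is given as a (reference base, alternate base) pair and each genotype as a pair of alleles; the function names and the toy data are hypothetical and do not reproduce the project's figures.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """True for purine<->purine or pyrimidine<->pyrimidine changes.

    Assumes ref != alt, as is the case for any SNP.
    """
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(snps) -> float:
    """Ratio of transitions (Ts) to transversions (Tv)."""
    ts = sum(1 for ref, alt in snps if is_transition(ref, alt))
    tv = len(snps) - ts
    return ts / tv if tv else float("inf")


def percent_heterozygous(genotypes) -> float:
    """Percentage of SNP calls whose two alleles differ."""
    het = sum(1 for a, b in genotypes if a != b)
    return 100.0 * het / len(genotypes)


# Toy data: three transitions and two transversions.
snps = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("G", "T")]
print(ts_tv_ratio(snps))  # prints 1.5

# Toy data: one heterozygous call out of four.
genotypes = [("A", "A"), ("A", "G"), ("G", "G"), ("T", "T")]
print(percent_heterozygous(genotypes))  # prints 25.0
```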
Grouped SNPs
Trait        Number of shared SNPs  Unique to set  Shared with 1  Shared with 2  Shared with 3  Shared with 4  Shared with 5  Shared with 6
Blast        43,755                 -              -              -              -              2              6              6
Blight       22,065                 -              -              1              20             9              15             32
Drought      27,109                 -              -              -              -              1              2              3
Planthopper  43,238                 -              -              -              1              -              7              20
Quality      55,256                 -              -              -              1              8              14             30
Salt         51,472                 -              -              3              9              18             56             149
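One possible reading of this table: the SNPs shared by every variety in a trait set are found by set intersection, and each of those SNPs is then tallied by how many other trait sets also share it. The slides do not define the columns precisely, so this is an assumption, and the sketch below only illustrates that reading. Function names, variety names and SNP identifiers are hypothetical.

```python
from collections import Counter


def shared_snps(snps_by_variety, varieties):
    """SNPs present in every variety of one trait set."""
    return set.intersection(*(snps_by_variety[v] for v in varieties))


def sharing_profile(snps_by_variety, groups, trait):
    """Return (number of shared SNPs, {n_other_sets: count}) for one trait.

    Key 0 in the profile corresponds to SNPs unique to the set.
    """
    shared = {t: shared_snps(snps_by_variety, vs) for t, vs in groups.items()}
    profile = Counter()
    for snp in shared[trait]:
        n_other = sum(1 for t, s in shared.items() if t != trait and snp in s)
        profile[n_other] += 1
    return len(shared[trait]), dict(profile)


# Toy data: SNPs are plain integers here; real data would use
# (chromosome, position, alternate allele) tuples.
snps_by_variety = {
    "v1": {1, 2, 3}, "v2": {2, 3, 4},
    "v3": {3, 5}, "v4": {3, 6},
    "v5": {2, 3},
}
groups = {"blast": ["v1", "v2"], "salt": ["v3", "v4"], "drought": ["v5"]}
print(sharing_profile(snps_by_variety, groups, "blast"))
```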
tgac-browser.tgac.ac.uk
Username: viet_rice
Password: v1et_r1ce
Acknowledgements
Mario Caccamo
Sarah Ayling
Melanie Febrer
Anil Thanki
Xingdong Bian
Prof Ham
Khuat Huu Trung
Khoa Nguyen Truong and colleagues
Giles Oldroyd
Christian Rogers
THANKS!