FOR RESEARCH USE ONLY. Not for use in diagnostic procedures.
Everyday de novo assembly
GRC Assembly Workshop at Genome Informatics
Deanna M. Church, Senior Director of Applications
Sep 19, 2016
@deannachurch
2
Acknowledgements
The entire team at 10x
David Jaffe
Neil Weisenfeld
Vijay Kumar
Preyas Shah
NCBI:
Francoise Thibaud-Nissen
Valerie Schneider
3
Disclosures
10x Genomics: Employee and Shareholder
Personalis: Shareholder
10x Genomics products described are for Research Use Only. Not for use in diagnostic procedures.
4
Questions from the organizers
Are new assemblies using the reference?
Can they help make the reference better?
Do they make the reference obsolete?
7
Agenda
Why haven’t we always done de novo genome analysis?
What are Linked-Reads?
How do Linked-Reads enable everyday de novo assembly?
8
Why haven’t we always done de novo genome analysis?
9
Kelly Howe, Lawrence Berkeley Laboratory
10
Reference quality is HARD
DOI: 10.1038/nature03001
11
Our actual genome: diploid
12
How we represent our genome: haploid
13
Current approach: averaging over haplotypes
17
Problem: both alleles differ from each other
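A toy Python sketch can make the averaging problem concrete. The two haplotype strings and the alternating choice rule are illustrative assumptions, not taken from the slides or from any real assembler. The point is only that a single collapsed sequence can end up matching neither of the two haplotypes it was built from.

```python
def collapse_haplotypes(hap_a: str, hap_b: str) -> str:
    """Collapse two aligned haplotypes into one haploid sequence.

    At heterozygous sites the allele is taken alternately from hap_a and
    hap_b. This rule is an assumption standing in for the arbitrary
    choices a haploid consensus ends up making.
    """
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must be aligned to the same length")
    collapsed = []
    take_from_a = True
    for base_a, base_b in zip(hap_a, hap_b):
        if base_a == base_b:
            collapsed.append(base_a)  # homozygous site: no choice needed
        else:
            collapsed.append(base_a if take_from_a else base_b)
            take_from_a = not take_from_a  # switch source at the next het site
    return "".join(collapsed)


# Hypothetical haplotypes that differ at two positions.
hap_a = "ACGTTGCA"
hap_b = "ACCTTGGA"
consensus = collapse_haplotypes(hap_a, hap_b)
print(consensus)
print("matches hap_a:", consensus == hap_a, "matches hap_b:", consensus == hap_b)
```

With more than one heterozygous site, the collapsed sequence is a mosaic, which is the motivation for reconstructing each haplotype separately.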
18
What are Linked-Reads?
19
Unlinked-Reads: short range information
20
Linked-Reads: long range information
21
Start with long molecules
NA19240
22
Making Linked-Reads
P5 | 16 bp BC | R1 | N-mer | gDNA insert
23
Making Linked-Reads
Long input molecule
Excess of sequenceable inserts randomly primed off each long molecule
P5 | 16 bp BC | R1 | N-mer | gDNA insert
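Because every insert primed off one long molecule carries the same 16 bp barcode, aligned reads can be grouped back into inferred molecules. The Python sketch below illustrates that idea under stated assumptions: the `(barcode, chrom, pos)` read tuples, the function name `infer_molecules`, and the 50,000 bp `max_gap` threshold are all hypothetical, and this is not the method used by any 10x Genomics software.

```python
from collections import defaultdict


def infer_molecules(reads, max_gap=50_000):
    """Group aligned reads into inferred long molecules.

    reads: iterable of (barcode, chrom, pos) tuples (hypothetical format).
    Reads sharing a barcode and chromosome are merged into one molecule
    as long as consecutive positions are at most max_gap apart.
    Returns a list of (barcode, chrom, start, end, n_reads).
    """
    positions_by_key = defaultdict(list)
    for barcode, chrom, pos in reads:
        positions_by_key[(barcode, chrom)].append(pos)

    molecules = []
    for (barcode, chrom), positions in sorted(positions_by_key.items()):
        positions.sort()
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev > max_gap:
                # Gap too large: close the current molecule, start a new one.
                molecules.append((barcode, chrom, start, prev, count))
                start = pos
                count = 0
            prev = pos
            count += 1
        molecules.append((barcode, chrom, start, prev, count))
    return molecules


# Made-up example reads with 16 bp barcodes.
example_reads = [
    ("AAACCCGGGTTTAAAC", "chr1", 1_000),
    ("AAACCCGGGTTTAAAC", "chr1", 20_000),
    ("AAACCCGGGTTTAAAC", "chr1", 45_000),
    ("AAACCCGGGTTTAAAC", "chr1", 900_000),
    ("TTTGGGCCCAAATTTG", "chr1", 30_000),
]
for molecule in infer_molecules(example_reads):
    print(molecule)
```

The short reads themselves only span a few hundred bases; the long range information comes from the barcode shared across each group.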
24
Making Linked-Reads
Long input molecule (50 Kb)
Excess of sequenceable inserts randomly primed off each long molecule
P5 | 16 bp BC | R1 | N-mer | gDNA insert
30x sequence, ~35 fragments, ~0.2x coverage
Standard reference-based analysis recommendations
25
Making Linked-Reads
Long input molecule (50 Kb)
Excess of sequenceable inserts randomly primed off each long molecule
P5 | 16 bp BC | R1 | N-mer | gDNA insert
56x sequence, ~65 fragments, ~0.4x coverage
Supernova analysis recommendations
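The fragment counts and per-molecule coverage figures on these slides can be related with simple arithmetic. The Python sketch below assumes that each fragment is sequenced as a pair of 150 bp reads; the read length and pairing are assumptions not stated on the slides, while the 50 Kb molecule length and the fragment counts are taken from them.

```python
def per_molecule_coverage(n_fragments, molecule_length_bp=50_000,
                          read_length_bp=150, reads_per_fragment=2):
    """Approximate read coverage of one long molecule.

    read_length_bp and reads_per_fragment are assumptions (2 x 150 bp
    paired-end reads); molecule_length_bp follows the 50 Kb on the slide.
    """
    sequenced_bases = n_fragments * reads_per_fragment * read_length_bp
    return sequenced_bases / molecule_length_bp


for label, n_fragments in [("30x recommendation", 35), ("56x recommendation", 65)]:
    coverage = per_molecule_coverage(n_fragments)
    print(f"{label}: ~{n_fragments} fragments, about {coverage:.2f}x per molecule")
```

Under these assumptions the results land close to the ~0.2x and ~0.4x quoted above.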
26
Synthetic Long Reads: less physical coverage
Sequencing cost
Physical coverage
27
Linked-Reads: greater physical coverage
Sequencing cost
Physical coverage
28
Example: Molecule vs Read Coverage
150X avg molecule coverage
A given genomic locus will have 150X avg molecule depth and 30X avg read depth
(150X molecule depth) x (0.2X read coverage per molecule) = 30X read depth
Chr13: BRCA2
>30X avg read coverage
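The relationship on this slide is a single multiplication, shown here as a minimal Python sketch. The function names are illustrative; the numbers are the ones on the slide.

```python
def read_depth(molecule_depth, coverage_per_molecule):
    """Read depth at a locus = molecules covering it x read coverage of each."""
    return molecule_depth * coverage_per_molecule


def molecule_depth(read_depth_x, coverage_per_molecule):
    """Inverse: molecule (physical) depth implied by a given read depth."""
    return read_depth_x / coverage_per_molecule


depth = read_depth(molecule_depth=150, coverage_per_molecule=0.2)
print(f"read depth: {depth:.0f}X")
print(f"molecule depth: {molecule_depth(30, 0.2):.0f}X")
```

Because each molecule is only sparsely sequenced, the physical (molecule) depth at a locus is several times higher than the read depth.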
29
Generating Linked-Reads
Start with:
HMW gDNA, 100 Kb+ molecules
1.0 ng input DNA = 300 copies of the genome
0.5 ng DNA = 150 copies of the genome, partitioned into >1M GEMs
DNA, Oil, Barcoded Primer Library, Enzyme, Collect
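The copy numbers above follow from the mass of one genome copy. The Python sketch below assumes roughly 3.3 pg of DNA per haploid human genome, a commonly quoted approximation that does not appear on the slide, and shows that it reproduces the "300 copies" and "150 copies" figures to within rounding.

```python
# Assumption: about 3.3 pg of DNA per haploid human genome copy.
HAPLOID_GENOME_MASS_PG = 3.3


def genome_copies(mass_ng, genome_mass_pg=HAPLOID_GENOME_MASS_PG):
    """Approximate number of haploid genome copies in mass_ng of DNA."""
    return mass_ng * 1000.0 / genome_mass_pg  # 1 ng = 1000 pg


for mass_ng in (1.0, 0.5):
    print(f"{mass_ng} ng is roughly {genome_copies(mass_ng):.0f} genome copies")
```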
30
How do Linked-Reads enable everyday de novo assembly?
31
Assembly made easy
BCL -> Supernova de novo assembly -> FASTA
1200M, NA19240
http://www.biorxiv.org/content/early/2016/08/19/070425
1 server, 348 Gb memory, 2 days compute
1 library, 1.25 ng input
32
Assembly made easy
BCL -> Supernova de novo assembly -> FASTA
1200M, NA19240
1 library, 1.25 ng input
http://www.biorxiv.org/content/early/2016/08/19/070425
1 server (28 cores), 348 Gb memory, 2 days compute
33
Assembly made easy
Measure                       Value
Number of scaffolds >=10 Kb   1.17 K
Edge N50                      17.45 Kb
Contig N50                    118.8 Kb
Phase block N50               9.3 Mb
Scaffold N50                  16.4 Mb
BCL -> Supernova de novo assembly -> FASTA
1200M, NA19240
http://www.biorxiv.org/content/early/2016/08/19/070425
1 server (28 cores), 348 Gb memory, 2 days compute
1 library, 1.25 ng input
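As a rough consistency check, the "1200M" input can be converted to sequence coverage. The Python sketch below assumes that 1200M counts individual reads, that reads are 150 bp long, and that the genome size is 3.2 Gb; none of these three values is stated on the slide.

```python
def sequence_coverage(n_reads, read_length_bp=150, genome_size_bp=3.2e9):
    """Mean sequence coverage from a read count.

    read_length_bp and genome_size_bp are assumptions, not slide values.
    """
    return n_reads * read_length_bp / genome_size_bp


print(f"about {sequence_coverage(1200e6):.0f}x")
```

Under these assumptions the result is close to the 56x sequence depth quoted elsewhere in the deck for Supernova analysis.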
34
Performance over multiple samples
http://www.biorxiv.org/content/early/2016/08/19/070425
sample   ethnicity  sex  cov (x)  frag  N50 contig (kb)  N50 scaffold (Mb)  N50 phase block (Mb)  gap
NA19238  YRI        F    56       115   114.6            18.7               8                     2.1
NA19240  YRI        F    56       125   118.8            16.4               9.3                   2.3
HG00733  PR         F    56       106   123.6            17.8               3.4                   2.0
HG00512  HAN        M    56       102   113.2            15.4               2.7                   2.2
NA24385  AJ         M    56       120   106.4            15.1               4.2                   2.6
HGP      EUR        M    56       139   120.2            18.6               4.5                   2.5
NA12878  EUR        F    56       92    118.5            16.4               2.8                   2.9
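The N50 columns use the standard N50 definition: the length L such that sequences of length L or longer contain at least half of the total assembled bases. A minimal Python sketch of that calculation follows; the function name and the toy lengths are illustrative and are not taken from the assemblies in the table.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sort lengths from longest to shortest and walk down until the running
    total reaches at least half of the summed length.
    """
    lengths = sorted(lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one length")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths (made up for illustration).
toy_contigs = [2, 3, 4, 5, 6]
print(n50(toy_contigs))
```

The same function applies to contigs, scaffolds, or phase blocks; only the list of lengths changes.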
35
High quality assembly at lower coverage
[Figure: three panels showing contig N50 (kb), scaffold N50 (Mb), and phase block N50 (Mb) versus number of reads (millions), over 500 to 1,300 million reads]
36
De novo performance drastically improves with increased DNA length
[Figure: three panels showing contig N50, scaffold N50 (Mb), and phase block N50 versus DNA length, over 0 to 60,000]
37
Supernova Assembler
stuff
separate assemblies of homologous loci
http://www.biorxiv.org/content/early/2016/08/19/070425
38
Assembly architecture = phase blocks
megabubble, megabubble, megabubble
multi-Mb phase blocks
many Mb scaffold
microstructure
• bubbles, often at indeterminate poly-A
• short gaps, often at poly-A
39
Assembly assessment
[Figure: bar chart of percent GRCh37 100-mers missing per assembly (axis 0 to 25), with series "Missing 100-mers haploid" and "Missing 100-mers diploid". Supernova (10x): NA19238, NA19240, HG00733, HG00512, NA24385, HGP, NA12878. Other methods: YH, NA12878, NA12878, NA12878, NA24385, NA24143. Assemblies are labeled Diploid or Haploid.]
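The metric in this chart, the percentage of reference 100-mers missing from an assembly, can be sketched with plain k-mer sets. The Python code below is an illustration only: it is not the pipeline behind the slide, it holds all k-mers in memory (impractical for a full human genome), and the choice to treat a k-mer and its reverse complement as the same k-mer is an assumption. Passing one sequence list models a haploid assembly; passing both haplotypes models a diploid one.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k):
    """Set of canonical k-mers in seq (k-mers with non-ACGT bases are skipped)."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
            kmers.add(min(kmer, reverse_complement))
    return kmers


def percent_missing(reference_seqs, assembly_seqs, k=100):
    """Percent of reference k-mers that occur in none of the assembly sequences."""
    reference = set()
    for seq in reference_seqs:
        reference |= canonical_kmers(seq, k)
    assembly = set()
    for seq in assembly_seqs:
        assembly |= canonical_kmers(seq, k)
    if not reference:
        raise ValueError("reference has no valid k-mers")
    return 100.0 * len(reference - assembly) / len(reference)


# Tiny made-up sequences with k=4 so the example runs instantly.
toy_reference = ["ACGTACGTTT"]
print(percent_missing(toy_reference, ["ACGTACG"], k=4))           # one haplotype
print(percent_missing(toy_reference, ["ACGTACG", "CGTTT"], k=4))  # two haplotypes
```

Adding the second haplotype can only lower the missing fraction, since its k-mers are pooled with those of the first.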
40
Comparison to truth data
41
Improving the reference assembly?
GRCh38: chr6 (NC_000006.12)
NA12878, hap0, scaf. 21653 (prev 1.1)
260 Kb of new sequence
42
Better genotype reconstruction
chrX:6,219,000-6,220,500 (GRCh38)
NLGN4X (neuroligin 4, X-linked)
43
Better genotype reconstruction
chrX:6,218,359-6,221,000 (GRCh38)
44
Questions from the organizers
Are new assemblies using the reference?
Assembly construction: NO
Assembly analysis: Yes
Supernova: de novo assembly, diploid reconstruction
45
Questions from the organizers
Can they help make the reference better?
Yes
Supernova: individual genome reconstruction; contributing new sequences to population graph
46
Questions from the organizers
Do they make the reference obsolete?
NO
Supernova: not reference assemblies; better individual genome reconstruction
47
Thanks!