
Review: Methodologies for SVs detection · segmentation, clustering, etc • Per cell through...

Date post: 18-Oct-2020
Review: Methodologies for SVs detection, Fritz Sedlazeck, Nov 16, 2018
Page 1

Review: Methodologies for SVs detection

Fritz Sedlazeck

Nov 16, 2018

Page 2

My group/interests

Detection of variants
- Sniffles (Sedlazeck et al., 2018)
- SURVIVOR (Jeffares et al., 2017)
- BOD-Score (Sedlazeck et al., 2013)

Mapping/assembly of reads
- NextGenMap-LR (Sedlazeck et al., 2018)
- Falcon Unzip (Chin et al., 2016)
- NextGenMap (Sedlazeck et al., 2013)

Benchmarking
- SVgenotyper (Chander et al., in prep.)
- Teaser (Smolka et al., 2015)

Sequencing
- Jünemann et al., 2013

Applications
Model organisms:
- Cancer (SKBR3) (Nattestad et al., 2018)
- miRNA editing (Vesely et al., 2012)

Non-model organisms:
- Cottus transposons (Dennenmoser et al., 2017)
- Clunio (Kaiser et al., 2016)
- Seabass (Vij et al., 2016)
- Pineapple (Ming et al., 2015)

[Figure 1: “moonlight”]

Page 3

Early 2000s dogma: SNPs account for most human genetic variation

https://hapmap.ncbi.nlm.nih.gov

Page 4

Segmental duplications (a.k.a. low copy repeats)

Bailey et al., 2002: ~5% of the human genome is duplicated!

Self dot plot: 10 megabases of chr15 (dot = 1 kb exact match)

Page 5

Variation in genome structure: so-called "structural variation" (SV)

[Figure: a reference with segments A, B, C, D compared with a duplication, an inversion, a deletion, an insertion, and a translocation; each event is labeled CNV or SV]

SV is a superset of copy number variation (CNV). Not all structural changes affect copy number (e.g., inversions)!

Page 6

Our understanding of structural variation is driven by technology

• 1940s-1980s: Cytogenetics / karyotyping
• 1990s: CGH / FISH / SKY / COBRA
• 2000s: Genomic microarrays (BAC-aCGH / oligo-aCGH)
• Today: High-throughput DNA sequencing

Page 7

Why are structural variations relevant/important?

• They are common and affect a large fraction of the genome

• They are a major driver of genome evolution

Genomic disorders · Evolution

Page 8

Why are structural variations relevant/important?

• Genetic basis of traits

Impact on regulation · Impact on phenotypes

[Figure: heatmap of the number of affected regulatory features (scale 0 to 2,000) per cell line or tissue (e.g., A549, GM12878, HeLa-S3, HepG2, K562, fetal and adult tissues, immune cell types) and regulatory state (CTCF binding sites, enhancers, open chromatin regions, promoters, promoter flanking regions, and TF binding sites, each split into active, inactive, poised, repressed, or NA)]

Page 9

Outline

1. CNV analysis

2. SVs analysis
   1. Assembly based
   2. Short reads
   3. Long reads

3. Review plan

Page 10

Humans differ by roughly 3,000 deletions (≥ 500 bp)

Page 11

Humans differ by a few hundred duplications

Page 12

Copy-number Profiles

Page 13

Ginkgo http://qb.cshl.edu/ginkgo

Interactive Single Cell CNV analysis & clustering
• Easy-to-use web interface, parameterized for binning, segmentation, clustering, etc.
• Per cell through project-wide analysis in any species

Compare MDA, DOP-PCR, and MALBAC
• DOP-PCR shows superior resolution and consistency

Available for collaboration
• Analyzing CNVs with respect to different clinical outcomes
• Extending clustering methods, prototyping scRNA

Interactive analysis and assessment of single-cell copy-number variations. Garvin T, Aboukhalil R, Kendall J, Baslan T, Atwal GS, Hicks J, Wigler M, Schatz MC (2015) Nature Methods doi:10.1038/nmeth.3578

Page 14

Data are noisy

Potential for biases at every step
• WGA (whole-genome amplification): non-uniform amplification
• Library preparation: low complexity, read duplications, barcoding
• Sequencing: GC artifacts, short reads
• Computation: mappability, GC correction, segmentation, tree building

Coverage is too sparse and noisy for SNP analysis -> requires special processing

Page 15

CNV analysis
• Divide the genome into “bins” with ~50-100 reads/bin
• Map the reads and count reads per bin

Use uniquely mappable bases to establish bins

1. Binning


Page 17

1. Binning

[Figure: example read counts per bin: 5 4 5 10 11 5 2 5]
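The binning step can be illustrated with a small sketch. It assumes read start positions and a per-base boolean mask of uniquely mappable positions are already available; the function names are hypothetical and this is not Ginkgo's code. Each bin is defined to contain the same number of uniquely mappable positions, so bins stretch over repetitive sequence instead of collecting reads that cannot be placed reliably.

```python
from bisect import bisect_right


def mappable_bins(mappable, positions_per_bin):
    """Return bin start coordinates so that every bin contains
    `positions_per_bin` uniquely mappable positions (True in `mappable`)."""
    starts = []
    seen = 0  # uniquely mappable positions seen so far
    for pos, ok in enumerate(mappable):
        if not ok:
            continue
        if seen % positions_per_bin == 0:
            starts.append(pos)  # first mappable base of a new bin
        seen += 1
    return starts


def count_reads_per_bin(read_starts, bin_starts):
    """Count reads per bin; a read belongs to the last bin starting at or before it."""
    counts = [0] * len(bin_starts)
    for r in read_starts:
        idx = bisect_right(bin_starts, r) - 1
        if idx >= 0:  # reads before the first bin are ignored
            counts[idx] += 1
    return counts


# Toy example: positions 4 and 5 are not uniquely mappable.
mask = [True] * 4 + [False] * 2 + [True] * 4
bins = mappable_bins(mask, positions_per_bin=2)
print(bins)  # → [0, 2, 6, 8]
print(count_reads_per_bin([0, 1, 1, 3, 4, 6, 9, 9, 9], bins))  # → [3, 2, 1, 3]
```

In practice `positions_per_bin` would be chosen so that bins receive roughly 50-100 reads at the achieved sequencing depth, as the slide suggests.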

Page 18

2. Normalization

Also correct for mappability, GC content, amplification biases
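A minimal sketch of the normalization step, assuming per-bin GC fractions are known. Grouping bins into GC strata and dividing by each stratum's median is a simple stand-in chosen here for illustration; smoother-based fits of read count vs. GC content are a common alternative. Mappability and amplification biases would need their own corrections, which this sketch omits.

```python
from collections import defaultdict
from statistics import fmean, median


def normalize_counts(counts, gc_fractions, n_strata=20):
    """Scale bin counts to a genome-wide mean of 1, then divide each bin by
    the median scaled count of bins in the same GC stratum."""
    mean = fmean(counts)
    ratios = [c / mean for c in counts]

    # Assign each bin to one of `n_strata` equal-width GC-content strata.
    keys = [min(int(gc * n_strata), n_strata - 1) for gc in gc_fractions]
    strata = defaultdict(list)
    for key, r in zip(keys, ratios):
        strata[key].append(r)
    stratum_median = {k: median(v) for k, v in strata.items()}

    # A bin with the typical count for its GC content ends up at 1.0.
    return [r / stratum_median[k] if stratum_median[k] > 0 else 0.0
            for k, r in zip(keys, ratios)]


# GC-rich bins are over-represented 2x in this toy example.
print(normalize_counts([10, 10, 20, 20], [0.40, 0.41, 0.60, 0.61]))  # → [1.0, 1.0, 1.0, 1.0]
```

One caveat of this design: if most bins in a GC stratum carry a real copy-number change, the stratum median absorbs it, which is why GC correction is usually fit across the whole genome.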

Page 19

3. Segmentation

Circular Binary Segmentation (CBS)

[Figure: segment with candidate breakpoints i and j]
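The idea behind CBS can be sketched in a few lines: treat the segment as a circle, find the arc (i, j) whose mean differs most from the rest, split there if the difference is large enough, and recurse. This is a simplified illustration, not the reference implementation: it uses a fixed cutoff (`threshold`, an assumed value) instead of the permutation test of the published method, estimates the standard deviation from the whole segment, and runs in O(n²) per segment.

```python
import math
import statistics


def best_arc(x, min_size=2):
    """Return (t, i, j) for the arc x[i:j] whose mean differs most from the
    mean of the remaining bins, measured by a CBS-style t statistic."""
    n = len(x)
    sd = statistics.pstdev(x)
    if sd == 0:
        return 0.0, 0, n  # flat segment: nothing to split
    cs = [0.0]  # prefix sums, so any arc mean costs O(1)
    for v in x:
        cs.append(cs[-1] + v)
    best = (0.0, 0, n)
    for i in range(n):
        for j in range(i + min_size, n + 1):
            k = j - i  # bins inside the arc; n - k bins lie outside it
            if n - k < min_size:
                continue
            inside = (cs[j] - cs[i]) / k
            outside = (cs[n] - cs[j] + cs[i]) / (n - k)
            t = abs(inside - outside) / (sd * math.sqrt(1 / k + 1 / (n - k)))
            if t > best[0]:
                best = (t, i, j)
    return best


def cbs(x, threshold=3.0, min_size=2, offset=0):
    """Recursively segment x; returns (start, end, mean) tuples, end exclusive."""
    n = len(x)
    if n >= 2 * min_size:
        t, i, j = best_arc(x, min_size)
        if t > 0 and t >= threshold:
            segments = []
            for a, b in ((0, i), (i, j), (j, n)):  # left, arc, right
                if b > a:
                    segments += cbs(x[a:b], threshold, min_size, offset + a)
            return segments
    return [(offset, offset + n, statistics.fmean(x))]


profile = [1.0] * 10 + [3.0] * 10 + [1.0] * 10  # toy profile with a gain
print(cbs(profile))  # → [(0, 10, 1.0), (10, 20, 3.0), (20, 30, 1.0)]
```

The circular view lets a single test find an internal gain or loss (both breakpoints at once), which plain binary segmentation would need two separate splits to detect.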

Page 20

4. Estimating Copy Number

$$\mathrm{CN} = \arg\min \left\{ \sum_{i,j} \left( Y_{i,j} - \hat{Y}_{i,j} \right)^2 \right\}$$
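One way to read this objective, assuming Y are normalized segment means multiplied by a candidate scale factor (ploidy) and Ŷ their nearest integers (the slide does not define the symbols), is a search over scale factors that makes the scaled profile as close to integers as possible. The function name and the candidate range 1.5-6.0 below are assumptions for illustration.

```python
def estimate_copy_number(segment_means, scales=None):
    """Choose the scale s minimizing sum((s*y - round(s*y))**2) over the
    normalized segment means y; return s and the rounded copy numbers."""
    if scales is None:
        scales = [k / 20 for k in range(30, 121)]  # assumed range: 1.5 .. 6.0

    def sse(s):
        return sum((s * y - round(s * y)) ** 2 for y in segment_means)

    best = min(scales, key=sse)  # min() keeps the first (smallest) scale on ties
    return best, [round(best * y) for y in segment_means]


print(estimate_copy_number([1.0, 1.5, 1.0]))  # → (2.0, [2, 3, 2])
```

Integer multiples of the best scale (here 4.0 and 6.0) fit equally well, so the search range and tie-breaking rule matter; this sketch simply keeps the smallest scale.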

Page 21

Using Nanopore MinION: CNV karyotyping

Page 22

Nanopore sequencing for CNV detection

[Figure: genome-wide copy-number profile by chromosome]

Page 23

SKBR3 cell line CNV Analysis

Page 24

SID97277: partial chromosomal deletions

MinION data (~60k reads)

MiSeq data

5q deletion indicates poor prognosis; chr11 abnormalities indicate poor prognosis

Page 25

SID97277 karyotype

Page 26

SID97279: trisomy 6, 15, 22 and deletions in chr11

MinION data (~73k reads)

MiSeq data

Trisomy 6 correlated with intermediate prognosis

Abnormalities on chr11 indicate poor prognosis

Page 27

CNV detection summary

• Advantages
  • Less coverage is required
  • -> Applications such as single cell sequencing

• Disadvantages
  • Resolution of events: usually in the multi-kbp range
  • Only deletions and duplications
  • Coverage biases in short reads

Page 28

Assembly based

1. De novo assembly
2. Genomic alignment (WGA, whole-genome alignment)
3. Detangle the genomic alignment to identify variants.

Page 29

Ingredients for a good assembly

Current challenges in de novo plant genome sequencing and assembly. Schatz MC, Witkowski J, McCombie WR (2012) Genome Biology 13:243

Coverage

High coverage is required
– Oversample the genome to ensure every base is sequenced with long overlaps between reads
– Biased coverage will also fragment assembly

[Figure: Lander-Waterman expected contig length (bp, log scale from 100 to 1M) vs. read coverage (0-40x) for read lengths of 30, 52, 100, 250, 710, and 1000 bp, with mean and N50 contig lengths of the dog and panda assemblies marked]
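The curves follow the Lander-Waterman model (1988), which gives the expected contig ("island") length in bp as L[(e^(cσ) - 1)/c + (1 - σ)] for read length L, coverage c, and σ = 1 - T/L, where T is the minimum overlap needed to join two reads. A direct transcription is sketched below; the model assumes uniformly random read placement and no repeats, so real assemblies typically fall short of it.

```python
import math


def expected_contig_length(read_len, coverage, min_overlap=0):
    """Lander-Waterman expected contig length in bp.

    read_len    -- read length L in bp
    coverage    -- read coverage c
    min_overlap -- minimum overlap T in bp required to join two reads
    """
    sigma = 1 - min_overlap / read_len
    if coverage == 0:
        return float(read_len)  # limit c -> 0: every contig is a single read
    return read_len * ((math.exp(coverage * sigma) - 1) / coverage + (1 - sigma))


for length in (30, 100, 1000):
    print(length, [round(expected_contig_length(length, c)) for c in (5, 10, 20)])
```

Contig length grows exponentially with coverage but only linearly with read length at fixed coverage in this model; in real genomes, read length matters far more because reads must also span repeats.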

Read Length

Reads & mates must be longer than the repeats
– Short reads will have false overlaps forming hairball assembly graphs
– With long enough reads, assemble entire chromosomes into contigs

Quality

Errors obscure overlaps
– Reads are assembled by finding k-mers shared by a pair of reads
– A high error rate requires very short seeds, increasing complexity and forming assembly hairballs
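A toy example (made-up reads) of why sequencing errors obscure overlaps: a single substitution removes every k-mer that covers it, so long seeds quickly stop matching while short seeds survive. In a real genome, short seeds also match many unrelated reads by chance, which is the added complexity described above.

```python
def kmers(seq, k):
    """Set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(a, b, k):
    """Number of distinct k-mers that occur in both reads."""
    return len(kmers(a, k) & kmers(b, k))


read_a = "ACGTTGCAAGGCTTAG"
read_b = "ACGTTGCATGGCTTAG"  # same sequence with one substitution at position 8

print(shared_kmers(read_a, read_a, 8))  # → 9: an error-free overlap keeps all 8-mers
print(shared_kmers(read_a, read_b, 8))  # → 1: the error destroys 8 of the 9 8-mers
print(shared_kmers(read_a, read_b, 4))  # → 9: shorter seeds survive the error
```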

Page 30

Goal of WGA

• For two genomes, A and B, find a mapping from each position in A to its corresponding position in B

CCGGTAGGCTATTAAACGGGGTGAGGAGCGTTGGCATAGCA

CCGGTAGGCTATTAAACGGGGTGAGGAGCGTTGGCATAGCA

Page 31

Not so fast...

• Genome A may have insertions, deletions, translocations, inversions, duplications or SNPs with respect to B (sometimes all of the above)

CCGGTAGGATATTAAACGGGGTGAGGAGCGTTGGCATAGCA

CCGCTAGGCTATTAAAACCCCGGAGGAG....GGCTGAGCA

Page 32

WGA visualization

• How can we visualize whole genome alignments?

• With an alignment dot plot
  • N x M matrix
  • Let i = position in genome A
  • Let j = position in genome B
  • Fill cell (i,j) if A_i shows similarity to B_j

• A perfect alignment between A and B would completely fill the positive diagonal

[Figure: toy dot plot with axis letters A C C T and T G C A]
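A minimal sketch of the dot-plot definition above, using exact k-mer matches as the similarity criterion (an assumption; tools such as MUMmer build on longer maximal matches). It only checks the forward strand, so inversions, which show up as reverse-complement matches, would need an additional pass. With k = 1 many cells fill by chance; longer exact matches, like the 1 kb matches in the chr15 self dot plot, leave mostly meaningful diagonals.

```python
def dot_plot(a, b, k=1):
    """Cells (i, j) where the k-mer starting at a[i] equals the one at b[j]."""
    positions = {}
    for j in range(len(b) - k + 1):
        positions.setdefault(b[j:j + k], []).append(j)
    return {(i, j)
            for i in range(len(a) - k + 1)
            for j in positions.get(a[i:i + k], [])}


def render(a, b, k=1):
    """Text dot plot: one row per position in A, one column per position in B."""
    cells = dot_plot(a, b, k)
    return "\n".join(
        "".join("*" if (i, j) in cells else "." for j in range(len(b) - k + 1))
        for i in range(len(a) - k + 1)
    )


print(render("ACGGTCA", "ACGGTCA", k=2))  # identical sequences: a filled diagonal
print()
print(render("ACCT", "ACCT", k=1))        # k = 1: repeated C's add off-diagonal dots
```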

Page 33

[Figure: dot plots of genome A vs. genome B showing the patterns of a translocation, an inversion, and an insertion]

Page 34

• Different structural variation types / misassemblies will be apparent by their pattern of breakpoints

• Most breakpoints will be at or near repeats

• Things quickly get complicated in real genomes

http://mummer.sf.net/manual/AlignmentTypes.pdf

Page 35

Assembly based detection summary

• Advantages
  • Enables the detection of every event
  • Good quality for insertions

• Disadvantages
  • Genomic alignment is challenging.
  • Heterozygous events are likely missed.

Page 36

How to detect structural variations

Page 37

Sequence alignment “signals” for structural variation

1. Align DNA sequences from sample to human reference genome

2. Look for evidence of structural differences

[Figure: reference (Ref.) vs. sample (Exp.) illustrating four signals, ordered from low to high resolution:]
(a) Depth of coverage
(b) Paired-end mapping
(c) Split-read mapping
(d) de novo assembly

Page 38

Looking for "discordant" paired-end fragments

Paired-end sequencing

[Figure: sample read pairs aligned to the reference (Ref); scale label 2000 bp]

Paired ends map farther away than expected.

Slide in collaboration with Ira Hall
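The logic of discordant-pair detection can be sketched as a per-pair classifier. It assumes a standard forward-reverse (FR) paired-end library, an insert-size distribution summarized by its mean and standard deviation, and approximates fragment length by the distance between the two mapped start positions; the class and function names are hypothetical. SV callers such as LUMPY do not call events from single pairs but cluster many discordant pairs (and other signals) that support the same breakpoint.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    chrom1: str
    pos1: int     # leftmost mapped position of read 1
    strand1: str  # '+' or '-'
    chrom2: str
    pos2: int
    strand2: str


def classify_pair(p, mean_insert, sd_insert, n_sd=3.0):
    """Label a read pair as concordant or by the kind of discordance it shows."""
    if p.chrom1 != p.chrom2:
        return "interchromosomal"  # candidate translocation
    left, right = sorted([(p.pos1, p.strand1), (p.pos2, p.strand2)])
    if left[1] == right[1]:
        return "same_strand"       # candidate inversion
    if left[1] == "-":
        return "everted"           # candidate tandem duplication
    span = right[0] - left[0]      # approximate fragment length
    if span > mean_insert + n_sd * sd_insert:
        return "too_far"           # candidate deletion
    if span < mean_insert - n_sd * sd_insert:
        return "too_close"         # candidate insertion
    return "concordant"


pair = ReadPair("chr1", 1000, "+", "chr1", 3400, "-")
print(classify_pair(pair, mean_insert=400, sd_insert=50))  # → too_far
```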

Page 39

A probabilistic framework for SV discovery

Layer et al., 2014

Ryan Layer

Lumpy integrates paired-end mapping, split-read mapping, and depth of coverage for better SV discovery accuracy

Page 40

Problem #1: Often many false positives

- Short reads + heuristic alignment + repetitive genome = systematic alignment artifacts (false calls)

- Chimeras and duplicate molecules

- Reference genome errors (e.g., gaps, mis-assemblies)

- ALL SV mapping studies use strict filters for the above

Page 41

Problem #2: The false negative rate is also typically high

- Most current datasets have low to moderate physical coverage due to small insert size (~10-20X)

- Breakpoints are enriched in repetitive genomic regions that pose problems for sensitive read alignment

- FILTERING!

- The false negative rate is usually hard to measure, but is thought to be extremely high for most paired-end mapping studies (>30%)

- When searching for spontaneous mutations in a family or a tumor/normal comparison, a false negative call in one sample can be a false positive somatic or de novo call in another.

Page 42

How to filter/choose the SV caller?
• Each method applies its own heuristics.

Method      # Sim. SVs   Avg. FDR   Avg. sensitivity
DELLY       33-198       0.13       0.75
LUMPY       33-198       0.06       0.62
Pindel      33-198       0.04       0.55
SURVIVOR    33-198       0.01       0.70
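The FDR and sensitivity columns can be computed against a simulated truth set (33-198 simulated SVs per data set, presumably) as sketched below. The matching rule (same chromosome and type, both breakpoints within `max_dist` bp) and the 100 bp default are assumptions for illustration; the slide does not state which criteria the evaluation used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    chrom: str
    start: int
    end: int
    svtype: str


def matches(call, truth, max_dist):
    """A call matches a true SV if chromosome and type agree and both
    breakpoints lie within max_dist bp."""
    return (call.chrom == truth.chrom and call.svtype == truth.svtype
            and abs(call.start - truth.start) <= max_dist
            and abs(call.end - truth.end) <= max_dist)


def benchmark(calls, truth_set, max_dist=100):
    """Return (FDR, sensitivity): FDR = false calls / all calls,
    sensitivity = recovered true SVs / all true SVs."""
    tp_calls = sum(any(matches(c, t, max_dist) for t in truth_set) for c in calls)
    found = sum(any(matches(c, t, max_dist) for c in calls) for t in truth_set)
    fdr = 1 - tp_calls / len(calls) if calls else 0.0
    sensitivity = found / len(truth_set) if truth_set else 0.0
    return fdr, sensitivity


truth = [SV("chr1", 1000, 2000, "DEL"), SV("chr1", 5000, 5500, "INV"),
         SV("chr2", 300, 900, "DUP"), SV("chr3", 100, 200, "DEL")]
calls = [SV("chr1", 1010, 1990, "DEL"), SV("chr1", 5020, 5480, "INV"),
         SV("chr2", 10000, 12000, "DEL")]
print(benchmark(calls, truth))  # FDR ~0.33 (1 of 3 calls false), sensitivity 0.5
```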

Page 43

PacBio / ONT sequencers

Advantage:
• Long reads

Disadvantages:
• Throughput/yield
• Costs
• High error rates

Page 44

Long Read Technologies

• (+) SVs in repetitive regions
• (+) Span SVs
• (+) Uniform coverage
• (+) Can identify more complex SVs

• (-) Higher seq. error rate
• (-) Hard to align

Page 45

Mapping challenges

[Figure: example alignments from BWA-MEM and NGMLR]


Page 47

NGMLR + Sniffles

• NGMLR
  • Convex gap cost model to better distinguish seq. error vs. signal
  • Novel method for split-read alignment

• Sniffles
  • Includes multiple statistical models to distinguish noise vs. signal
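NGMLR's actual scoring function is not spelled out on the slide; the sketch below only contrasts a standard affine gap penalty with an assumed logarithmic one to show the intuition behind a gap cost whose extension penalty shrinks for longer gaps: a single long gap (a likely SV) stays affordable, while many scattered short gaps (typical of sequencing errors) still pay the full opening cost each time. The function names and penalty values are arbitrary assumptions.

```python
import math


def affine_gap(length, open_cost=5.0, extend_cost=1.0):
    """Affine penalty: every additional gap base costs the same."""
    return open_cost + extend_cost * length


def log_gap(length, open_cost=5.0, extend_cost=1.0):
    """Hypothetical sublinear penalty: extra bases cost less as the gap grows."""
    return open_cost + extend_cost * math.log2(1 + length)


for length in (1, 10, 100, 1000):
    print(f"{length:>5} bp gap: affine {affine_gap(length):7.1f}, log {log_gap(length):5.1f}")
```

Under the affine model a 1 kbp deletion is penalized so heavily that an aligner tends to clip or split the read instead; under the sublinear model it costs little more than a short indel.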

Page 48

1.3 Long read mapping

[Figure: percentage (0-100%) of alignments classified as precise, indicated, wrong, alignment stopped prior, or not aligned, by SV size (100 bp to 50 kbp) and type (indels, duplication, translocation, inversion), for BLASR, BWA, GraphMap, and NGMLR]

Page 49

More complex types

Page 50

2.4 Long read SV calling

[Figure: percentage (0-100%) of events classified as precise, indicated, not found, or additional events, by SV size (100 bp to 50 kbp) and type (indels, duplication, translocation, inversion), for SURVIVOR, PBHoney, Sniffles+BWA, and Sniffles+NGM-LR]

Page 51

2.4 Long read SV calling

[Figure: percentage (0-100%) of events classified as precise, indicated, not found, or additional events, by SV size (100 bp to 50 kbp) and type (Dup, Indel, Inv, Tra, InvDel, InvDup), for SURVIVOR, PBHoney, Sniffles+BWA, and Sniffles+NGM-LR]

INVDEL: inversion flanked by deletions
• Haemophilia A

INVDUP: inverted tandem duplication
• Pelizaeus-Merzbacher disease
• MECP2
• VIPR2

Page 52

3.2 NA12878

• Healthy female

• Gold standard in genomics

• Sequenced independently with many technologies:
  • Illumina, PacBio, Oxford Nanopore

Page 53

3.2 NA12878: Deletion calling

Tech.              Cov.   Avg. read len.   SVs      DEL      DUP   INV   INS      TRA
PacBio             55x    4,334            22,877   9,933    162   611   12,052   119
Oxford Nanopore    28x    6,432            32,409   27,147   87    323   4,809    43
Illumina           50x    2x101            7,275    3,744    731   553   0        2,247


Page 55

3.2 Oxford Nanopore deletions

[Figure: deletions compared across Illumina, PacBio, and Oxford Nanopore data]

Page 56

3.2 NA12878: Deletion calling

Tech.                    Cov.   Avg. read len.   SVs      DEL      DUP   INV   INS      TRA
PacBio                   55x    4,334            22,877   9,933    162   611   12,052   119
Oxford Nanopore          28x    6,432            32,409   27,147   87    323   4,809    43
Oxford Nanopore@Baylor   34x    4,982            12,596   7,102    169   113   5,166    46
Illumina                 50x    2x101            7,275    3,744    731   553   0        2,247


Page 58

3.2 NA12878: check 2,247 vs. 119 TRA

[Figure: example Illumina translocation call inspected in Illumina, PacBio, and ONT data; panels labeled "Translocation", "Truncated reads", and "Insertion in rep. region"]

Overlap of Illumina TRA calls    (%)
Translocations                    7.74
Insertions                       53.05
Deletions                        12.06
Duplications                      0.57
Nested                            0.31
High coverage                     1.87
Low complexity                    9.79
Explained                        85.40

Page 59

NA12878: check 2,247 TRA

[Figure: further examples in ONT, PacBio, and Illumina data; panels labeled "Insertion in rep. region", "Inversion", "Translocation", and "Truncated reads"]

Page 60

SKBR-3 using PacBio

(Davidson et al., 2000)

Often used for pre-clinical research on Her2-targeting therapeutics such as Herceptin (Trastuzumab), and on resistance to these therapies.

Most commonly used Her2-amplified breast cancer cell line

80 chromosomes instead of 46

Page 61

[Figure: region around Her2 with GSDMB, TATDN1, RARA, and PKIA labeled; 8 Mb scale]

Inversion was only found by Sniffles

Page 62

Her2 · Chr 17 · Chr 8

1. Healthy chromosomes 17 & 8
2. Translocation into chromosome 8
3. Translocation within chromosome 8
4. Complex variant and inverted duplication within chromosome 8
5. Translocation within chromosome 8

Page 63

Medical approach: Using Nanopore MinION

Page 64

GBA mutations in Parkinson's and Gaucher disease

Page 65

Review on SV methodologies

• Which methods exist per methodology?
  • Assembly vs. short read mapping vs. long read mapping

• What are the advantages/disadvantages per methodology?
  • Accuracy
  • Costs
  • Limitations, remaining challenges, complex alleles, polyploidy, etc.

• Where is the field at?
  • Diploid assemblies
  • Phasing of SVs + SNPs

• We have an outline and a journal that is interested in working with us.

Page 66

Thank you

• SV calling is the SNP calling of 2008
• Reads are typically shorter than the allele.
• A lot of noise in the data

