Assembly: before and after

Post on 10-May-2015

description

A talk I gave at the Dec 2013 Assembly Masterclass at UC Davis. Really licensed under CC0. Updated May 2014 for the presentation I gave at the combined SeRC Nordic Assembly Workshop in Stockholm, Sweden, on May 14th 2014.

transcript

Assembly – before and after

Lex Nederbragt

lex.nederbragt@ibv.uio.no

@lexnederbragt

A warning

The list is by no means complete

Nor do we have experience with all the programs mentioned

Sample

DNA

Reads

Genome assembly

DNA isolation, Sequencing, Assembly

QC, QC, QC

Reads

Genome

assembly

Assembly

QC

Fastqc

Prinseq

Many others…

www.nipgr.res.in/ngsqctoolkit.html

preqc (sga)

http://arxiv.org/abs/1307.8026

Grooming

Format conversion

http://en.wikipedia.org/wiki/FASTQ_format

Fastq format hell
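
Part of the format trouble is that FASTQ quality strings come with different ASCII offsets (Phred+33 and Phred+64, as described on the Wikipedia page above). A minimal sketch of guessing the offset from the characters seen; the function name and the cut-off characters are assumptions chosen for illustration, and a handful of reads may not be enough to decide:

```python
def guess_phred_offset(quality_lines):
    """Guess the ASCII offset (33 or 64) from FASTQ quality strings.

    Returns 33, 64, or None when the characters seen fit both encodings.
    Raises ValueError when no quality characters are given.
    """
    codes = [ord(ch) for line in quality_lines for ch in line]
    lowest, highest = min(codes), max(codes)
    if lowest < 59:
        # Characters below ';' are not used by the +64 encodings.
        return 33
    if highest > 74:
        # Characters above 'J' are (assumed) beyond the usual +33 range.
        return 64
    return None  # ambiguous: every character fits both encodings


print(guess_phred_offset(["IIIIHHGG#5"]))
print(guess_phred_offset(["hhhhggfeB"]))
```

An ambiguous answer is a reason to look at more reads, not to pick a default.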

Adapter/quality trimming

http://www.biostars.org/p/53528/

Celera assembler

Overlap based trimming

Fastx Toolkit
Seqtk
PrinSeq
NGS QC Toolkit
Trimmomatic
BioPieces
Cutadapt
…
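
To make the idea of quality trimming concrete, here is a naive sketch that removes low-quality bases from the 3' end of a read. It is not how any of the tools listed above are implemented; the function name, the default threshold of 20 and the Phred+33 offset are assumptions for illustration.

```python
def trim_3prime(seq, qual, min_q=20, offset=33):
    """Remove bases from the 3' end while their quality is below min_q.

    seq and qual are the sequence and quality lines of one FASTQ record.
    """
    end = len(seq)
    # Walk back from the last base until a base of acceptable quality is found.
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


print(trim_3prime("ACGTACGT", "IIIII###"))
```

Real trimmers usually do more, for example sliding-window averages and adapter matching.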

Mate pair splitting and orientation

Illumina paired end reads: 150 – 600 bases

Illumina mate pair reads: 2 – 40 kilobases

454 mate pair reads (with linker): 2 – 40 kilobases

Mate pair splitting and orientation

Illumina paired end reads

Illumina mate pair reads

454 mate pair reads

linker

junction, junction

+ +

paired end reads ‘contamination’

Check what orientation your assembler expects for the reads!
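
If the assembler expects inward-facing pairs and the library has outward-facing ones (as is usual for Illumina mate pairs), one common fix is to reverse-complement both reads of each pair. A minimal sketch; the function names are made up for illustration, and which orientation is needed depends on the assembler's documentation:

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def flip_pair(read1, read2):
    """Reverse-complement both reads, e.g. outward-facing to inward-facing."""
    return reverse_complement(read1), reverse_complement(read2)


print(flip_pair("AACG", "TTTN"))
```

For FASTQ input the quality strings have to be reversed as well.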

Preparing

Error-correction

Stand-alone or built into assembler

Merging pairs

List from Torsten Seemann’s blog

http://thegenomefactory.blogspot.no/2012/11/tools-to-merge-overlapping-paired-end.html

COPE http://sourceforge.net/projects/coperead/
SeqPrep https://github.com/jstjohn/SeqPrep
FLASH http://www.cbcb.umd.edu/software/flash
fastq-join http://code.google.com/p/ea-utils/wiki/FastqJoin
PANDAseq https://github.com/neufeld/pandaseq
mergePairs.py http://code.google.com/p/standardized-velvet-assembly-report/source/browse/trunk/mergePairs.py
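
What these tools have in common is that they look for an overlap between read 1 and the reverse complement of read 2, and join the two into one longer sequence. A deliberately naive sketch that only accepts exact overlaps; the names and the `min_overlap` default are assumptions, and the tools above also handle mismatches and quality values, which this does not:

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10):
    """Merge an inward-facing pair on the longest exact overlap.

    Returns the merged sequence, or None when no overlap of at least
    min_overlap bases is found.
    """
    rc2 = reverse_complement(read2)
    for olen in range(min(len(read1), len(rc2)), min_overlap - 1, -1):
        # Does the end of read 1 match the start of the flipped read 2?
        if read1[-olen:] == rc2[:olen]:
            return read1 + rc2[olen:]
    return None


fragment = "ACGTTGCAAGGCTTAACCGG"
read1 = fragment[:14]                      # forward read from the left end
read2 = reverse_complement(fragment[6:])   # reverse read from the right end
print(merge_pair(read1, read2, min_overlap=5))
```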

Recent addition

Extend reads

http://140.116.235.124/~tliu/arf-pe/

Digital normalisation

http://arxiv.org/abs/1203.4802
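
The idea in the paper is to keep a read only while the median count of its k-mers, among the reads kept so far, is still below a coverage cutoff. A small sketch of that idea using an exact Python dictionary; the names and defaults are assumptions, k-mers are not made strand-independent here, and memory-efficient implementations use approximate counting instead:

```python
from collections import Counter
from statistics import median


def digital_normalisation(reads, k=20, cutoff=20):
    """Return the reads kept after a single normalisation pass."""
    counts = Counter()
    kept = []
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        if not kmers:
            continue  # read shorter than k
        # Median abundance of this read's k-mers in the reads kept so far.
        if median(counts[kmer] for kmer in kmers) < cutoff:
            kept.append(read)
            counts.update(kmers)
    return kept


reads = ["ACGTTGCAAG"] * 5 + ["TTTTGGGGCC"]
print(digital_normalisation(reads, k=4, cutoff=2))
```

In the toy input only two of the five identical reads survive, while the low-coverage read is kept.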

Estimate kmer to use

preqc (SGA)

http://arxiv.org/abs/1307.8026

What can the reads tell us about the genome

kmer-based

preqc (SGA)

Kmerspectrumanalyzer

http://arxiv.org/abs/1307.8026

Khmer from Titus
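
The kmer-based approaches start from a k-mer spectrum: how many distinct k-mers occur once, twice, three times, and so on in the reads. A minimal sketch of computing one in memory; the function name is an assumption, both strands are not collapsed, and the tools above scale far beyond what a plain dictionary can hold:

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Map each multiplicity to the number of distinct k-mers seen that often."""
    kmer_counts = Counter(
        read[i:i + k] for read in reads for i in range(len(read) - k + 1)
    )
    return Counter(kmer_counts.values())


spectrum = kmer_spectrum(["ACGTA", "ACGTC"], k=3)
for multiplicity in sorted(spectrum):
    print(multiplicity, spectrum[multiplicity])
```

In real data, peaks in this histogram hint at coverage, and the k-mers seen only once are mostly sequencing errors.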

This talk

Reads

Genome

assembly

Assembly

QC

Genome assembly

Comparing to each other

Metrics

Merging

Improvement

Visualization

Validation

Comparing to reference

Metrics

Assemblathon stats

http://korflab.ucdavis.edu/datasets/Assemblathon/Assemblathon2/Basic_metrics/assemblathon_stats.pl

OR

https://github.com/lexnederbragt/sequencetools/
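
A metric that nearly always shows up in assembly statistics is N50: the length of the contig (or scaffold) at which, going from longest to shortest, half of the total assembly length has been reached. A short sketch for a list of sequence lengths; this is an illustration and not code taken from either of the scripts above:

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```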

Improvement

Gap closing

IMAGE2

Correcting bases

Quiver from Pacific Biosciences

Separate scaffolding

Merging

Assembly merging/reconciliation

Validation

Mapped genomic reads

FRCBAM

Mapped transcriptomic reads

Gene finding

Binning

Nederbragt et al, 2010

Visualization

Genome browser(s)

IGV

Comparing to each other

Comparative measures

Log Average Probability (LAP)

Assembly Likelihood Evaluation (ALE)

See also Howison, Zapata and Dunn (2013), Toward a statistically explicit understanding of de novo sequence assembly, doi: 10.1093/bioinformatics/btt525

Comparing to reference

Reference comparison

Mauve assembly metrics

Review

Too many tools…

http://seqanswers.com/wiki/Software/list

Too many tools…

http://wwwdev.ebi.ac.uk/fg/hts_mappers

88 short-read mappers

Embargo!

Benchmarking, anyone?

All-in-one assembly pipeline

doi:10.1186/1471-2105-15-126