Binning - KNAW · 2014-09-23 · Determining what belongs together by crosslinking total cell...

Post on 12-Feb-2020

From metagenomic contigs to draft genomes

Binning

Daan Speth, d.speth@science.ru.nl

@daanspeth

The problem

Binning: clustering sequences with the same origin together

A corner piece? GREAT! But where is the rest of the puzzle?

Drew Sheneman, New Jersey -- The Newark Star Ledger

Potato processing wastewater treatment plant at Olburgen, The Netherlands

Stable system operated since 2006

Images: left and middle, Abma et al., Water Science & Technology (2010)

Study site

Nitritation/anammox reactor (600 m³)

Sampling strategy: 8 samples

[Figure: sampling scheme with the columns Sample location, Sample treatment and DNA isolation. Height labels on the reactor: 0.2 m, 1.4 m, 2.6 m, 3.8 m and 5.0 m. Sample treatment: total sample or washed granules. DNA isolation: organic extraction or Powersoil kit. Samples are numbered 1 to 8.]

Data handles

Sequence composition

Prior knowledge (Databases)

Sequence abundance

Mate pair & Paired end

Data handles: mate pair and paired end

Data handles: databases

Data handles: composition

Limited chemical signature

Biological information
- Codon usage (tetramer frequency)

‘Unique’ long k-mers

Contig/read length matters!
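To make the composition handle concrete, here is a minimal sketch of a tetramer frequency profile for one sequence. The function name `tetramer_frequencies` is made up for illustration, and the sketch ignores reverse complements, which real tools usually merge.

```python
from collections import Counter
from itertools import product


def tetramer_frequencies(seq):
    """Return the relative frequency of each of the 256 tetramers in seq."""
    seq = seq.upper()
    # Slide a window of length 4 over the sequence and count each tetramer.
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    # Windows containing N or other ambiguity codes are left out of the total.
    total = sum(counts[k] for k in kmers)
    if total == 0:
        return {k: 0.0 for k in kmers}
    return {k: counts[k] / total for k in kmers}


profile = tetramer_frequencies("ACGTACGT")
print(profile["ACGT"])
```

This also shows why length matters: a 100 bp read yields only 97 windows to spread over 256 possible tetramers, so its profile is mostly zeros, while a contig of several kb gives a much more stable signature.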

DNA isolation and library preparation

Sequencing and assembly

Data handles: abundance

Abundance in the sample correlates with abundance in reads
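As a rough illustration of the abundance handle, mean contig coverage can be estimated from the number of mapped reads. This is a sketch under simple assumptions (uniform read length, every read mapped in full); the function names are hypothetical.

```python
def mean_coverage(n_reads, read_length, contig_length):
    """Average number of times each contig base is covered by a read."""
    return n_reads * read_length / contig_length


def relative_abundance(coverages):
    """Scale per-contig coverages so that they sum to 1."""
    total = sum(coverages.values())
    return {name: cov / total for name, cov in coverages.items()}


# 2000 reads of 100 bp mapped to a 10 kb contig
print(mean_coverage(2000, 100, 10_000))
print(relative_abundance({"contig_a": 30.0, "contig_b": 10.0}))
```

Contigs from the same genome should end up with similar coverage, which is what makes coverage usable as a binning signal.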

Many roads try to get to Rome

Reference based and reference independent binning methods

Mande, S. S., Mohammed, M. H. & Ghosh, T. S. Classification of metagenomic sequences: methods and challenges. Briefings in Bioinformatics 13, 669–681 (2012).

Composition:
- GC content
- Tetranucleotide frequencies

Abundance:
- Long k-mer copy number
- Contig coverage

Content:
- Essential single copy genes
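The content handle is often used as a sanity check on a bin: count which essential single copy genes were found, and how many were found more than once. The sketch below assumes gene hits are already available as a list of names; the marker set and the function name are illustrative only, not a recommended gene list.

```python
from collections import Counter


def completeness_and_duplication(marker_hits, marker_set):
    """Return (fraction of markers found, fraction found more than once)."""
    counts = Counter(hit for hit in marker_hits if hit in marker_set)
    found = len(counts)
    duplicated = sum(1 for n in counts.values() if n > 1)
    return found / len(marker_set), duplicated / len(marker_set)


markers = {"rpoB", "recA", "gyrB", "rpsC"}    # illustrative marker set
hits_in_bin = ["rpoB", "recA", "recA", "gyrB"]  # hypothetical hits in one bin
print(completeness_and_duplication(hits_in_bin, markers))
```

A marker found twice in one bin hints that sequences from more than one genome were grouped together.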

Binning approaches

(This is not an exhaustive list…)

Assembly independent read binning

Binning on GC content and coverage

Tetranucleotide ESOM

Differential coverage based binning
- Nucleotide extraction bias
- Different samples

Hi-C Metagenomics

Assembly independent binning

Wang, Y., Leung, H. C. M., Yiu, S. M. & Chin, F. Y. L. MetaCluster 5.0: a two-round binning approach for metagenomic data for low-abundance species in a noisy sample. Bioinformatics 28, i356–i362 (2012).

T = long k-mer abundance

w = long k-mer length
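The idea behind read binning on long k-mers can be sketched as follows: reads that share a sufficiently frequent k-mer of length w are placed in the same group. This is not the MetaCluster algorithm itself, only a toy version of the first grouping idea; treating T as a minimum count is an assumption, and reverse complements are ignored.

```python
from collections import Counter, defaultdict


def group_reads_by_shared_kmers(reads, w, T):
    """Group read indices that share a w-mer present in at least T reads."""
    # Count in how many reads each w-mer occurs (once per read).
    counts = Counter()
    for read in reads:
        counts.update({read[i:i + w] for i in range(len(read) - w + 1)})

    parent = list(range(len(reads)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}
    for idx, read in enumerate(reads):
        for i in range(len(read) - w + 1):
            kmer = read[i:i + w]
            if counts[kmer] < T:
                continue
            if kmer in first_seen:
                # Merge this read with the first read that carried the k-mer.
                parent[find(first_seen[kmer])] = find(idx)
            else:
                first_seen[kmer] = idx

    groups = defaultdict(list)
    for idx in range(len(reads)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


reads = ["AAAACCCCGG", "CCCCGGTTTT", "GATTACAGAT"]
print(group_reads_by_shared_kmers(reads, w=6, T=2))
```

The first two reads share the 6-mer CCCCGG and end up together; the third read shares nothing and stays alone.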

Separating genomes: binning

Binning based on coverage and GC content

[Figure: contigs plotted by sequencing depth against GC content]
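Picking a bin from such a plot amounts to selecting contigs inside a window of GC content and sequencing depth. A minimal sketch, assuming contigs are given as a dictionary of name to (sequence, coverage); the function names and the example values are made up.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases of seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


def select_bin(contigs, gc_range, cov_range):
    """Return names of contigs whose GC content and coverage fall in range."""
    lo_gc, hi_gc = gc_range
    lo_cov, hi_cov = cov_range
    return sorted(
        name
        for name, (seq, cov) in contigs.items()
        if lo_gc <= gc_content(seq) <= hi_gc and lo_cov <= cov <= hi_cov
    )


contigs = {
    "c1": ("GGCCGGCCAT", 50.0),  # high GC, high depth
    "c2": ("ATATATATGC", 48.0),  # low GC, high depth
    "c3": ("GCGCGCGCAA", 5.0),   # high GC, low depth
}
print(select_bin(contigs, gc_range=(0.7, 0.9), cov_range=(40, 60)))
```

Neither axis alone separates c1 from the others here: c2 has similar depth and c3 has similar GC content, but the combination isolates c1.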

Binning: tetranucleotide ESOM

Dick, G. J., Andersson, A. F., Baker, B. J. & Simmons, S. L. Community-wide analysis of microbial genome sequence signatures. Genome Biology (2009).

Emergent Self Organizing Map (ESOM) based on tetranucleotide frequency
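To give a feel for what a self-organizing map does with frequency profiles, here is a toy SOM in plain Python. It is a sketch only: an emergent SOM uses far more nodes than this small grid, and the learning rate and radius schedules below are arbitrary assumptions rather than settings taken from the cited work.

```python
import math
import random


def best_matching_unit(grid, vector):
    """Return the grid position whose weight vector is closest to vector."""
    return min(
        grid,
        key=lambda node: sum((w - x) ** 2 for w, x in zip(grid[node], vector)),
    )


def train_som(vectors, rows, cols, epochs=100, seed=0):
    """Train a rows x cols map; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    nodes = [(r, c) for r in range(rows) for c in range(cols)]
    # Initialise every node with a randomly chosen input vector.
    grid = {node: list(rng.choice(vectors)) for node in nodes}
    start_radius = max(rows, cols) / 2
    for epoch in range(epochs):
        remaining = 1 - epoch / epochs
        rate = 0.5 * remaining                         # shrinking learning rate
        radius = max(start_radius * remaining, 0.5)    # shrinking neighbourhood
        order = list(vectors)
        rng.shuffle(order)
        for vector in order:
            bmu_r, bmu_c = best_matching_unit(grid, vector)
            for (r, c), weights in grid.items():
                dist2 = (r - bmu_r) ** 2 + (c - bmu_c) ** 2
                # Nodes near the best matching unit are pulled more strongly.
                pull = rate * math.exp(-dist2 / (2 * radius ** 2))
                for i, x in enumerate(vector):
                    weights[i] += pull * (x - weights[i])
    return grid


# Two made-up groups of 2-dimensional profiles standing in for tetramer vectors
profiles = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]]
som = train_som(profiles, rows=3, cols=3)
print(best_matching_unit(som, [0.0, 0.0]), best_matching_unit(som, [1.0, 1.0]))
```

Sequences with similar profiles map to nearby nodes, so genomes show up as regions of the map; real input would be 256-dimensional tetramer vectors rather than two numbers.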

Using nucleotide extraction bias to separate organisms

Albertsen, M. et al. Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nat Biotechnol 31, 533–538 (2013).

Binning: differential coverage binning

http://madsalbertsen.github.io/multi-metagenome/
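The principle of differential coverage binning can be sketched with two coverage values per contig, one from each sample or extraction method. The code below is an illustration with made-up numbers and a hypothetical function name, not the workflow from the linked site.

```python
def select_by_differential_coverage(coverage, range_a, range_b):
    """Return contigs whose coverage lies in range_a in sample A and range_b in sample B.

    coverage: dict of contig name -> (coverage in sample A, coverage in sample B)
    """
    lo_a, hi_a = range_a
    lo_b, hi_b = range_b
    return sorted(
        name
        for name, (cov_a, cov_b) in coverage.items()
        if lo_a <= cov_a <= hi_a and lo_b <= cov_b <= hi_b
    )


coverage = {
    "x1": (100.0, 10.0),  # organism X: abundant in A, rare in B
    "x2": (104.0, 12.0),
    "y1": (95.0, 90.0),   # organism Y: abundant in both
    "y2": (98.0, 85.0),
}
print(select_by_differential_coverage(coverage, (80, 120), (5, 20)))
```

In sample A alone the two organisms overlap at roughly 100x; the second sample pulls them apart, which is the reason for sequencing more than one sample or using more than one extraction method.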

differential coverage binning: crAss

differential coverage binning: GroopM

http://minillinim.github.io/GroopM/

Imelfort, M., Parks, D., Woodcroft, B. J. & Dennis, P. GroopM: An automated tool for the recovery of population genomes from related metagenomes. (2014).

differential coverage binning: CONCOCT

Alneberg, J. et al. CONCOCT: Clustering cONtigs on COverage and ComposiTion. (2013).

differential coverage binning: ESOM

Kantor, R. S. et al. Small Genomes and Sparse Metabolisms of Sediment-Associated Bacteria from Four Candidate Phyla. MBio 4, e00708-13 (2013).

differential coverage binning: ESOM

Nielsen, H. B. et al. Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes. Nat Biotechnol 32, 822–828 (2014).
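With many samples, sequences from the same genome should rise and fall together. One simple way to measure this is the Pearson correlation between abundance profiles, sketched below; this illustrates the general co-abundance idea and is not claimed to reproduce the procedure of any of the tools cited above.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long abundance profiles."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-abundance signal
    return cov / math.sqrt(var_x * var_y)


# Coverage of three contigs across three samples (made-up numbers)
contig_a = [1.0, 2.0, 3.0]
contig_b = [2.0, 4.0, 6.0]   # same pattern as contig_a at twice the depth
contig_c = [3.0, 2.0, 1.0]   # opposite pattern
print(pearson(contig_a, contig_b), pearson(contig_a, contig_c))
```

Correlation ignores absolute depth, so contig_a and contig_b look identical even though one is covered twice as deeply.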

Determining what belongs together by crosslinking total cell content

1) Crosslink
2) Cut DNA
3) Religate randomly
4) Sequence paired end library of both crosslinked and native sample

Binning: Hi-C metagenomics

Beitel, C. W. et al. Strain- and plasmid-level deconvolution of a synthetic metagenome by sequencing proximity ligation products. (2014). doi:10.7287/peerj.preprints.260v1

Clustering by organism (and even replicon!)
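The clustering step can be pictured as a graph problem: contigs are nodes, and read pairs from the crosslinked library that map to two different contigs are links. A minimal sketch, assuming each read pair has already been reduced to the two contig names it maps to; the `min_links` threshold and the use of plain connected components are simplifying assumptions, not the method of the cited preprint.

```python
from collections import Counter, defaultdict


def cluster_contigs_by_links(read_pairs, min_links=2):
    """Group contigs connected by at least min_links read pairs."""
    links = Counter()
    contigs = set()
    for a, b in read_pairs:
        contigs.update((a, b))
        if a != b:
            links[tuple(sorted((a, b)))] += 1

    # Keep only well-supported links as graph edges.
    adjacency = defaultdict(set)
    for (a, b), n in links.items():
        if n >= min_links:
            adjacency[a].add(b)
            adjacency[b].add(a)

    # Connected components by depth-first search.
    clusters, seen = [], set()
    for start in sorted(contigs):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(sorted(component))
    return clusters


pairs = (
    [("c1", "c2")] * 3 + [("c2", "c3")] * 2   # same cell: many links
    + [("c3", "c4")]                           # single spurious link
    + [("c4", "c5")] * 4
)
print(cluster_contigs_by_links(pairs))
```

The single c3 to c4 pair falls below the threshold, so the contigs split into two clusters.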

Roads less travelled…

Whichever method you choose, do a background check…

When analyzing a complex community, experimental design largely determines how much you can get out of the data.

Binning: concluding remarks