ECCMID Meet the Expert

Posted on 16-Jul-2015

What bioinformatic tools should I use for analysis of high-throughput sequencing data for molecular diagnostics?

Nick Loman

• Read QC: FastQC, Qualimap, Kraken, BLAST

• Adaptor/quality trimming: Trimmomatic

• Reference-based approach

– Alignment: BWA

– Variant calling: Samtools/VarScan, GATK

– SNP extraction: Python script, Snippy

– Recombination filtering: Gubbins

– Tree building: FastTree, RAxML

• Whole-genome alignment: Mauve, Parsnp

– Tree building: Harvest

• De novo approach

– Assembly: SPAdes

– Annotation: Prokka

– MLST/Antibiogram: mlst, ABRicate

– Population genomics: BIGSdb, PHYLOViZ

– Pan-genome: LS-BSR

• MLST/Antibiogram directly from reads: SRST2
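In this workflow the adaptor/quality trimming step is handled by Trimmomatic. To illustrate what sliding-window quality trimming does (the idea behind Trimmomatic's SLIDINGWINDOW option), here is a simplified Python sketch; it is not Trimmomatic's implementation. It assumes Phred+33 quality strings, and the function names, the 4-base window and the Q20 threshold are illustrative choices.

```python
def phred_scores(quality, offset=33):
    """Convert a FASTQ quality string (Phred+33 assumed) to integer scores."""
    return [ord(ch) - offset for ch in quality]


def sliding_window_trim(seq, qual, window=4, min_mean_q=20, offset=33):
    """Scan from the 5' end and cut the read at the start of the first window
    whose mean quality drops below min_mean_q (simplified behaviour)."""
    scores = phred_scores(qual, offset)
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < min_mean_q:
            return seq[:start], qual[:start]
    return seq, qual  # no failing window (or read shorter than the window)


# 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
print(sliding_window_trim("ACGTACGTAC", "IIIIIII###"))  # ('ACGTAC', 'IIIIII')
```

Cutting at the start of the failing window is deliberately conservative; real trimmers differ in exactly where they place the cut.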

Quality Control: Questions to Ask

• Did my sequencing work?

• What are the fragment lengths?

• Is my sample what I think it is?

• Is my sample contaminated?

Did my sequencing work?

• FastQC
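FastQC's per-base quality plot is usually the first thing to look at. A minimal Python sketch of the same summary (mean Phred quality at each read position) follows; it assumes uncompressed four-line FASTQ records with Phred+33 qualities, and the function names are hypothetical.

```python
import io
from collections import defaultdict


def read_fastq(handle):
    """Yield (name, sequence, quality) from four-line FASTQ records."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return  # end of file
        seq = handle.readline().rstrip()
        handle.readline()  # the '+' separator line
        qual = handle.readline().rstrip()
        yield header[1:], seq, qual


def mean_quality_per_position(handle, offset=33):
    """Mean Phred quality at each read position (1st base, 2nd base, ...)."""
    totals, counts = defaultdict(int), defaultdict(int)
    for _, _, qual in read_fastq(handle):
        for i, ch in enumerate(qual):
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [totals[i] / counts[i] for i in sorted(counts)]


example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGA\n+\nII##\n")
print(mean_quality_per_position(example))  # [40.0, 40.0, 21.0, 21.0]
```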

What are the fragment lengths?

• Qualimap (or just BWA)

• Bad: fragment length < read length

• OK: fragment length > read length

• Good: fragment length > 2× read length

Fragment length will affect genome coverage, de novo assembly performance and alignment performance.
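Qualimap (or BWA itself) will report the insert-size distribution for you. Purely to show where that number comes from, this sketch reads the TLEN field (column 9 of the SAM format), which approximates fragment length for properly paired reads, and applies the rules of thumb above. It assumes plain-text SAM input from paired-end reads; the function names are hypothetical, and a fragment exactly equal to the read length is treated as "Bad" here because the slide leaves that case open.

```python
from statistics import median


def insert_sizes_from_sam(lines):
    """Fragment-length estimates from SAM TLEN (column 9), counting each
    properly paired template once via its first-in-pair read."""
    sizes = []
    for line in lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag, tlen = int(fields[1]), int(fields[8])
        proper_pair = flag & 0x2
        first_in_pair = flag & 0x40
        if proper_pair and first_in_pair and tlen != 0:
            sizes.append(abs(tlen))  # TLEN is negative for the rightmost read
    return sizes


def classify_fragments(fragment_length, read_length):
    """Rules of thumb from the slide; fragment == read length counts as Bad."""
    if fragment_length > 2 * read_length:
        return "Good"
    if fragment_length > read_length:
        return "OK"
    return "Bad"


sam = [
    "@HD\tVN:1.6",
    "r1\t99\tchr\t100\t60\t150M\t=\t300\t350\t*\t*",
    "r1\t147\tchr\t300\t60\t150M\t=\t100\t-350\t*\t*",
    "r2\t83\tchr\t500\t60\t150M\t=\t200\t-450\t*\t*",
]
sizes = insert_sizes_from_sam(sam)
print(sizes, median(sizes), classify_fragments(median(sizes), 150))
# [350, 450] 400.0 Good
```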

Is my sample what I think it is?

• BLASTing a few reads is usually a very efficient check.
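Taking the first few reads of a file is convenient, but they may not be representative of the whole run. One option is to draw a uniform random sample with reservoir sampling and paste the resulting FASTA into BLAST. A short sketch, with hypothetical function names:

```python
import random


def sample_reads(records, n, seed=None):
    """Uniformly sample n records from an iterable of unknown length in one
    pass (reservoir sampling, Algorithm R)."""
    rng = random.Random(seed)
    reservoir = []
    for i, record in enumerate(records):
        if i < n:
            reservoir.append(record)
        else:
            j = rng.randint(0, i)  # inclusive of both ends
            if j < n:
                reservoir[j] = record
    return reservoir


def to_fasta(records):
    """Format (name, sequence) pairs as FASTA text for pasting into BLAST."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


reads = [(f"read{i}", "ACGT" * 10) for i in range(1000)]
print(to_fasta(sample_reads(reads, 5, seed=42)))
```

Because the sample is drawn in a single pass, the same approach works on a streamed FASTQ without loading it into memory.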

Is my sample contaminated?
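Kraken, listed among the read QC tools, addresses this question by matching the k-mers in each read against a database of known genomes. The toy Python sketch below shows only the underlying idea: it counts k-mers unique to each reference and reports the fraction of reads assigned to each organism. It is not Kraken's algorithm (Kraken resolves shared k-mers using a taxonomy and lowest-common-ancestor assignment), and the organism names and sequences in the example are made-up placeholders.

```python
from collections import Counter


def kmers(seq, k):
    """All overlapping substrings of length k."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_index(references, k):
    """Map each k-mer to the set of organisms whose sequence contains it."""
    index = {}
    for organism, seq in references.items():
        for kmer in kmers(seq, k):
            index.setdefault(kmer, set()).add(organism)
    return index


def classify_read(read, index, k):
    """Assign a read to the organism with the most k-mer hits unique to it."""
    hits = Counter()
    for kmer in kmers(read, k):
        organisms = index.get(kmer, set())
        if len(organisms) == 1:  # shared k-mers are ignored in this toy version
            hits[next(iter(organisms))] += 1
    return hits.most_common(1)[0][0] if hits else "unclassified"


def composition(reads, index, k):
    """Fraction of reads assigned to each organism (or 'unclassified')."""
    counts = Counter(classify_read(r, index, k) for r in reads)
    total = sum(counts.values())
    return {org: n / total for org, n in counts.items()} if total else {}


references = {  # hypothetical placeholder sequences
    "expected_species": "AAAAACCCCCGGGGG",
    "contaminant": "TTTTTGTGTG",
}
index = build_index(references, k=5)
reads = ["AAAAACCCC", "CCCCGGGGG", "TTTTGTGTG", "ATATATATA"]
print(composition(reads, index, k=5))
# {'expected_species': 0.5, 'contaminant': 0.25, 'unclassified': 0.25}
```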

Adaptor trim reads

• With Nextera libraries, failing to adaptor trim will KILL your assemblies.

• Particularly important when mean fragment length < read length.

• Many trimmers are available; I like to use Trimmomatic.

For more explanation: http://nickloman.github.io/high-throughput%20sequencing/genomics/bioinformatics/2013/04/17/adaptor-trim-or-die-experiences-with-nextera-libraries/
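As a rough picture of what an adaptor trimmer does with Nextera read-through, here is an exact-match Python sketch. It assumes the adaptor appears in reads as CTGTCTCTTATACACATCT (the reverse complement of the Nextera mosaic end); the function name and the `min_overlap` parameter are hypothetical. Real tools such as Trimmomatic also tolerate sequencing errors and can use the overlap between paired reads, which this sketch does not.

```python
NEXTERA_ADAPTOR = "CTGTCTCTTATACACATCT"  # assumed read-through sequence


def trim_adaptor(seq, adaptor=NEXTERA_ADAPTOR, min_overlap=3):
    """Remove adaptor read-through from the 3' end of a read (exact matches only).

    1. If the whole adaptor occurs in the read, cut at its first occurrence.
    2. Otherwise, if the read ends with at least min_overlap bases from the
       start of the adaptor, cut those bases off (longest match first).
    """
    pos = seq.find(adaptor)
    if pos != -1:
        return seq[:pos]
    stop = max(min_overlap, 1) - 1  # never try a zero-length overlap
    for overlap in range(min(len(adaptor) - 1, len(seq)), stop, -1):
        if seq.endswith(adaptor[:overlap]):
            return seq[:-overlap]
    return seq


insert = "ACGTACGTAA"
print(trim_adaptor(insert + NEXTERA_ADAPTOR + "GGGG"))  # full adaptor: ACGTACGTAA
print(trim_adaptor(insert + "CTGTC"))                   # partial adaptor: ACGTACGTAA
print(trim_adaptor(insert + "CTG"))                     # 3-base match: ACGTACGTAA
print(trim_adaptor(insert))                             # no adaptor: ACGTACGTAA
```

`min_overlap` is a trade-off: smaller values catch shorter partial adaptors at the read end, but also clip genuine sequence that happens to match the adaptor's first few bases.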

Reference-based or de novo?

• Reference-based

– Implies ALIGNMENT to a reference

– Implies you HAVE a reference

– Allows exquisitely sensitive and specific SNP calling (forensic SNP calling to single mutation precision)

– Important for looking at CHAINS OF TRANSMISSION

– Can only call in parts of the genome COMMON between your SAMPLES and REFERENCE
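To make the last two points concrete, here is a minimal sketch of consensus-style SNP calling against a reference. It assumes per-position base counts have already been extracted from the alignment (for example from a pileup); the function name and the depth and allele-fraction thresholds are illustrative, not the defaults of Samtools, VarScan, GATK or Snippy. Positions with no coverage in the sample come back as uncallable, which is why a reference-based analysis only covers the genome shared by sample and reference.

```python
def call_snps(reference, pileup, min_depth=10, min_fraction=0.9):
    """Call SNPs from per-position base counts.

    reference: reference sequence string.
    pileup: list (same length as reference) of dicts mapping base -> read count.
    Returns (snps, uncallable): SNPs as (position, ref_base, alt_base) with
    1-based positions, plus the positions that could not be called.
    """
    snps, uncallable = [], []
    for pos, (ref_base, counts) in enumerate(zip(reference, pileup), start=1):
        depth = sum(counts.values())
        if depth < min_depth:
            uncallable.append(pos)  # e.g. region absent from the sample
            continue
        base, n = max(counts.items(), key=lambda item: item[1])
        if n / depth < min_fraction:
            uncallable.append(pos)  # mixed position: no confident call
        elif base != ref_base:
            snps.append((pos, ref_base, base))
    return snps, uncallable


reference = "ACGT"
pileup = [{"A": 20}, {"C": 2, "T": 28}, {}, {"T": 10, "G": 10}]
print(call_snps(reference, pileup))  # ([(2, 'C', 'T')], [3, 4])
```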

• De novo

– Implies de novo assembly

– Does NOT require a reference

– Gives access to the entire PAN-genome

– E.g. unexpected antibiotic resistance genes and virulence factors

– Can give misleading results in REPEAT sequences

– Not suitable for very fine-resolution SNP analysis
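Why repeats cause trouble can be seen in a toy de Bruijn graph, the kind of k-mer graph that assemblers such as SPAdes build (SPAdes itself combines several k values and much more elaborate graph processing). In this sketch, with hypothetical function names, a repeat at least k-1 bases long that is followed by different bases creates a node with two successors, so an assembler cannot tell which way to continue; a larger k resolves a short repeat.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, and every k-mer in the
    reads adds an edge from its prefix (k-1)-mer to its suffix (k-1)-mer."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branching_nodes(graph):
    """Nodes with more than one successor: the assembler cannot tell which
    path to follow, so contigs typically break here."""
    return sorted(node for node, successors in graph.items() if len(successors) > 1)


genome = "AGTTTCATTTG"  # 'TTT' occurs twice, followed by C and then by G
print(branching_nodes(de_bruijn([genome], k=4)))  # ['TTT']
print(branching_nodes(de_bruijn([genome], k=5)))  # []
```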

In practice

• Most people will want to do both.

• And if you have no reference, you can use a draft de novo assembly AS your reference.
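Before adopting a draft assembly as a reference, it is worth checking how fragmented it is, since every contig break is a place where reads cannot be aligned contiguously. A common summary is N50: the contig length L such that contigs of length L or longer together hold at least half the assembly. A minimal sketch with hypothetical function names; what counts as "good enough" is not specified in the talk and depends on the organism.

```python
def contig_lengths(fasta_text):
    """Lengths of the sequences in FASTA text (multi-line records allowed)."""
    lengths, current = [], None
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line.strip())
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs >= L contain at least half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


draft = ">contig1\nACGTACGTAC\nACGT\n>contig2\nACGTACGT\n>contig3\nACG\n"
lengths = contig_lengths(draft)
print(lengths, n50(lengths))  # [14, 8, 3] 14
```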

Acknowledgements

• Twitter comments:

– Tom Connor, Alan McNally, Torsten Seemann, C. Titus Brown, Heng Li, Christoffer Flensburg, Matt MacManes, Rachel Glover, Willem van Schaik