
Coding & Best Practice in Programming in the NGS era

Date post: 16-Apr-2017
Upload: lex-nederbragt
Coding & Best Practice in Programming: Why it matters so much in the NGS era. Lex Nederbragt, Norwegian Sequencing Centre and Centre for Evolutionary and Ecological Synthesis. @lexnederbragt
Transcript
Page 1: Coding & Best Practice in Programming in the NGS era

Coding & Best Practice in Programming
Why it matters so much in the NGS era

Lex Nederbragt
Norwegian Sequencing Centre and Centre for Evolutionary and Ecological Synthesis

@lexnederbragt

Page 2: Coding & Best Practice in Programming in the NGS era

Who am I

@lexnederbragt flxlexblog.wordpress.com

Page 3: Coding & Best Practice in Programming in the NGS era

How I became a bioinformatician

Page 4: Coding & Best Practice in Programming in the NGS era

2007: a grant

GS FLX from Roche/454

Genome Analyzer from Solexa/Illumina

?

Let’s try them out!

Page 5: Coding & Best Practice in Programming in the NGS era

Specimen

• Planktothrix rubescens NIVA CYA 98

• Cyanobacteria

• (blue-green algae)

Page 6: Coding & Best Practice in Programming in the NGS era

Planktothrix

Half a million reads, average length 260 nt

10 million reads, 33 nucleotides each

Perl

Page 7: Coding & Best Practice in Programming in the NGS era

Planktothrix

Newbler SHARCGS

Assembly

Half a million reads, average length 260 nt

10 million reads, 33 nucleotides each

Page 8: Coding & Best Practice in Programming in the NGS era

Atlantic cod genome project

850 million bases (Mbp)

‘Wild-caught’

GS FLX from Roche/454

Page 9: Coding & Best Practice in Programming in the NGS era

Atlantic cod genome project phase 1

Page 10: Coding & Best Practice in Programming in the NGS era

Cod genome project phase 2

From Wikimedia commons, user Sagar Joshi

Page 11: Coding & Best Practice in Programming in the NGS era

In summary

From flickr, user lesterpubliclibrary

Page 12: Coding & Best Practice in Programming in the NGS era

Challenges in the next-generation sequencing era

Page 13: Coding & Best Practice in Programming in the NGS era

High-throughput sequencing

Phase 1: more is better

Phase 2: smaller is better

Phase 3: single-molecule

Phase 4: nanopores

Page 14: Coding & Best Practice in Programming in the NGS era

Democratization of sequencing

MinION

512 nanopores, 150 Mb/hour

Up to 6 hours, $900

Page 15: Coding & Best Practice in Programming in the NGS era

Sequencing cost

Thanks to Matt Clark (TGAC), modified from http://bit.ly/1iiajcS

[Figure: sequencing cost over time, annotated with platforms: 454 & polony, Solexa & SOLiD, GAII, HiSeq, HiSeq X Ten]

End of the gold rush?

Page 16: Coding & Best Practice in Programming in the NGS era

More more more

Data Software

Mathias Bigge, Ricordisamoa, others (wikimedia commons)

TCTCCTAACAACCCCCcACACACACACACTGGTACTGATGCCATTCTGCTTTACACCTATACACATCATATACATtATACACACACACACACACACACAACACTCTCCTAACCCACACACACTGGTACAGATGCCAGTCTGCTTAACACCTACGCACGTATTATACACACACACACACAACGCTCTCCTAACCCACACACACACCAGTCTGCTTTAAACCTACACACATATTATACAAACGAGTTGGTGACGTAAGGTTGATAAGGGATATTGGTAAGGGTTAAGGGTAGGGTTGGTGTTAGGGGCAAGGGTTAGGGTTAGTGTAAGGGGTAAGGGTTAGTGTAaGGAGTAAGGGTTAGTGTAAGGGGTTAGTGTTATTGTAAGGGGCTAGTGTTAGTGTTAGTGTTCAGGGTTAGTGTTAGGGGTAGGGTTAATgTTTAGGGTAATGTTTAGGGTTAGGGGTATGGGTTAGTGCTAGGGGTCAGGGTTAGTGTTAGGGTTAGACAACCCACCTGAGAGAACCAGTGCGATGCCGCCGCAGGCGTTGGGCGAGGACATGGAGGTGCCGTTCATCAGCTGGGTCCCCCGGAGGGTCCAGTTGGGGACGGAGGCGATGGCTCCCCCCGGAGCGCTGATGCTGACCCCCAGGGCGCCGTCGATGCTGGGTCCCCGAGACGACCAGGTGTACTGGTTGGCCGGGAGCTTCTCCCTCAGGGAGTACTCCGCCACCATCATGTCGGGGGTCACGTAGGCCCCAACCCCTGGGGACAGACGGAGCGCGTTACACACCTCAACCCCTTACCCTCGGAGCCTACATAACCCAACCCTCTGGAGACGGCAATGCTTGCATAGTCAGAAATAGaGCTGACCGATTCATCAAATTCAAACGTCATCGCTATATAATAGCGGGgTTTGATTTGCCATTTGCAAATTGCAAAGGCTGCAATgtttttttttttt

Page 17: Coding & Best Practice in Programming in the NGS era

Software

Constant stream of new software

http://wwwdev.ebi.ac.uk/fg/hts_mappers

88 short-read mappers

Page 18: Coding & Best Practice in Programming in the NGS era

Software

Constant stream of new software

http://neidetcher.com/ubuntu_package_dependency.html

Installation

Judging quality

Wikimedia commons, user Thebestofall007

Page 19: Coding & Best Practice in Programming in the NGS era

Do we need to be worried?

Page 20: Coding & Best Practice in Programming in the NGS era

Do we need to be worried?

Self-taught bioinformaticians

TCTCCTAACAACCCCCcACACACACACACTGGTACTGATGCCATTCTGCTTTACACCTATACACATCATATACATtATACACACACACACACACACACAACACTCTCCTAACCCACACACACTGGTACAGATGCCAGTCTGCTTAACACCTACGCACGTATTATACACACACACACACAACGCTCTCCTAACCCACACACACACCAGTCTGCTTTAAACCTACACACATATTATACAAACGAGTTGGTGACGTAAGGTTGATAAGGGATATTGGTAAGGGTTAAGGGTAGGGTTGGTGTTAGGGGCAAGGGTTAGGGTTAGTGTAAGGGGTAAGGGTTAGTGTAaGGAGTAAGGGTTAGTGTAAGGGGTTAGTGTTATTGTAAGGGGCTAGTGTTAGTGTTAGTGTTCAGGGTTAGTGTTAGGGGTAGGGTTAATgTTTAGGGTAATGTTTAGGGTTAGGGGTATGGGTTAGTGCTAGGGGTCAGGGTTAGTGTTAGGGTTAGACAACCCACCTGAGAGAACCAGTGCGATGCCGCCGCAGGCGTTGGGCGAGGACATGGAGGTGCCGTTCATCAGCTGGGTCCCCCGGAGGGTCCAGTTGGGGACGGAGGCGATGGCTCCCCCCGGAGCGCTGATGCTGACCCCCAGGGCGCCGTCGATGCTGGGTCCCCGAGACGACCAGGTGTACTGGTTGGCCGGGAGCTTCTCCCTCAGGGAGTACTCCGCCACCATCATGTCGGGGGTCACGTAGGCCCCAACCCCTGGGGACAGACGGAGCGCGTTACACACCTCAACCCCTTACCCTCGGAGCCTACATAACCCAACCCTCTGGAGACGGCAATGCTTGCATAGTCAGAAATAGaGCTGACCGATTCATCAAATTCAAACGTCATCGCTATATAATAGCGGGgTTTGATTTGCCATTTGCAAATTGCAAAGGCTGCAATgtttttttttttt

lots of data

lots of software

recipe for disaster?

Page 21: Coding & Best Practice in Programming in the NGS era

Correctness of results

http://www.it.bton.ac.uk/staff/je/java/jewl/tutorial/tutorial.html

Page 22: Coding & Best Practice in Programming in the NGS era

Reproducibility

doi:10.1038/sj.embor.7401143

A reproducibility crisis?

Page 23: Coding & Best Practice in Programming in the NGS era

Reproducibility and reusability

http://upload.wikimedia.org/wikipedia/commons/4/48/Recycle.jpg

Page 24: Coding & Best Practice in Programming in the NGS era

What it boils down to

TRUST

Page 25: Coding & Best Practice in Programming in the NGS era

My (given) title

Coding & Best Practice in Programming
Why it matters so much in the NGS era

Why it matters so much in science

Next-generation sequencing specific?

Page 26: Coding & Best Practice in Programming in the NGS era

Diagnostic sequencing

Wikimedia commons, user Bill Branson

Page 27: Coding & Best Practice in Programming in the NGS era

Diagnostic sequencing

Page 28: Coding & Best Practice in Programming in the NGS era

Diagnostic sequencing

Page 29: Coding & Best Practice in Programming in the NGS era

Solutions

Page 30: Coding & Best Practice in Programming in the NGS era

Solutions

Flickr: http://farm4.staticflickr.com/3319/3265787219_bfbc654b5e_o.jpg Wikimedia commons

Page 31: Coding & Best Practice in Programming in the NGS era

Best practices

10.1371/journal.pbio.1001745

Page 32: Coding & Best Practice in Programming in the NGS era

Best practices

Automate repetitive tasks

Wikimedia commons, user Pzucchel
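
As one illustration of automating a repetitive task, the sketch below counts the reads in every FASTQ file in a directory instead of doing so by hand for each file. The function names and the `*.fastq` naming pattern are assumptions made for this example, and the code assumes uncompressed FASTQ files with exactly four lines per record:

```python
from pathlib import Path


def count_fastq_reads(path):
    """Count reads in an uncompressed FASTQ file (assumes 4 lines per record)."""
    with open(path) as handle:
        line_count = sum(1 for _ in handle)
    return line_count // 4


def count_reads_in_directory(directory):
    """Return {file name: read count} for every *.fastq file in `directory`."""
    return {
        fastq.name: count_fastq_reads(fastq)
        for fastq in sorted(Path(directory).glob("*.fastq"))
    }


if __name__ == "__main__":
    # Report one tab-separated line per FASTQ file in the current directory.
    for name, reads in count_reads_in_directory(".").items():
        print(f"{name}\t{reads}")
```

Once the task lives in a script, rerunning it on a new sequencing run costs nothing and cannot drift from what was done last time.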

Page 33: Coding & Best Practice in Programming in the NGS era

Best practices

Coding styles, variable naming etc

def test_seq(seq): ...  # vague name: what is being tested?

def sequence_is_DNA(seq): ...  # descriptive name: states what is checked
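
The contrast between the two names above can be made concrete. A minimal sketch, assuming the function is meant to check that a string contains only the four unambiguous DNA letters; the body and the `DNA_ALPHABET` constant are illustrative and do not come from the slides:

```python
# Hypothetical body for the descriptively named function on the slide.
DNA_ALPHABET = frozenset("ACGT")  # assumption: no IUPAC ambiguity codes such as N


def sequence_is_DNA(sequence):
    """Return True if `sequence` is non-empty and contains only A, C, G or T."""
    return len(sequence) > 0 and set(sequence.upper()) <= DNA_ALPHABET


if __name__ == "__main__":
    print(sequence_is_DNA("TCTCCTAACAACC"))  # True
    print(sequence_is_DNA("AUGGCC"))         # False: U is an RNA base
```

A call such as `if sequence_is_DNA(read):` reads almost like a sentence, which `if test_seq(read):` does not.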

Page 34: Coding & Best Practice in Programming in the NGS era

Best practices

Use version control

https://www.atlassian.com/git/workflows

Page 35: Coding & Best Practice in Programming in the NGS era

Best practices

From my own work:

$ cd scripts
$ ls
blat_parse4.pl  old_versions  snps_flanks_2_fastq.pl

$ ls old_versions/
blat_parse2.pl  blat_parse_attemp1.pl  blat_parse.pl.bak  blat_parse.pl  blat_parse3_backup.pl  blat_parse3.pl

Page 36: Coding & Best Practice in Programming in the NGS era

Best practices

test, test, test

def test_zero():
    assert run_the_function(0) == 0

assert x > 0, "cannot handle negative numbers"
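
The two snippets on this slide show a unit test and a defensive assertion. A minimal runnable sketch combining both ideas; `gc_content` is a hypothetical stand-in for `run_the_function`, chosen only for illustration:

```python
def gc_content(sequence):
    """Return the fraction of G and C bases in a DNA sequence."""
    # Defensive assertion: fail early with a clear message on bad input.
    assert len(sequence) > 0, "cannot compute GC content of an empty sequence"
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


def test_no_gc():
    assert gc_content("ATAT") == 0


def test_all_gc():
    assert gc_content("GGCC") == 1


def test_empty_sequence_is_rejected():
    # The defensive assertion should fire on empty input.
    try:
        gc_content("")
    except AssertionError:
        return
    raise RuntimeError("expected an AssertionError for empty input")


if __name__ == "__main__":
    test_no_gc()
    test_all_gc()
    test_empty_sequence_is_rejected()
    print("all tests passed")
```

A test runner such as pytest collects functions named `test_*` automatically, so the `__main__` block is only there to keep the sketch self-contained.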

Page 37: Coding & Best Practice in Programming in the NGS era

Best practices

Document well

Page 38: Coding & Best Practice in Programming in the NGS era

Best practices

Collaborate

http://howdoitradestocks.com/wp-content/uploads/2011/12/share-ideas1.jpg

Page 39: Coding & Best Practice in Programming in the NGS era

khmer, a 'case study'

Page 40: Coding & Best Practice in Programming in the NGS era

khmer

Crusoe et al. doi: 10.6084/m9.figshare.979190

Michael Crusoe

Titus Brown

Page 41: Coding & Best Practice in Programming in the NGS era

khmer

https://github.com/ged-lab/2013-paper-ssspe

Page 42: Coding & Best Practice in Programming in the NGS era

khmer

Integrated code coverage analysis

The “GitHub Flow” model of code review

Semantic versioning

Continuous integration

Integration and acceptance testing
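
Semantic versioning encodes compatibility in a MAJOR.MINOR.PATCH number: the major number changes on incompatible changes, the minor number on backwards-compatible additions, and the patch number on bug fixes. A minimal sketch of how a pipeline could check a tool version against a requirement; the function names are hypothetical, the version strings are made up, and pre-release or build suffixes are not handled:

```python
def parse_version(version):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_compatible(installed, required):
    """True if `installed` has the same major version as `required`
    and is at least as new (tuples compare element by element)."""
    installed_v = parse_version(installed)
    required_v = parse_version(required)
    return installed_v[0] == required_v[0] and installed_v >= required_v


if __name__ == "__main__":
    print(is_compatible("1.4.2", "1.2.0"))  # True: newer minor, same major
    print(is_compatible("2.0.0", "1.2.0"))  # False: major version changed
    print(is_compatible("1.1.9", "1.2.0"))  # False: older than required
```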

Page 43: Coding & Best Practice in Programming in the NGS era

Beyond best coding practices

Page 44: Coding & Best Practice in Programming in the NGS era

Benchmarks

http://assemblathon.org/

Page 45: Coding & Best Practice in Programming in the NGS era

Benchmarks

http://www.genome.org/cgi/doi/10.1101/gr.131383.111

Page 46: Coding & Best Practice in Programming in the NGS era

Benchmarks

http://www.genomeinabottle.org/

~8300 10 µg vials of DNA for NA12878
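
A reference sample with agreed-upon answers makes it possible to score a tool's output. A minimal sketch of such a comparison, assuming variant calls are reduced to (chromosome, position, reference allele, alternative allele) tuples; the function name and the example variants are invented for illustration and are not real NA12878 data:

```python
def precision_and_recall(called, truth):
    """Compare a set of called variants against a truth set.

    precision = fraction of calls that are in the truth set
    recall    = fraction of truth variants that were called
    """
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    # Invented example variants: (chromosome, position, ref, alt)
    truth = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
             ("chr2", 40, "G", "A"), ("chr2", 900, "T", "C")}
    called = {("chr1", 100, "A", "G"), ("chr2", 40, "G", "A"),
              ("chr2", 77, "C", "G")}
    precision, recall = precision_and_recall(called, truth)
    print(f"precision={precision:.2f} recall={recall:.2f}")
    # precision=0.67 recall=0.50
```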

Page 47: Coding & Best Practice in Programming in the NGS era

(Assembly) validation

Page 48: Coding & Best Practice in Programming in the NGS era

(Assembly) validation

Assembly

doi:10.1186/1471-2105-15-126

Page 49: Coding & Best Practice in Programming in the NGS era

Reproducibility ‘platforms’

usegalaxy.org

taverna.org.uk/

pythonhosted.org/Sumatra/
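
Tools of this kind capture what was run, on which inputs, and in what environment. The sketch below records that information by hand using only the Python standard library; it is not the API of Galaxy, Taverna, or Sumatra, and the function names, record fields, and the `min_read_length` parameter are assumptions:

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone


def file_sha256(path):
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(input_paths, parameters):
    """Describe one analysis run: inputs, parameters and environment."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "command": sys.argv,
        "parameters": parameters,
        "inputs": {str(path): file_sha256(path) for path in input_paths},
    }


if __name__ == "__main__":
    # Hypothetical usage: checksum this script itself as a stand-in input.
    record = provenance_record([__file__], {"min_read_length": 50})
    print(json.dumps(record, indent=2))
```

Storing such a record next to every result file makes it possible to tell later which input and which settings produced it.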

Page 50: Coding & Best Practice in Programming in the NGS era

Action points

Page 51: Coding & Best Practice in Programming in the NGS era

Action points

Attend a Software Carpentry Boot Camp

http://software-carpentry.org/

Page 52: Coding & Best Practice in Programming in the NGS era

Action points

Look for signs of best practice

Page 53: Coding & Best Practice in Programming in the NGS era

Action points

Look for signs of best practice

during peer review

nature.com

Page 54: Coding & Best Practice in Programming in the NGS era

Action points

Benchmarking/validation

Page 55: Coding & Best Practice in Programming in the NGS era

Action points

Develop (under)graduate curriculum

Page 56: Coding & Best Practice in Programming in the NGS era

My goal today

Flickr: http://farm4.staticflickr.com/3319/3265787219_bfbc654b5e_o.jpg

