
Joe parker lightweight_bioinformatics

Date post: 06-Aug-2015
Upload: joe-parker
Omics in extreme Environments (Lightweight bioinformatics) Joe Parker Royal Botanic Gardens, Kew
Transcript

Omics in extreme Environments (Lightweight bioinformatics)

Joe Parker
Royal Botanic Gardens, Kew

Compute time is (much) cheaper than you think… and much cheaper than your time.

Physical portability requires software portability.

Kew

One of the largest living and tissue collections in the world: ca. 6000 genera (~1/3 of plant genera)

2020 Strategic Output: Plant and Fungal Trees of Life

Why in the field
•  Spatial analysis
•  ID & naming
•  Image recognition

Why in the field

The ‘micro’ computer: Raspi
•  Low: cost, energy (& power)
•  Highly portable
•  Hackable form-factor

Laptops
•  Portable
•  Very costly form-factor
•  Maté? Beer?

Clusters
•  Not portable, setup costs

The cloud
•  Power closely linked to budget (as limited as)
•  Almost infinitely scalable
•  Have to have a connection to get data up there (and down!)
•  Fiddly setup

Comparison

System              Arch  CPU type @ clock (GHz)  Cores  RAM (GB)  HDD (GB) @ type
Pandanus            i686  Xeon E5620 @ 2.4        4      33        1000 @ SATA
Raspberry Pi 2 B+   ARM   ARMv7 @ 1.0             1      1         8 @ flash card
Macbook Pro (2011)  x64   Core i7 @ 2.2           4      8         250 @ SSD
EC2 m4.10xlarge     x64   Xeon E5 @ 2.4           40     160       320 @ SSD
MidPlus             x64   Westmere @ 2.8          2500+  24 - 512  2x320 @ SSD

Workflow
•  Setup: set up workflow, binaries, and reference / alignment data. Deploy to machines.
•  BLAST 2.2.30 (inputs: CEGMA genes, short reads): protein-protein BLAST of reads (from MG-RAST repository, Bass Strait oil field) against 458 core eukaryote genes from CEGMA. Keep only top hits. Use max. num_threads available.
•  Concatenate hits to CEGMA alignments: append top hit sequences to CEGMA alignments.
•  For each alignment:
   – Muscle 3.8.31: align in MUSCLE using default parameters.
   – RAxML 7.2.8+: infer de novo phylogeny in RAxML under Dayhoff, random starting tree and max. PTHREADS.
•  Output and parse times.
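The workflow steps above can be sketched as a small Python driver. This is only an illustrative sketch, not the scripts used for the talk: the function names and file names are hypothetical, and the command-line flags are the commonly documented ones for BLAST+ (`blastp`), MUSCLE 3.8 and RAxML 7.x, so they should be checked against the installed versions. The RAxML binary name `raxmlHPC-PTHREADS` and the seed value are likewise assumptions.

```python
import os
import subprocess
import time

# Use every core the machine reports ("max. num_threads available").
THREADS = os.cpu_count() or 1


def blastp_cmd(reads_faa, cegma_db, out_tsv, threads=THREADS):
    """Protein-protein BLAST of reads against a CEGMA database, tabular output."""
    return ["blastp", "-query", reads_faa, "-db", cegma_db,
            "-outfmt", "6", "-num_threads", str(threads), "-out", out_tsv]


def top_hits(blast_tsv_lines):
    """Keep the best-scoring hit per query from BLAST tabular (-outfmt 6) lines.

    Default column order is assumed: query id first, subject id second,
    bit score twelfth.
    """
    best = {}
    for line in blast_tsv_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def muscle_cmd(in_fasta, out_fasta):
    """MUSCLE 3.8 alignment with default parameters."""
    return ["muscle", "-in", in_fasta, "-out", out_fasta]


def raxml_cmd(alignment, run_name, threads=THREADS, seed=12345):
    """RAxML under a Dayhoff protein model, random starting tree (-d).

    Assumption: the alignment is already in a format this RAxML build
    accepts (older builds expect PHYLIP rather than FASTA).
    """
    return ["raxmlHPC-PTHREADS", "-T", str(threads), "-m", "PROTGAMMADAYHOFF",
            "-d", "-p", str(seed), "-s", alignment, "-n", run_name]


def timed_run(cmd):
    """Run one command and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start
```

Each step would then be timed with something like `timed_run(blastp_cmd("reads.faa", "cegma_db", "hits.tsv"))`, with the same script deployed unchanged to every platform. Note that this measures wall-clock time; the user time reported on the slides would need `time` or `resource.getrusage` instead.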

Results

Platform           Hardware capital  Data                                                 Running
Pi (new)           £26               NA                                                   NA
MBP                ~£2000            NA                                                   NA
AWS (M4.10xlarge)  0                 ~£1/Mb (BGAN); £1/day (Virgin mobile-only tariff)    £1.78/hr
AWS (t2.micro)     0                 ~£1/Mb (BGAN); £1/day (Virgin mobile-only tariff)    £0.01/hr
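To make the trade-off in these figures concrete, here is a rough cost sketch for a single cloud run over a satellite (BGAN) link. The default rates are the ones quoted above; the job size (50 Mb transferred, 2 hours of compute) is purely hypothetical, and "Mb" is kept as the unit exactly as written on the slide.

```python
def cloud_run_cost(data_mb, hours, per_mb=1.0, per_hour=1.78):
    """Return (data cost, compute cost, total) in pounds for one cloud run.

    per_mb   -- transfer cost per Mb (default: ~£1/Mb over BGAN)
    per_hour -- instance cost per hour (default: £1.78/hr, m4.10xlarge)
    """
    data_cost = data_mb * per_mb
    compute_cost = hours * per_hour
    return data_cost, compute_cost, data_cost + compute_cost


# Hypothetical job: 50 Mb moved over BGAN, 2 hours on an m4.10xlarge.
data_cost, compute_cost, total = cloud_run_cost(data_mb=50, hours=2)
print(f"data £{data_cost:.2f}, compute £{compute_cost:.2f}, total £{total:.2f}")
```

Under these assumptions the data transfer, not the compute, dominates the bill, which is consistent with the point that getting data up (and down) is the real constraint in the field.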

[Figure: log(time) (user, s) plotted against log(number of queries)]

Overall performance

Raspi in practice
•  Stability
•  ARM not x86 architecture
•  2 GB RAM…

The cloud in practice
•  Fiddly setup, easy to replicate
•  Need a connection to get data up there (and down!)

Conclusions
•  Pi opportunities but not there yet, also you’ll still need a connection unless you’re very lucky…
•  Installation in situ?
•  Consider cloud computing (connections can only improve)
•  Portability of the workflow enhances portability of the system
   – …which you should be embracing anyway for reproducibility…

Thanks

Kew
Matt Blissett, Abigail Barker, Rob Turner

Others
Daniel Barker (4273π)
Tim Booth (BioLinux)
Alexandros Stamatakis (RAxML)

