Date post: | 06-Aug-2015 |
Category: | Science |
Upload: | joe-parker |
Compute time is (much) cheaper than you think… and much cheaper than your time.
Physical portability requires software portability.
Kew
One of the largest living and tissue collections in the world: ~6000 genera (~1/3 of plant genera)
2020 Strategic Output: Plant and Fungal Trees of Life
The cloud
• Power closely linked to budget (and just as limited)
• Almost infinitely scalable
• Have to have a connection to get data up there (and down!)
• Fiddly setup
Comparison
System | Arch | CPU type @ clock (GHz) | Cores | RAM (GB) | HDD (GB) @ type
Pandanus | i686 | Xeon E5620 @ 2.4 | 4 | 33 | 1000 @ SATA
Raspberry Pi 2 B+ | ARM | ARMv7 @ 1.0 | 1 | 1 | 8 @ flash card
Macbook Pro (2011) | x64 | Core i7 @ 2.2 | 4 | 8 | 250 @ SSD
EC2 m4.10xlarge | x64 | Xeon E5 @ 2.4 | 40 | 160 | 320 @ SSD
MidPlus | x64 | Westmere @ 2.8 | 2500+ | 24 - 512 | 2x320 @ SSD
Workflow
Setup → BLAST 2.2.30 (inputs: CEGMA genes, short reads) → Concatenate hits to CEGMA alignments → Muscle 3.8.31 → RAxML 7.2.8+
Set up workflow, binaries, and reference / alignment data. Deploy to machines.
Protein-protein BLAST of reads (from the MG-RAST repository, Bass Strait oil field) against the 458 core eukaryote genes from CEGMA. Keep only top hits. Use the max. num_threads available.
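The top-hit filter in this step could look something like the sketch below. It is an illustration only, not the script used for the talk. It assumes the search is run with `blastp` from BLAST+ in tabular output (`-outfmt 6`, default columns) and that "top hit" means the highest bit score per read. The function names and the file names `reads.faa`, `cegma_db` and `hits.tsv` are hypothetical.

```python
import os


def blastp_command(query, db, out, threads=None):
    """Build a blastp command line (assumed invocation, BLAST+ flags)."""
    # Default to every core the machine reports, as the slide suggests.
    threads = threads or os.cpu_count() or 1
    return ["blastp", "-query", query, "-db", db,
            "-outfmt", "6", "-num_threads", str(threads), "-out", out]


def top_hits(tabular_lines):
    """Keep the highest-bit-score hit per query from -outfmt 6 lines.

    Returns {query_id: (subject_id, bitscore)}.
    """
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Default -outfmt 6 columns: qseqid is column 1, sseqid is
        # column 2, bitscore is column 12.
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best
```

Typical use would be to run the list from `blastp_command("reads.faa", "cegma_db", "hits.tsv")` with `subprocess.run`, then pass the open `hits.tsv` file to `top_hits`.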
Append top hit sequences to CEGMA alignments.
For each alignment:
• Align in MUSCLE using default parameters
• Infer de novo phylogeny in RAxML under Dayhoff, random starting tree and max. PTHREADS.
Output and parse times.
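One way to collect comparable timings on every platform is to wrap each step in a small timer, sketched below. This is an assumption about how the timing could be done, not the harness used for the talk. `time_command` and `format_record` are hypothetical names, and the child user time comes from `os.times()`, which reports zero for child processes on Windows.

```python
import os
import subprocess
import time


def time_command(cmd):
    """Run cmd and return (wall_seconds, child_user_seconds)."""
    before = os.times()
    start = time.perf_counter()
    subprocess.run(cmd, check=True)  # raises if the step fails
    wall = time.perf_counter() - start
    after = os.times()
    # children_user accumulates user CPU time of finished child processes.
    return wall, after.children_user - before.children_user


def format_record(platform, step, wall, user):
    """One tab-separated line per step, easy to parse afterwards."""
    return f"{platform}\t{step}\t{wall:.3f}\t{user:.3f}"
```

For example, `time_command(["muscle", "-in", "gene.fasta", "-out", "gene.aln"])` would time one default-parameter MUSCLE alignment; the two file names are placeholders.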
Results
Platform | Hardware capital | Data | Running
Pi (new) | £26 | NA | NA
MBP | ~£2000 | NA | NA
AWS (m4.10xlarge) | 0 | ~£1/Mb (BGAN); £1/day (Virgin mobile-only tariff) | £1.78/hr
AWS (t2.micro) | 0 | ~£1/Mb (BGAN); £1/day (Virgin mobile-only tariff) | £0.01/hr
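The trade-off in these figures can be made concrete with a little arithmetic. The sketch below is a rough illustration using only the prices listed above. It ignores differences in speed between platforms, so it is not a like-for-like comparison, and the function names are hypothetical.

```python
def cloud_cost(hours, rate_per_hour, data_units=0.0, data_rate=0.0):
    """Running cost plus data transfer cost, in pounds."""
    return hours * rate_per_hour + data_units * data_rate


def break_even_hours(capital, rate_per_hour):
    """Hours of cloud time that cost as much as buying the hardware."""
    return capital / rate_per_hour


# Prices taken from the table above.
mbp_vs_m4 = break_even_hours(2000, 1.78)  # MBP (~£2000) vs m4.10xlarge
pi_vs_m4 = break_even_hours(26, 1.78)     # Pi (£26) vs m4.10xlarge
print(f"MBP break-even: {mbp_vs_m4:.0f} h; Pi break-even: {pi_vs_m4:.1f} h")
```

At these prices, roughly 1,120 hours on the m4.10xlarge cost as much as the MacBook Pro before any data transfer charges, while the Pi pays for itself after about 15 hours.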
Figure: log(time) (user, s) against log(number of queries)
The cloud in practice
• Fiddly setup, easy to replicate
• Need a connection to get data up there (and down!)
Conclusions
• Pi: opportunities, but not there yet; also, you’ll still need a connection unless you’re very lucky.
• Installation in situ?
• Consider cloud computing (connections can only improve)
• Portability of the workflow enhances portability of the system!
– …which you should be embracing anyway for reproducibility…