
Desktop as a Service supporting Environmental ‘omics

Date post: 13-Jan-2017
Upload: david-wallom
Transcript
Page 1: Desktop as a Service supporting Environmental ‘omics

Desktop as a Service supporting Environmental ‘omics

David C H Wallom¹, Timothy Booth², Andy Bowery¹, Ben Collier¹, Dawn Field¹, Philip Kershaw³, Anurag Priyam⁴ and Yannick Wurm⁴

¹ Oxford e-Research Centre, University of Oxford, UK
² Centre for Ecology and Hydrology (CEH), Wallingford, UK
³ Centre for Environmental Data Analysis (CEDA), STFC, UK
⁴ School of Biological and Chemical Sciences, Queen Mary University of London, UK

Page 2: Desktop as a Service supporting Environmental ‘omics

Overview

• Bio-Linux & Docker, tools for Bioinformaticians
• Why cloud?
• The underpinning infrastructure
• EOS Cloud Project
• VM deployment and capability Boost
• Adapting Docker to ease its use
• Conclusions

Page 3: Desktop as a Service supporting Environmental ‘omics

Bio-Linux: A scalable solution

• Comprehensive, free bioinformatics workstation based on Ubuntu Linux and Debian Med
• 11 years & 8 major releases
• Around 8000 users from 1600 locations
• 200+ bioinformatics packages including big integrative tools: QIIME, Galaxy Server, PredictProtein, EMBOSS, ...
• Incorporates all software

[Figure labels: Dual Boot, Linux Live, Local Servers, Cloud]

Page 4: Desktop as a Service supporting Environmental ‘omics

Docker, simplifying the portability of applications and services

Page 5: Desktop as a Service supporting Environmental ‘omics

Bioinformatics software challenges

• This brings an onslaught of new challenges for bioinformatics:
  – projects that used to require teams of 500 are now accessible to small teams
  – but biology curricula (i.e. biologists) still lack computational skills
  – thus biologists are overwhelmed by large amounts of data
  – furthermore, data types are young, so software is young, thus:
    • software may be badly built (by biologists with no formal software development training or experience)
    • software needs to be frequently updated (bug fixes, algorithmic improvements in sensitivity/specificity, support for new data types)

changes everything for biology

Page 6: Desktop as a Service supporting Environmental ‘omics

Why Cloud?

• Data sets can be too big or restricted to easily move
  – move the compute to the data
  – Researcher work patterns are maintained
• Tools such as Bio-Linux are community enablers
• More efficient use of shared resources
• Central maintenance of infrastructure
• Lower barrier to entry (compared to traditional HPC and Grid)

Page 7: Desktop as a Service supporting Environmental ‘omics

JASMIN Cloud Architecture

[Figure: architecture diagram. Only its labels survive in this transcript:]
• Networks: External Network inside JASMIN; JASMIN Internal Network
• Cloud types: Unmanaged Cloud (IaaS, PaaS, SaaS); Managed Cloud (PaaS, SaaS)
• Access paths: Standard Remote Access Protocols (ftp, http, …); JASMIN Cloud Management Interfaces; Direct File System Access; Direct access to batch processing cluster
• Shared resources: Panasas storage; Lotus Batch Compute; JASMIN Analysis Platform VM
• Tenancies: Project1-org, optirad-org, eos-cloud-org, shown with Science Analysis VMs, Appliance Catalogues, and Firewall / Firewall + NAT boundaries
• Other VMs: IPython JupyterHub VM, IPython Slave VM, File Server VM, EOSCloud VM, EOSCloud Fat Node
• Annotations: IPython Notebook VM with access cluster through IPython.parallel; EOSCloud Desktop as a Service with dynamic RAM boost

Page 8: Desktop as a Service supporting Environmental ‘omics

EOS Cloud

• A tenancy in the JASMIN Unmanaged Cloud
• Web interfaces based on the JASMIN custom IaaS software platform
• ‘Users’ or VMAdmin are registered JASMIN users
• Each receives two VMs
  – Bio-Linux
  – Ubuntu Docker hosting environment
• Users with total responsibility for instantiated system
• Accessible through standard remote desktop tools

Page 9: Desktop as a Service supporting Environmental ‘omics
Page 10: Desktop as a Service supporting Environmental ‘omics

Boosting Resource Capabilities

• A resource permanently scaled to support the heaviest workload would be a waste
  – Can we scale the users' virtual services to take demand into account?
• Users' VMs start up and operate in the native ‘Standard’ state
  – Enough capability to access stored data
  – Configure applications and workflows
  – ‘Free’
• A user may boost their running VM to increased capability
  – Enough to run installed Bio-Linux analysis applications on a useful timescale
  – Credit consumption only for Boosted instances
• Reference datasets available to users through shared storage

Name        Cores   Memory (GB)   Cost (credits/hour)
Standard    1       16            0
Standard+   2       40            1
Big         8       140           4
Max         16      500           12
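
As a rough illustration of how the boost levels translate into credit consumption, the sketch below encodes the table in Python. The level names, core counts, memory sizes and hourly costs come from the slide; the linear per-hour billing (including fractional hours) and the helper names `boost_cost` and `cheapest_level` are assumptions made for this example, not a description of the EOS Cloud accounting code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoostLevel:
    name: str
    cores: int
    memory_gb: int
    credits_per_hour: int


# Values copied from the boost table on the slide.
BOOST_LEVELS = {
    level.name: level
    for level in (
        BoostLevel("Standard", 1, 16, 0),
        BoostLevel("Standard+", 2, 40, 1),
        BoostLevel("Big", 8, 140, 4),
        BoostLevel("Max", 16, 500, 12),
    )
}


def boost_cost(level_name: str, hours: float) -> float:
    """Credits consumed at a level for `hours` (assumes linear billing)."""
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return BOOST_LEVELS[level_name].credits_per_hour * hours


def cheapest_level(min_cores: int, min_memory_gb: int) -> BoostLevel:
    """Hypothetical helper: lowest-cost level meeting both requirements."""
    candidates = [
        level
        for level in BOOST_LEVELS.values()
        if level.cores >= min_cores and level.memory_gb >= min_memory_gb
    ]
    if not candidates:
        raise ValueError("no boost level satisfies the request")
    return min(candidates, key=lambda level: level.credits_per_hour)
```

Under these assumptions, a job needing 4 cores and 100 GB of memory would be placed on `Big`, and six hours at that level would consume 24 credits, while time spent at `Standard` consumes none.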

Page 11: Desktop as a Service supporting Environmental ‘omics
Page 12: Desktop as a Service supporting Environmental ‘omics

oSwitch: One-line access to other operating systems.

• Docker applications, though portable, can feel extremely alien in their usability

• With oSwitch, in contrast, things feel (largely) unchanged:

– Current working directory is maintained.

– User name, uid and gid are maintained.

– Login shell (bash/zsh/fish) is maintained.

– Home directory is maintained (thus all .dotfiles and config files are maintained).

– read/write permissions are maintained.

– Paths are maintained whenever possible. Thus volumes (external drives, NAS, USB) mounted on the host are available in the container at the same path.

https://github.com/yeban/oswitch
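
To make the properties listed for oSwitch concrete, the sketch below builds a plain `docker run` command that keeps the caller's uid/gid, home directory and current working directory inside the container. It is not taken from the oSwitch source and does not claim to show how oSwitch works internally; the function name `container_shell_command` and its parameters are hypothetical. It uses only standard `docker run` options (`--user`, `--volume`, `--workdir`, `--env`, `--rm`, `-it`).

```python
import os


def container_shell_command(image, cwd=None, home=None, uid=None, gid=None, shell=None):
    """Build a `docker run` argument list that mirrors the host session."""
    cwd = os.path.abspath(cwd or os.getcwd())
    home = os.path.abspath(home or os.path.expanduser("~"))
    uid = os.getuid() if uid is None else uid
    gid = os.getgid() if gid is None else gid
    shell = shell or os.environ.get("SHELL", "/bin/sh")

    cmd = [
        "docker", "run", "--rm", "-it",
        "--user", f"{uid}:{gid}",      # keep numeric uid and gid
        "--volume", f"{home}:{home}",  # same home directory, same path
        "--env", f"HOME={home}",       # so dotfiles are found
    ]
    # Mount the working directory separately only when it lies outside home.
    if os.path.commonpath([home, cwd]) != home:
        cmd += ["--volume", f"{cwd}:{cwd}"]
    cmd += ["--workdir", cwd, image, shell]
    return cmd
```

Passing the returned list to `subprocess.run` would start the shell. Plain flags like these do not by themselves give the container a matching user name, and they assume the chosen shell exists in the image, which is part of what a dedicated tool has to take care of.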

Page 13: Desktop as a Service supporting Environmental ‘omics

Pilot Users

• CEH Bioinformaticians using the EOS Cloud to study patterns in microbial biodiversity

• Genomic and transcriptomic data from fish toxicogenomics studies at Exeter


Page 14: Desktop as a Service supporting Environmental ‘omics

Pilot Users

• Creating compute containers for each OSD in silico analysis
  – Portable
    • Run same analysis on different laptops/grids/clouds
  – Repeatable/Reproducible
    • Same input gives same output given that reference databases did not change
  – Preservation
    • All analysis tools and dependencies are in one image
    • Images are simple tar.gz
    • Preserving Docker and base images is preserving all analysis
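
One way to picture the preservation point is the sketch below: export an image with `docker save`, compress the resulting tar archive, and record a checksum so a later copy can be verified. The helper names are hypothetical and the workflow is an assumption for illustration, not the pilot users' actual procedure; `docker save --output` is the standard Docker CLI option for writing an image to a tar file.

```python
import gzip
import hashlib
import shutil


def docker_save_command(image, tar_path):
    """Argument list that exports `image` to an uncompressed tar archive."""
    return ["docker", "save", "--output", tar_path, image]


def gzip_file(src_path, dst_path):
    """Compress src_path to dst_path; mtime=0 keeps the header timestamp fixed."""
    with open(src_path, "rb") as src, gzip.GzipFile(dst_path, "wb", mtime=0) as dst:
        shutil.copyfileobj(src, dst)


def sha256_of_file(path, chunk_size=1 << 20):
    """Hex SHA-256 digest of a file, read in chunks to handle large images."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Storing the digest next to the archive lets anyone who later restores the image confirm that the preserved file is unchanged before re-running an analysis.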

Page 15: Desktop as a Service supporting Environmental ‘omics

Desktop as a Service for research

Page 16: Desktop as a Service supporting Environmental ‘omics

THANK YOU AND QUESTIONS?

https://eoscloud.nerc.ac.uk
https://github.com/environmentalomics
https://github.com/wurmlab/oswitch

