Posted on 07-Aug-2015
Utilising Cloud Computing for research through Infrastructure, Software and Desktop as a Service
Dr David Wallom
Associate Director
Overview
• Infrastructure as a Service
  – The EGI Federated Cloud
• Software as a Service
  – Hub
• Desktop as a Service
  – EOSCloud
www.egi.eu  EGI-InSPIRE RI-261323
The Federated Cloud
A federation of cloud resources from the public, academic and private sectors, offering cloud services to all research communities.
A 'single' cloud system to:
• scale
• integrate multiple providers irrespective of technology
• target the research community
Standards-based federation of IaaS clouds:
• Exposing a set of independent cloud services through a common standards profile
• Allowing deployment of services across multiple providers, and capacity bursting
• Building on world-class EGI core services that are already proven
Usage Model
• Total control over deployed applications
• Elastic resource consumption based on real needs
• Workloads processed on demand
• Endorsed and accredited applications from multiple communities, available and shared
• Single sign-on at multiple, independent providers
• Centralised access to service information across multiple providers
[Diagram: VM Operator, Resource Provider]
EGI Federated Cloud
[Diagram: EGI Core Platform, Cloud Infrastructure Platform, Collaboration Platform]
Services to the User Community:
• Monitoring and control of utilisation
• Technical consultancy and support
• Uniform interfaces to cloud compute and storage
• Secure, endorsed application and service deployment
Roles shown: Consumer, VM Operator, Community Management, All
EGI Cloud Infrastructure
[Diagram: layered architecture of the EGI Cloud Infrastructure]
• EGI Core Platform: Federated AAI, Service Registry, Monitoring, Accounting
• EGI Cloud Infrastructure Platform: Instance Management, Information Discovery, Storage Management
• Help and Support; Security Co-ordination; Training and Outreach
• EGI Collaboration Tools; EGI Application DB; Image Repository; EGI Cloud Service Marketplace
• Sustainable Business Models
• User Community: monitoring and control of utilisation; technical consultancy and support; uniform interfaces to cloud compute and storage; secure, endorsed application and service deployment
• Cloud Management Stacks (OpenStack, OpenNebula, Synnefo, …)
• Standards: GSI, GLUE2, cloud-init, CDMI, SAM, UR, OVF, OCCI
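The federation exposes independent providers through a common standards profile, with OCCI as the compute interface. As a rough illustration only, the Python sketch below builds the headers of an OCCI 1.1 "create compute" request in the text/occi rendering, using the standard OCCI infrastructure scheme. The endpoint URL, the helper name `occi_compute_headers` and the attribute values are hypothetical; authentication (the slide lists GSI) and the site-specific OS/resource template mixins are left out, and the request is only constructed, never sent.

```python
import urllib.request

# OCCI 1.1 infrastructure scheme (standard, not site-specific).
OCCI_INFRA = "http://schemas.ogf.org/occi/infrastructure#"


def occi_compute_headers(cores: int, memory_gb: float, hostname: str) -> dict[str, str]:
    """Build text/occi headers asking an OCCI endpoint to create a compute resource.

    Hypothetical helper: OS and resource templates (site-specific mixins)
    and authentication headers are deliberately omitted.
    """
    attributes = [
        f"occi.compute.cores={cores}",
        f"occi.compute.memory={float(memory_gb)}",  # memory is given in gigabytes
        f'occi.compute.hostname="{hostname}"',      # string values are quoted
    ]
    return {
        "Content-Type": "text/occi",
        "Category": f'compute; scheme="{OCCI_INFRA}"; class="kind"',
        # Several attributes share one header as a comma-separated list.
        "X-OCCI-Attribute": ", ".join(attributes),
    }


if __name__ == "__main__":
    endpoint = "https://occi.example.org:11443"  # hypothetical provider endpoint
    headers = occi_compute_headers(cores=2, memory_gb=4, hostname="analysis-vm")
    # Build (but do not send) the POST to the provider's compute collection.
    request = urllib.request.Request(endpoint + "/compute/", method="POST", headers=headers)
    for name, value in request.header_items():
        print(f"{name}: {value}")
```

Because every certified provider is meant to speak the same interface, the same request could in principle be sent to an OpenStack or an OpenNebula site; that technology independence is what the common standards profile is for.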
Partnership
• Resources
  – 21 certified resource providers from 13 countries
  – 9 resources in the certification process
  – Worldwide interest and integration: Australia* (NeCTAR), South Africa* (SAGrid), South Korea* (KISTI), United States* (NIST, NSF A.C. Centres)
  – Technology: 12 x OpenStack, 7 x OpenNebula, 1 x Synnefo, 1 x Emotive
* Not shown on map
EGI's Appliance Catalogue
• EGI's 'App Store'
• 30 registered virtual appliances
• 21 supporting sites
• 9 supported Virtual Organizations: atlas, biomed, cms, demo.fedcloud.egi.eu, drihm.eu, fedcloud.egi.eu, highthroughputseq.egi.eu, lhcb, vo.chain-project.eu
DRIHM
• Scientific discipline: Natural Sciences, Earth Sciences, Hydrology
• Status: Test & Integration (drihm.eu VO)
DRIHM in the EGI FedCloud:
• Running various hydrological models in the EGI Federated Cloud
• 1 VM: 1 core, 4/8 GB of RAM
• A few GB of storage
• Windows OS
• Contextualisation for the Windows OS VM image
• Licence issue
DRIHM:
• An EC-funded project aiming to provide an open, fully integrated workflow platform for predicting, managing and mitigating the risks related to extreme weather phenomena
Chipster
Chipster in the EGI FedCloud:
• 'Light' VM (datasets removed)
• Chipster VM configured through contextualisation
• Shared block storage exported as NFS for tools (500 GB)
• Block storage for output (500 GB)
• Scientific discipline: Natural Sciences, Biological Sciences, Bioinformatics
• Status: Production
ELIXIR Pilot Action proposal: Using virtual machines and clouds in bioinformatics training
User-friendly analysis software for high-throughput data:
• NGS
• Microarray
• Proteomics
• Sequence data
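The Chipster VM is configured through contextualisation, with its tools held on shared storage exported over NFS. A minimal sketch, assuming the site supports cloud-init (listed among the federation's standards) and that the image is Debian/Ubuntu-based: the Python below generates `#cloud-config` user data that installs an NFS client and mounts a shared tools export read-only. The export address, mount point and helper name `tools_mount_user_data` are hypothetical, not Chipster's actual configuration; the body is emitted as JSON, which YAML parsers accept as flow-style YAML.

```python
import json


def tools_mount_user_data(nfs_export: str, mount_point: str = "/mnt/tools") -> str:
    """Return cloud-init user data that mounts a shared NFS tools export read-only.

    Hypothetical helper; the package name assumes a Debian/Ubuntu image.
    """
    config = {
        # Install the NFS client on first boot.
        "packages": ["nfs-common"],
        # cloud-init 'mounts' entries follow the fstab fields:
        # [fs_spec, fs_file, fs_vfstype, fs_mntops, fs_freq, fs_passno]
        "mounts": [[nfs_export, mount_point, "nfs", "ro,_netdev", "0", "0"]],
        # The mounts module runs early in boot, possibly before the NFS client
        # is installed, so retry all fstab mounts once boot has finished.
        "runcmd": [["mount", "-a"]],
    }
    # The first line must be exactly '#cloud-config' for cloud-init to treat
    # the rest of the user data as cloud-config.
    return "#cloud-config\n" + json.dumps(config, indent=2) + "\n"


if __name__ == "__main__":
    # Hypothetical export of the shared 500 GB tools volume.
    print(tools_mount_user_data("10.0.0.5:/export/chipster-tools"))
```

Keeping the tools on a shared export is what allows the VM image itself to stay 'light': the large, rarely changing content lives once on the provider's storage rather than in every instance.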
Use Case Discipline Classification
Usage since launch: >600k VMs, >40M CPU hours
Use cases:
– 12 at launch
– 60 to date
– 11 in production
[Diagram: research lifecycle: Initial experimental idea, Experimental design, Data collection, Analysis, Publication]
• Use open-source components to build a comprehensive LIMS (laboratory information management system)
• Support all parts of the research lifecycle
• Integrate third-party services to increase value and capability
Architecture
[Diagram: Digital microscope, Electrophysiology rig, Digital pens, PC, Workflows, Drupal, Continuous integration, Alfresco, Google Services, Search, Metadata, AAI]
Thoughts on Hub
• People are increasingly transient
  – Stop losing the 'unknown knowns'
• Living data is often the forgotten component in data management
• All data will be born digital
• Data management requirements mean that responsibility for storing raw data is increasingly important
• Laboratory equipment can record directly into Hub, ensuring data management from birth to death
• Connecting all parts of the experiment ensures institutional knowledge capture:
  – neat and rough notes, raw data, analysis applications and output
Bio-Linux: A scalable solution
• Comprehensive, free bioinformatics workstation based on Ubuntu Linux
• 10 years & 8 major releases
• 200+ bioinformatics packages, including big integrative tools: QIIME, Galaxy Server, PredictProtein, EMBOSS, ...
• Incorporates all software
• >7000 users in >1600 locations
Deployment options: dual boot, Linux live, local servers, cloud
Why Cloud?
• Tools such as Bio-Linux are community enablers
• Datasets can be too big, or too restricted, to move easily
  – Move the compute to the data
  – Researcher work patterns are maintained
• Need for more efficient use of shared resources
• Central maintenance of infrastructure
• Lower barrier to entry (compared to traditional HPC and Grid)
EOSCloud
• A NERC Big Data capital project
• A tenancy in the STFC JASMIN Unmanaged Cloud
• Each registered user receives two VMs:
  – Bio-Linux
  – Ubuntu Docker hosting environment
  – With total responsibility for the instantiated system
  – Accessible through standard remote desktop tools
• But:
  – Utilising a single scale of resources would be wasteful
  – Can we scale users' virtual services to take demand into account?
Boosting Resource Capabilities
• Users' VMs operate in a native 'Standard' state
  – Enough capability to access stored data
  – Configure applications and workflows
  – Free
• Users may boost a running VM to increased capability
  – Enough to run analysis applications on a useful timescale
  – Credits are consumed only by Boosted instances
• Reference datasets are available to users through shared storage
Name       Cores  Memory (GB)  Cost (credits/hour)
Standard   1      16           0
Standard+  2      40           1
Big        8      140          4
Max        16     500          8
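The boosting model charges credits only while a VM runs above the free 'Standard' tier. The Python sketch below turns the tier table into a small credit calculator, plus a hypothetical helper that picks the cheapest tier meeting a core and memory request. The class, field and function names are illustrative assumptions; only the core counts, memory sizes and hourly credit costs come from the table, and this is not a description of how EOSCloud actually metered usage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    cores: int
    memory_gb: int
    credits_per_hour: int


# Values copied from the tier table above; the field names are illustrative.
TIERS = {
    t.name: t
    for t in (
        Tier("Standard", 1, 16, 0),
        Tier("Standard+", 2, 40, 1),
        Tier("Big", 8, 140, 4),
        Tier("Max", 16, 500, 8),
    )
}


def credits_used(usage: list[tuple[str, float]]) -> float:
    """Sum the credits for a list of (tier name, hours) periods.

    Time spent in the free 'Standard' state costs nothing; only
    boosted periods consume credits.
    """
    return sum(TIERS[name].credits_per_hour * hours for name, hours in usage)


def smallest_tier(cores: int, memory_gb: int) -> Tier:
    """Hypothetical helper: the cheapest tier that satisfies a resource request."""
    fits = [t for t in TIERS.values() if t.cores >= cores and t.memory_gb >= memory_gb]
    if not fits:
        raise ValueError("request exceeds the largest tier")
    return min(fits, key=lambda t: t.credits_per_hour)


if __name__ == "__main__":
    # A day with 20 h in Standard, 3 h boosted to Big and 1 h at Max.
    day = [("Standard", 20), ("Big", 3), ("Max", 1)]
    print(credits_used(day))          # 0*20 + 4*3 + 8*1 = 20
    print(smallest_tier(4, 64).name)  # Big
```

In the example day, the 20 idle hours cost nothing and the 4 boosted hours cost 4 x 3 + 8 x 1 = 20 credits, which is the point of keeping the always-on Standard state free.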
Desktop as a Service for research
• Giving researchers an environment they are confident in, while changing the infrastructure around them
• Location-independent persistence of research environments
• Launched for pilot user communities on 31st Mar 2015
  – Moving beyond pilot user communities (e.g. Ocean Sampling Day)
• Investigating other key usage models, such as teaching and online learning
Conclusions
• Cloud is (obviously) an enabler for research
  – Allowing flexibility in infrastructure hitherto not possible
  – User control rather than provider control
• It's not just about infrastructure, and not just about single cloud providers
• Cloud is a way of allowing higher-level services to be built more easily and made accessible
• Open standards:
  – allow a marketplace of services to develop
  – allow diverse resource providers to participate
  – move the value-add from availability of service to quality of service