Funded by the National Science Foundation, Award #ACI-1445604
Resource Management from HPC to the Cloud:
Do you manage resources or do they manage you?
David Hancock – [email protected] Manager – Indiana University
http://jetstream-cloud.org/
Overview
• View from the border
• IU background
• Jetstream highlights
• Optimizing for your problem
• Reservations & Queueing
• Challenges & Opportunities
• Futures
View from the Border
• University vs National Center
• Traditional HPC vs Cloud
• Small vs Large Scale
• Lab vs Central Computing
• Working Sessions vs Lunch
IU – Campuses and Medical School Centers
[Map legend: IU campuses; IU School of Medicine campuses and clinics]
IU goals
– To be one of the great public universities of the 21st Century (Michael A. McRobbie, 18th President of IU)
– To be a leader, “in absolute terms for uses and applications of IT” (Myles Brand, 16th President of IU)
What is Jetstream?
• A resource to expand the community of users who benefit from NSF investment in shared cyberinfrastructure
• Production cloud system supporting all domains of science and engineering research sponsored by the NSF
• Provide on-demand interactive computing and analysis
• Enable configurable environments and architectures
• Support computational reproducibility and sharing
• Democratize access to cloud-native technology and software
• Focus on ease of use, but also on maintaining flexibility
Expanding NSF XD’s reach and impact
Around 350,000 researchers, educators, & learners received NSF support in 2015
– Less than 2% completed a computation, data analysis, or visualization task on XD program resources
– Less than 4% had an XSEDE Portal account
– 70% of researchers surveyed* claimed to be resource constrained
Why aren’t they using XD systems?
– Activation energy is pretty high
– HPC resources are scarce and not well-matched to their needs
– They just don’t need that much capability
* https://www.xsede.org/xsede-nsf-release-cloud-survey-report
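To put the percentages above in absolute terms, here is a minimal sketch of the arithmetic. It assumes the slide's round figure of 350,000 supported people and treats "less than 2%" and "less than 4%" as upper bounds; the variable names are illustrative only.

```python
# Rough upper bounds implied by the slide's percentages.
# 350,000 is the slide's approximate count of people supported in 2015.
supported = 350_000

# "Less than 2%" ran a task on XD resources; "less than 4%" had a portal account.
max_xd_users = supported * 2 // 100
max_portal_accounts = supported * 4 // 100

print(f"Fewer than {max_xd_users:,} people used XD program resources")
print(f"Fewer than {max_portal_accounts:,} people had an XSEDE Portal account")
```

In other words, on these figures well over 300,000 NSF-supported people never touched an XD resource.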
Expanding NSF XD’s reach and impact
[Figure labels: Capability class machines; Traditional HPC, HTC systems]
Systems Overview
Platform Overview
[Diagram components: Web App; Atmosphere API; Globus Auth; Atmo Services; XSEDE Accounting; OpenStack and Ceph at Indiana University; OpenStack and Ceph at TACC; OpenStack and Ceph at potentially other sites]
What do you optimize for?
• HPC
– Utilization
– Capability or Capacity Science
– Checkpoint/Restart I/O
– Memory/Network Bandwidth & Latency
• Cloud
– Availability
– Multi-level API Interactions
– On-demand/Interactive Use
– Using Commodity Components
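The two headline targets above measure different things. As a rough sketch, with hypothetical function names and made-up numbers for a 100-node system over a 30-day month, utilization asks how much of the machine ran jobs, while availability asks how much of the time the service could be reached:

```python
def utilization(busy_node_hours: float, total_node_hours: float) -> float:
    """Fraction of available node-hours that ran user jobs (HPC-style target)."""
    return busy_node_hours / total_node_hours


def availability(uptime_hours: float, period_hours: float) -> float:
    """Fraction of the period the service was reachable (cloud-style target)."""
    return uptime_hours / period_hours


# Hypothetical figures, not Jetstream measurements.
period = 30 * 24  # hours in a 30-day month
hpc_util = utilization(busy_node_hours=64_800, total_node_hours=100 * period)
cloud_avail = availability(uptime_hours=period - 1, period_hours=period)

print(f"utilization:  {hpc_util:.1%}")
print(f"availability: {cloud_avail:.2%}")
```

A system can score well on one and poorly on the other: a cloud that keeps headroom free for on-demand launches will look underutilized by HPC standards.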
Reservations & Queueing
• HPC
– Staples of the HPC world with powerful tools (e.g. Moab/Slurm)
– Decades of expertise and tuning
– Condo computing “anti-batch”
• Cloud
– No reservations, no queueing, refocus
• Some opposition to these concepts
– Reserved instances “anti-cloud”
– However… factions in the OS community are still pushing for “do what AWS does”
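The contrast can be illustrated with a toy model. This is only a sketch under simplifying assumptions: it is not how Moab, Slurm, or OpenStack actually schedule, and the class and method names (`BatchQueue`, `OnDemandCloud`) are hypothetical. The batch system holds a job until nodes free up; the on-demand pool either starts the request immediately or refuses it.

```python
from collections import deque


class BatchQueue:
    """Toy FIFO batch system: jobs wait until enough nodes are free."""

    def __init__(self, nodes):
        self.free = nodes
        self.waiting = deque()  # (job name, node count) pairs
        self.running = {}       # job name -> node count

    def submit(self, name, nodes):
        self.waiting.append((name, nodes))
        self._start_ready_jobs()

    def finish(self, name):
        self.free += self.running.pop(name)
        self._start_ready_jobs()

    def _start_ready_jobs(self):
        # Strict FIFO: the job at the head blocks everything behind it.
        while self.waiting and self.waiting[0][1] <= self.free:
            name, nodes = self.waiting.popleft()
            self.free -= nodes
            self.running[name] = nodes


class OnDemandCloud:
    """Toy on-demand pool: a request starts now or is refused; nothing waits."""

    def __init__(self, nodes):
        self.free = nodes
        self.running = {}

    def launch(self, name, nodes):
        if nodes > self.free:
            return False  # no queue to fall back on
        self.free -= nodes
        self.running[name] = nodes
        return True

    def delete(self, name):
        self.free += self.running.pop(name)


batch = BatchQueue(nodes=4)
batch.submit("a", 3)
batch.submit("b", 2)  # waits: only 1 node is free
batch.finish("a")     # "b" starts as soon as "a" releases its nodes

cloud = OnDemandCloud(nodes=4)
first = cloud.launch("a", 3)
second = cloud.launch("b", 2)  # refused: only 1 node is free
```

In the batch case the second job eventually runs without the user doing anything; in the on-demand case the user has to retry, which is one reason reservations keep coming up in cloud discussions.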
Opportunities & Challenges
• Opportunities
– Serving an unmet need with immense & intense interest
– Affordable HA
– Satisfying users’ visions (SUNY & Galaxy)
• Challenges
– Need “cloud-washing” for users/staff
  • What, no parallel file system?
– Logs are verbose and cryptic
– Rapid development cycle
  • Quickly deprecate functionality
  • Undocumented change
– Public IPv4 addresses
What would we change?
• Names of our auth domains
• Clearly separate Atmosphere & Native OS domains
• More storage capacity, a catch-22
• Private IP support from day 1
• Easy-button access from day 1
• Consider host aggregates with restrictions for reservations
• Ubuntu w/lightweight packaging, no RDO
• Mad cluster as default…
Happy Cluster – Mad Cluster
What would we do the same?
• Ceph (block/object)
• Use latest OS release (Liberty, testing Mitaka)
• Deliver test cluster early
• Use VXLAN, Intel XL710 adapters (no TSO)
• Dell equipment & F10 switches working well
• Distributed partnership
• Limit site dependencies
• Use SaltStack
What comes next?
• Both sites have all required software components installed, configured, and operational
• Transitioning to full operations post-acceptance review
• Early June 2016: 57 XSEDE projects and 250+ users
• Soliciting Research allocation requests NOW plus Startup and Education allocations
• Adding services as deemed useful/mature (heat, ceilometer, magnum, trove, etc.)
• Atmosphere enhancements
OpenStack Magnum and Container Orchestration Engines
Complete management for containers within OpenStack
Support several container orchestration engines
– Docker Swarm
– Google Kubernetes
– Apache Mesos
Allows direct access to native container APIs
– Docker CLI clients can access hosts and containers
– The Kubernetes client can also directly manage pods, services, etc.
Things I left behind…
• Details on XSEDE/NSF XD Program
• Software block diagram
• Detailed specs
• Detailed topology
• VM sizing
• Security
• State of OpenStack
• Operational tools
• Live Demo
How can I use Jetstream or learn more?
• An XSEDE User Portal (XUP) account is required. Get one at https://portal.xsede.org
– Read the Allocations Overview - https://portal.xsede.org/allocations-overview
– Submit a Startup or Education request - https://portal.xsede.org/successful-requests
• Wiki: http://wiki.jetstream-cloud.org
• User guides: https://portal.xsede.org/user-guides
• Training Videos & Virtual Workshops (TBD)
Partners
Construction
Application / Community Leads
Management & Operations
Credit & Thanks
• Mike Lowe, Bret Hammond, George Turner, Jeremy Fischer, Craig Stewart (IU)
• Matt Vaughn, Mike Packard (TACC)
• Paul Rad (UTSA/Rackspace)
• Univ of Arizona CyVerse Team led by Nirav Merchant
• James Taylor (JHU)
• OS Summit Presentation: Deploying OpenStack for The National Science Foundation's Newest Supercomputers, Lowe, J.M.; Budden, Robert: http://hdl.handle.net/2022/20824
• SC16 Panel - Thursday November 17th @3:30 PM – HPC/Research Computing leveraging the architectures, flexibilities and tools emerging from the members of the OpenStack Scientific Community
Questions?
[email protected] website: http://jetstream-cloud.org/
License Terms
• Jetstream is supported by NSF award 1445604 (Craig Stewart, IU, PI)
• XSEDE is supported by NSF award 1053575 (John Towns, UIUC, PI)
• This research was supported in part by the Indiana University Pervasive Technology Institute, which was established with the assistance of a major award from the Lilly Endowment, Inc. Opinions presented here are those of the author(s) and do not necessarily represent the views of the NSF, IUPTI, IU, or the Lilly Endowment, Inc.
• Items indicated with a © are under copyright and used here with permission. Such items may not be reused without permission from the holder of copyright except where license terms noted on a slide permit reuse.
• Except where otherwise noted, contents of this presentation are copyright 2016 by the Trustees of Indiana University.
• This document is released under the Creative Commons Attribution 3.0 Unported license (http://creativecommons.org/licenses/by/3.0/). This license includes the following terms: You are free to share – to copy, distribute and transmit the work and to remix – to adapt the work under the following conditions: attribution – you must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work). For any reuse or distribution, you must make clear to others the license terms of this work.