Copyright © 2010 Platform Computing Corporation. All Rights Reserved. TORONTO 10/25/2011
Cloud Computing for HPC
Extending Clusters to Clouds
Solution Briefing
Company Background
Platform Computing, Inc.
The leader in cluster, grid and cloud management software:
o 19 years of profitable growth
o 2,000 of the world’s most demanding client organizations
o 5,000,000 CPUs under management
o 500 professionals working across 13 global centers
o Strategic relationships with Cray, Dell, Fujitsu, HP, IBM, Intel,
Microsoft, Red Hat and SAS
Platform Computing: Clusters, Grids, Clouds
Product Leadership
[Diagram labels: Platform Computing; Clusters, Grids, Clouds; Workload Management; Resource Management. Products shown: Platform HPC, Platform MPI, Platform LSF, Platform Symphony, Platform ISF.]
“We believe Platform ISF is perhaps the most complete internal cloud software solution we’ve seen so far,” Staten says.
Cloud Computing for HPC
HPC Market Trends
Key Trends
• Extending HPC to the cloud paradigm
• Management, Analysis, and Reporting
• Graphics Processing Units
• ISV Application Integration
HPC Challenges
Users need:
• More grid resources to run applications faster
• Flexible resources to support multiple applications
• Lower cost per unit of performance
IT needs:
• To contain costs without compromising grid size and performance
• Grid data security
• To better meet users’ needs
Cloud Computing Can Help
Optimally Making Use Of The Cloud
• Send workload to the cloud:
o When workload queues exceed tolerable thresholds for pending jobs
o When more capacity is required to meet SLAs
o To help small groups easily run their first cluster-enabled jobs
• Not all workload is suitable for the cloud. Keep it local:
o When the required data transfer exceeds the acceptable wait time
o For data-intensive computing applications
o When cloud resources become unreliable or unavailable
o When privacy and/or security risks are too high
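The criteria above can be read as a simple decision rule. The following is a minimal sketch of such a rule, not Platform LSF's actual policy engine: the `Job` and `BurstPolicy` classes, their field names, and every threshold value are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical job description; these fields are not LSF attributes."""
    name: str
    input_gb: float          # data that must be copied to the cloud first
    sensitive: bool = False  # True if privacy/security rules forbid the cloud


@dataclass
class BurstPolicy:
    """Illustrative thresholds; a site would choose its own values."""
    max_pending_jobs: int = 100        # tolerable queue depth
    max_transfer_minutes: float = 30.0  # acceptable wait for data transfer
    uplink_gb_per_minute: float = 1.0   # assumed transfer rate to the cloud


def should_burst(job: Job, pending_jobs: int, sla_at_risk: bool,
                 cloud_available: bool, policy: BurstPolicy) -> bool:
    """Return True if the job should be sent to the cloud."""
    # Exclusions: cloud unavailable, or privacy/security risk too high.
    if not cloud_available or job.sensitive:
        return False
    # Exclusion: data transfer would exceed the acceptable wait time.
    transfer_minutes = job.input_gb / policy.uplink_gb_per_minute
    if transfer_minutes > policy.max_transfer_minutes:
        return False
    # Triggers: queue over threshold, or more capacity needed to meet SLAs.
    return pending_jobs > policy.max_pending_jobs or sla_at_risk
```

The exclusions are checked before the triggers, so a data-heavy or sensitive job stays local even when the queue is long.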
HPC Cloud - Differentiating the products
Cloud Connect
• LSF, HPC, Symphony plugin
• Schedules infrastructure in multi-tenant environments
• Provides request fulfillment for sandboxing
• Creates external cloud connection from local clusters
• Accounts for infrastructure consumed
On Demand
• Customer self-service
• Pay-per-use
• Amazon EC2 only (no local)
• Web access only
• HPC vertical specific (Life Sciences offering now available)
• Applications pre-installed
• Platform HPC only
Problem #1: Connecting to the cloud
Problem
• Customer workloads are spiky
• Provisioning for peak is highly wasteful (utilization)
• Relying on desktops or existing servers wastes user time and can be very slow
Alternatives
• Take advantage of cloud resources by building their own solution
• Provision for peak (and live with the cost)
• Wait (wastes valuable engineering time, slow TTM)
Desired solution
• Provide a simple-to-use IaaS connection from an LSF cluster
• Provide a simple policy engine to decide which jobs burst and which wait for local resources
Problem #2: Only use what is needed
Problem
• IaaS providers usually charge by the instance-hour. This is very cost-effective for short bursts but expensive over long durations.
• Workload varies all the time, so the cloud should be used only for peak demand
Alternatives
• Open Source products: OpenNebula, Nimbus
• Competitive products: AdaptiveCloud & Unicloud
• In-house ELIM integrated with IaaS APIs
Desired solution
• LSF/HPC/Symphony Plugin architecture
• Automated flexup/flexdown based on pending jobs and TTL for idle resources
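As an illustration of flexup/flexdown driven by pending jobs and an idle TTL, here is a minimal sketch. It is not the Platform plugin's implementation: the `CloudPool` class, its fields, and the default values (slots per host, TTL, host cap) are all hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class CloudPool:
    """Hypothetical model of the cloud hosts attached to a cluster."""
    slots_per_host: int = 8
    idle_ttl_s: float = 600.0   # release a host after this much idle time
    max_hosts: int = 20         # cap on cloud hosts, to bound spend
    # host name -> time it became idle, or None while it is busy
    idle_since: dict = field(default_factory=dict)

    def flex(self, pending_jobs: int, now: float):
        """Return (hosts_to_start, hosts_to_release) for one cycle."""
        idle = [h for h, t in self.idle_since.items() if t is not None]
        # Flex up: pending jobs not covered by idle hosts need new hosts.
        uncovered = max(0, pending_jobs - len(idle) * self.slots_per_host)
        room = self.max_hosts - len(self.idle_since)
        wanted = math.ceil(uncovered / self.slots_per_host)
        to_start = max(0, min(room, wanted))
        # Flex down: with nothing pending, release hosts idle past the TTL.
        to_release = []
        if pending_jobs == 0:
            to_release = [h for h in idle
                          if now - self.idle_since[h] >= self.idle_ttl_s]
        return to_start, to_release
```

Because billing is by the instance-hour, a real policy might also align releases with billing boundaries; this sketch leaves that out.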
Platform’s Product Line
Products that make up Platform’s cloud solutions
[Diagram labels: Test/Dev Private Cloud, Java, Cloud Extension, Workstation, Cluster Extension, HPC Appl. Integrations, Advanced Analytics, GPU, EGO, Symphony, HPC, ISF, LSF, PCM, MPI]
What is Platform HPC? The easiest and most complete HPC cluster solution
Complete Product
• Feature-rich workload management
• Unified web portal for access anywhere
• Heterogeneous cluster management
Integrated Application Support
• Easy to use job submission portal
• Customizable application templates
Certified
• Certified with server, storage & interconnect vendors
• Best customer support
What is Platform LSF? The HPC Workload Management Standard
Complete
• Advanced, feature-rich workload scheduling
• Robust set of add-on features
• Integrated application support
Powerful
• Policy & resource-aware scheduling
• Resource consolidation for max performance
• Advanced self-management
Most Scalable
• Support for thousands of concurrent users & jobs
• Delivers a virtualized pool of shared resources to support multiple apps
• Flexible control to support multiple policy centers
Best TCO
• Optimal utilization reduces infrastructure costs
• Improves user productivity for faster time to solution
• Robust operational capabilities improve administrative productivity
What is Platform ISF?
Dynamic Resource Management
• Separate applications from infrastructure by creating an independent management platform to achieve resource sharing, vendor independence, and commodity computing
[Diagram labels: Application workloads; Private Cloud management platform; VM management; Provisioning; Server; Storage; Network; IaaS]
Platform’s Cloud Solutions for HPC
Cloud Bursting: Making it easy to extend to the cloud
1. Integrated Cluster with the Cloud (Platform LSF with VPN cluster management)
2. Multi-Cluster to the Cloud (Platform LSF with Platform MultiCluster)
3. Dynamic Cluster Extension to the Cloud (Platform LSF with Platform ISF)
Use Case #1 Integrated Cluster with the Cloud
[Diagram labels: User, Jobs, Workload Manager (Platform LSF), Internal Resources, Cloud provider, VPC connection. Callouts are numbered 1 to 6 in the figure.]
• End user submits more jobs
• The existing cluster nodes are already too busy
• A policy determines that the jobs can go to the cloud
• Platform LSF contacts cloud provider to launch VMs
• Additional resources from Amazon automatically join the existing cluster
Use Case #2 Multi-Cluster to the Cloud
[Diagram labels: User, Jobs, Workload Manager (Platform LSF), MultiCluster orchestrator, Internal Resources, Cloud on-demand instances. Callouts are numbered 1 to 5 in the figure. Jobs that may go to the cloud are shown in red.]
• LSF asks cloud provider to create a new cluster of VMs (possibly CCI)
• MCO automatically forwards jobs to the new cluster based upon policies
• Transparent for end users
Use Case #3 Dynamic Cluster Extension to the Cloud
[Diagram labels: User, Jobs, Workload Manager (Platform LSF), Dynamic Resource Manager (Platform ISF), Cloud provider. Callouts are numbered 1 to 7 in the figure.]
• User: “I need a cluster: 1 master and 3 compute nodes to test my new project”
• Workload manager requests resources from dynamic resource manager
• ISF determines that no internal resources are available and that, by policy, QA/TST can go to the cloud
• Cloud provider gets request from ISF to build new test cluster
• ISF gets master location from cloud provider
• ISF passes master location of new nodes to LSF
• Grid is created and then jobs get submitted
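The placement decision in Use Case #3 can be sketched as follows. This is an illustrative model only, not the Platform ISF API: `ClusterRequest`, `FakeCloudProvider`, `DynamicResourceManager`, and the example host names are all hypothetical stand-ins for the roles in the diagram.

```python
from dataclasses import dataclass


@dataclass
class ClusterRequest:
    project_class: str   # e.g. "QA/TST"
    masters: int
    compute_nodes: int


class FakeCloudProvider:
    """Stand-in for an IaaS provider; returns a made-up master address."""

    def build_cluster(self, req: ClusterRequest) -> str:
        return "master.cloud.example"


class DynamicResourceManager:
    """Plays the role the slide assigns to Platform ISF (not its real API)."""

    def __init__(self, free_internal_hosts: int, cloud_allowed: set, provider):
        self.free_internal_hosts = free_internal_hosts
        self.cloud_allowed = cloud_allowed  # classes that policy lets go to cloud
        self.provider = provider

    def request_cluster(self, req: ClusterRequest) -> str:
        """Return the master location the workload manager should use."""
        needed = req.masters + req.compute_nodes
        if self.free_internal_hosts >= needed:
            # Internal resources are available, so stay local.
            self.free_internal_hosts -= needed
            return "master.internal.example"
        if req.project_class in self.cloud_allowed:
            # No internal capacity, but policy allows this class in the cloud.
            return self.provider.build_cluster(req)
        raise RuntimeError("no internal capacity and cloud not permitted by policy")
```

The returned master location corresponds to what ISF passes back to LSF before jobs are submitted to the new grid.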
Platform’s Cloud for HPC Key Benefits
Designed for Extensibility
• Customizable scheduling algorithm
• Open adapter architecture
• Flexible architectural options
On Demand Scalability
• Grow the grid when needed, shrink when not
• Contain capital costs, keep utilization high
Industry-Leading Support
• Large, worldwide development and support team
• Extensive partner ecosystem
• Nearly two decades of HPC experience
Problem #1: No HPC Infrastructure locally
Problem
• Smallest companies and consultants have no access to HPC
• Provisioning for HPC need is unwise / impractical
• Desktop is insufficient to service the workload
• Little or no IT/HPC expertise
Alternatives
• Very few
• Contract with Cycle Computing, Univa, others to build cloud infrastructure (expensive, long lead time)
Desired solution
• Self service, near instantaneous availability, security
• Provide pre-configured SaaS (open source apps) for anyone with an IaaS account
• Applications pre-installed, pre-configured, ready to execute
Platform OnDemand
[Diagram labels: Workstation, User, VPN or MultiCluster, LSF data, File data]
Phase I: Life Sciences
Phase IB: GRE
Roadmap – On Demand
[Roadmap chart: feature matrix by phase. Column labels: HPC Cluster, Linux, Win, Joined, Indep, L/S & Chem, CAE & IM, O/G GEO, GRE, DCC, FS, EDA. Rows: Phase I, Phase IB, Phase II, Phase III, Phase IIIB, Phase IV, with 4, 5, 6, 7, 9, and 10 cells marked respectively. Timeline labels: CYQ3 2011, CYQ4 2011, CYQ1 2012, CYQ2 2011. Milestones: Marketplace GA, Independent Offering.]
EC2 Instance Options
EC2 Instance Sizes
Vertical          Master Host     HPC Cluster
Life Sci. / Chem  Standard-Large  HPC
CAE / IM          Standard-Large  HPC
Oil & Gas / GEO   Standard-Large  HPC
GRE               Standard-Small  High-Mem 4XL
DCC               TBD             TBD
Size            Memory (GB)  Cores  HDD (GB)  Price $ (Linux / Windows)
Standard-Small  1.7          1      160       0.085 / 0.12
Standard-Large  7.5          2      850       0.34 / 0.48
Standard-XL     15           4      1690      0.68 / 0.96
Micro           0.613        1      EBS only  0.02 / 0.03
High-Mem XL     17.1         2      420       0.50 / 0.62
High-Mem 2XL    34.2         4      850       1.00 / 1.24
High-Mem 4XL    68.4         8      1690      2.00 / 2.48
High CPU-med    1.7          2      350       0.17 / 0.29
High CPU-XL     7            8      1690      0.68 / 1.16
HPC             23           8      1690      1.60 / -
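To see how the prices in the table above translate into the cost of a burst, here is a rough sketch. It assumes the listed prices are per instance-hour (consistent with the earlier note that IaaS providers usually charge by the instance-hour) and that partial hours are billed as full hours. The function name `burst_cost` and the choice of instance types in the dictionary are illustrative only.

```python
import math

# Linux prices in USD copied from the table; assumed to be per instance-hour.
LINUX_PRICE = {
    "Standard-Small": 0.085,
    "Standard-Large": 0.34,
    "High-Mem 4XL": 2.00,
    "HPC": 1.60,
}


def burst_cost(master: str, compute: str, compute_nodes: int,
               hours: float) -> float:
    """Estimate the cost of one master plus N compute nodes for a burst."""
    # Assumption: any started hour is billed as a full hour.
    billed_hours = math.ceil(hours)
    hourly = LINUX_PRICE[master] + compute_nodes * LINUX_PRICE[compute]
    return billed_hours * hourly
```

For example, a Standard-Large master with four HPC compute nodes running for 2.5 hours is billed as 3 hours at $6.74 per hour, or about $20.22.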
Thank You