Converge to Cloud.
Locuz.com
HPC on AWS
› Faster Time to Results: Access computing infrastructure in minutes
› Lower Total Cost: Pay-as-you-go pricing
› Elastic and Powerful: Easily add or remove capacity
› Globally Accessible: Easily collaborate with teams around the world
› Secure: A collection of tools to protect data and privacy
› Scalable: Access to effectively limitless capacity
How is Cloud Helping Enterprise HPC?
Cloud Computing in a Simulation-Driven World
› Scalability and agility
› Secure global collaboration
› Enterprise data governance
Scalability for Simulations
Design, engineering, analysis, visualization
› Simulation-driven design, discovery, optimization
Sample use cases
› Antenna and power simulations
› Genomics and proteomics
› Computational fluid dynamics
› Structural and finite element analysis
› Molecular modeling for drug discovery
› Oil and gas reservoir simulations
Cloud unlocks simulation at massive scale
[Figure: actual demand for computing versus total servers deployed over time; repeated server acquisitions leave unused IT resources]
The Old Way: Low Utilization, High Costs
› Typical server utilization rates are low because capacity must be deployed for peak demand
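The effect of sizing a fixed cluster for peak demand can be illustrated with a small calculation. This is a minimal sketch: the hourly demand figures and the function name `average_utilization` are hypothetical and chosen only for illustration.

```python
def average_utilization(demand, capacity):
    """Fraction of provisioned capacity actually used, averaged over time."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    # Demand above capacity cannot be served, so it is capped at capacity.
    used = sum(min(d, capacity) for d in demand)
    return used / (capacity * len(demand))


# Hypothetical hourly demand in cores, with one short peak.
demand = [100, 120, 80, 1000, 150, 90, 110, 100]

# A fixed on-premises cluster has to be sized for the peak hour.
peak_capacity = max(demand)

utilization = average_utilization(demand, peak_capacity)
print(f"Utilization when sized for peak: {utilization:.0%}")
```

With a single short peak, most of the provisioned cores sit idle for most of the time, which is the low-utilization pattern the slide describes.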
The Hidden Cost of Managing HPC Utilization
[Figure: actual demand for computing over time on a cluster managed for high utilization; unmet demand is marked as project delay]
› Higher utilization rates result in hidden costs
› Longer queue wait times and delayed projects
Conflicting goals
› Cluster users seek fastest possible time-to-results
› Simulations are not steady-state workloads
› IT support team seeks highest possible utilization
Result
› The job queue becomes the capacity buffer
› Job completion times are hard to predict
› Users are frustrated and run fewer jobs
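Why the queue absorbs the shortfall can be sketched with a textbook queueing formula. The model below is an assumption, not something the slides specify: it treats the cluster as a single-server M/M/1 queue, where the mean time a job waits before starting is the mean job duration multiplied by ρ / (1 − ρ), with ρ the utilization. Real schedulers and job mixes behave differently, but the shape of the curve is the point.

```python
def mean_queue_wait(utilization, mean_job_hours):
    """Mean wait before a job starts, under an assumed M/M/1 queue model."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return mean_job_hours * utilization / (1.0 - utilization)


# Hypothetical mean job duration of 2 hours.
for rho in (0.5, 0.8, 0.9, 0.95):
    wait = mean_queue_wait(rho, mean_job_hours=2.0)
    print(f"utilization {rho:.0%}: mean queue wait {wait:.1f} h")
```

Under this model the wait grows without bound as utilization approaches 100%, so the goals of central IT and of cluster users pull in opposite directions.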
HPC Queues Are Evil!
The World as Seen by Central IT
High utilization is viewed as a good thing
The World as Seen by the HPC User
Schedule impact!
In a secure Virtual Private Cloud
Automation and Auto Scaling allow easier cluster management and monitoring
High Performance, High Throughput Computing
HPC: High Performance Computing (Cluster Computing)
› Requires large numbers of compute cores arranged in a tight cluster, normally more than are available in a single server
› Latency-sensitive: requires a high degree of communication between the individual tasks running on each compute core
HTC: High Throughput Computing (Grid Computing)
› Like HPC, requires large numbers of compute cores, but with minimal need for communication between the tasks
Cloud supports both HPC cluster and HTC grid use cases
› Traditional HPC cluster applications can scale well on EC2, with the added benefit of higher scale for parallelizing HPC jobs
› HTC applications run extremely well on AWS
Cluster HPC and Grid HTC on the Cloud
Cluster HPC
› Tightly coupled, latency-sensitive applications
› Use larger EC2 compute instances, placement groups, Enhanced Networking
Grid HTC
› Loosely coupled, pleasingly parallel
› Use a variety of EC2 instances, multiple AZs, Spot, Auto Scaling, SQS
HPC + HTC
› Use a grid strategy on the cloud to run a group of parallel, individually clustered HPC jobs
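The difference between the two launch styles can be sketched as the parameters one might pass to an EC2 launch request. This is an illustrative sketch only: it builds plain dictionaries shaped like the keyword arguments of boto3's `run_instances` call and does not contact AWS. The helper name `launch_params`, the instance types, the AMI ID, and the placement group name are all assumptions made for the example.

```python
def launch_params(workload, ami_id, count, placement_group=None):
    """Build keyword arguments shaped like an EC2 run_instances request."""
    params = {"ImageId": ami_id, "MaxCount": count}
    if workload == "cluster":
        # Tightly coupled: all nodes are needed at once, packed into a
        # placement group for low-latency networking.
        if placement_group is None:
            raise ValueError("cluster workloads need a placement group")
        params["MinCount"] = count
        params["InstanceType"] = "c5n.18xlarge"  # assumed large instance type
        params["Placement"] = {"GroupName": placement_group}
    elif workload == "grid":
        # Loosely coupled: any number of nodes is useful, so accept partial
        # fulfilment and request Spot capacity.
        params["MinCount"] = 1
        params["InstanceType"] = "c5.large"  # assumed smaller instance type
        params["InstanceMarketOptions"] = {"MarketType": "spot"}
    else:
        raise ValueError(f"unknown workload type: {workload!r}")
    return params


# Hypothetical AMI ID and placement group name, for illustration only.
cluster_request = launch_params("cluster", "ami-EXAMPLE", 16, "hpc-group")
grid_request = launch_params("grid", "ami-EXAMPLE", 200)
```

The cluster request insists on the full node count in one placement group, while the grid request tolerates whatever capacity is available, which mirrors the tightly coupled versus loosely coupled split above.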
Industry Examples
HGST applications for engineering:
› Molecular dynamics, CAD, CFD, EDA
› Collaboration tools for engineering
› Big data for manufacturing yield analysis
Partner:
Example in Electronics Manufacturing
Molecular Dynamics Simulation at HGST:
› Millions of parallel parameter sweeps, running months of simulations in just hours
› Over 70,000 Intel cores running at peak, using EC2 Spot instances
Example in Life Sciences
Baylor CHARGE project:
› Genomics analysis on 14,000 participants
› 24 terabases of sequencer content each month
› 1 PB of raw data storage
› 21,000 AWS compute cores at peak
› Initial analysis completed in 10 days
Example in Financial Regulation
Large-scale animation rendering on AWS:
• Cloud Rendering at Walt Disney Animation Studios (available on SlideShare)
• Automated environment leveraging Spot Fleet
• Launched 40K cores in 20 min at less than $0.02 per core-hour
Example in Animation Rendering
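The figures above imply an upper bound on the hourly spend at full scale. The short calculation below is a sketch that uses only the two numbers quoted on the slide; because the price is stated as "less than" $0.02, the result is a ceiling rather than an actual bill.

```python
CORES = 40_000                   # cores launched, from the slide
MAX_PRICE_PER_CORE_HOUR = 0.02   # upper bound in USD, from the slide


def max_hourly_cost(cores, price_per_core_hour):
    """Upper bound on the cost of running all cores for one hour."""
    return cores * price_per_core_hour


ceiling = max_hourly_cost(CORES, MAX_PRICE_PER_CORE_HOUR)
print(f"At most ${ceiling:,.0f} per hour for {CORES:,} cores")
```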
[Architecture diagram components: Shared File Storage; Cloud-Based, Auto-Scaling Render Farm on EC2; License Managers and Cluster Head Nodes; 3D Graphics Virtual Workstation; Remote Graphics; AWS Direct Connect; On-Premises IT Resources; Client Devices (no local data); Storage Cache; Amazon S3]
Rendering Farm Architecture
Altair HyperWorks on AWS
ANSYS Enterprise Cloud on AWS
Locuz Competency
› HPC Lifecycle Management: a unique methodology to provision a complete HPC environment on AWS for the entire lifecycle of an HPC infrastructure or application
› Automation: using AWS automation, we rapidly provision an HPC cluster (CPU / GPU), high-performance storage, high-speed networking, etc., along with HPC middleware tools
› IP-Led Management: the Ganana job submission portal further reduces the end user's learning curve for running HPC jobs on the cloud
› Containerization: using container technology for faster application provisioning
Pre-processing / Meshing
Simulation
Post Processing
Visualization (2D/3D)
Service Offerings
HPC Assessment and consulting services
› HPC capacity planning.
› HPC requirement analysis with a detailed roadmap for migration to AWS
› HPC consulting services for hybrid and cloud-only models
HPC Deployment and Managed services
› HPC infrastructure deployment and onboarding services
› HPC application workflow optimization
› 24/7 remote management services through the NOC, with uptime commitments at the middleware level
› GUI job submission portal
› Checkpointing at the scheduler level
› Hadoop / Dask analytics cluster services
HPC Application services
› Application benchmarking.
› Application porting to multiple operating systems and cloud platforms (Linux, Windows, GPUs, etc.) using Docker containers
› Application optimization for performance
› Application migration to accelerator technologies (GPU)
› Deployment and optimization services for CUDA-enabled applications on certified platforms
On-Demand
› Pay for compute capacity by the hour with no long-term commitments
› For spiky workloads, or to define needs
AWS Consumption Models
Reserved
› Make a low, one-time payment and receive a significant discount on the hourly charge
› For committed utilization
Spot
› Bid for unused capacity, charged at a Spot Price that fluctuates based on supply and demand
› For time-insensitive or transient workloads
Optimize Utilization on AWS with RI, On-Demand, Spot
[Figure: capacity over time, layered as Reserved Instances, On-Demand, and Spot, with labels for 100%, scale up, and scale down]
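How the three consumption models combine can be sketched as a simple cost model: a reserved baseline that is paid for every hour, with demand above the baseline split between Spot and On-Demand. Everything numeric here is hypothetical, including the prices and the demand profile; the function name `blended_cost` is also an assumption for illustration, and real AWS pricing has more dimensions than this.

```python
def blended_cost(demand, reserved, price_reserved, price_on_demand,
                 price_spot, spot_share):
    """Total cost of serving hourly demand with RI, On-Demand, and Spot.

    demand:      instances needed in each hour
    reserved:    size of the reserved baseline (paid for whether used or not)
    spot_share:  fraction of the burst above the baseline served by Spot
    """
    if not 0 <= spot_share <= 1:
        raise ValueError("spot_share must be between 0 and 1")
    total = 0.0
    for needed in demand:
        burst = max(0, needed - reserved)
        total += reserved * price_reserved
        total += burst * spot_share * price_spot
        total += burst * (1 - spot_share) * price_on_demand
    return total


# Hypothetical demand profile and hourly prices per instance.
demand = [10, 10, 40, 80, 40, 10]
mixed = blended_cost(demand, reserved=10, price_reserved=0.6,
                     price_on_demand=1.0, price_spot=0.3, spot_share=0.8)
all_on_demand = blended_cost(demand, reserved=0, price_reserved=0.0,
                             price_on_demand=1.0, price_spot=0.3,
                             spot_share=0.0)
print(f"mixed: {mixed:.2f}, all On-Demand: {all_on_demand:.2f}")
```

Varying `reserved` and `spot_share` shows the trade-off the chart illustrates: a larger baseline lowers the burst cost but is paid for even when idle.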
With Spot the Rules are Simple
Spot is a market in which the price of compute changes based on supply and demand
You’ll never pay more than your bid. When the market price exceeds your bid, you get 2 minutes to wrap up your work
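One way to make use of the two-minute warning is to watch for the interruption notice and work out how much time is left. The sketch below parses a notice in the JSON form published at the EC2 instance metadata path `spot/instance-action`; the helper name `seconds_until_interruption` and the sample notice are assumptions for illustration, and fetching the document from the metadata service (including any session-token handling) is deliberately left out.

```python
import json
from datetime import datetime, timezone

# Instance metadata path for Spot interruption notices. It returns 404
# until a notice has been issued for the instance.
INSTANCE_ACTION_URL = (
    "http://169.254.169.254/latest/meta-data/spot/instance-action"
)


def seconds_until_interruption(notice_json, now):
    """Seconds left before the action time given in an interruption notice."""
    notice = json.loads(notice_json)
    deadline = datetime.strptime(
        notice["time"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return (deadline - now).total_seconds()


# Hypothetical notice body and clock reading, for illustration only.
sample_notice = '{"action": "terminate", "time": "2017-09-18T08:22:00Z"}'
now = datetime(2017, 9, 18, 8, 20, 0, tzinfo=timezone.utc)
remaining = seconds_until_interruption(sample_notice, now)
```

A job wrapper could poll for the notice every few seconds and, once it appears, use the remaining time to flush results or write a checkpoint.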
Best Practices for Using Spot
› Fault tolerance
› Stateless
› Multi-AZ
› Loosely coupled
› Instance flexibility
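Fault tolerance for a loosely coupled job can be as simple as recording each finished task so that a replacement instance resumes instead of starting over. The sketch below is one possible approach under stated assumptions, not a description of any particular scheduler: the function name `run_sweep`, the JSON checkpoint file, and the `stop_after` parameter (used only to simulate an interruption) are all hypothetical. In practice the checkpoint would live on shared or object storage rather than local disk.

```python
import json
import os


def run_sweep(params, checkpoint_path, work, stop_after=None):
    """Run work(p) for each parameter, skipping ones already checkpointed."""
    done = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)

    processed = 0
    for p in params:
        key = str(p)
        if key in done:
            continue  # finished before an earlier interruption
        if stop_after is not None and processed >= stop_after:
            break  # simulate the instance being reclaimed
        done[key] = work(p)
        # Write to a temporary file and rename it, so an interruption
        # mid-write cannot leave a corrupt checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(done, f)
        os.replace(tmp_path, checkpoint_path)
        processed += 1
    return done
```

Calling `run_sweep` a second time with the same checkpoint path only runs the tasks that are still missing, so losing an instance costs at most the task that was in flight.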
Thank You!