Scaling Up Engineering Analysis using Windows HPC Server 2008
Agenda
• High Productivity for HPC
• HPC for ANSYS
• Benchmarks & Case Studies
• Discussion
Why Clusters?
• Clusters enable bigger simulations (more accuracy)
  – Typical FEA models require 4-8 GB RAM; today's more demanding models might consume 100 GB
  – Typical CFD models require 2-10 GB RAM; today's more demanding models might consume 50-100 GB
• Clusters deliver faster turnaround time
  – Clusters can yield nearly linear scaling: a typical simulation that runs for 8 hours on a single CPU becomes (in the ideal case) a 1-hour run on an eight-core cluster (see the sketch below)
• Clusters can scale up with demand
  – More simulations (more design options); optimization studies might require 50-100 simulations
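The turnaround-time bullet above is simple arithmetic. A minimal sketch of it, assuming Amdahl's-law behavior: only the 8-hour single-core figure comes from the slide, and the serial fraction is an illustrative assumption, not a measured value.

# Back-of-the-envelope turnaround-time estimate for the bullet above.
# The 8-hour single-core case is from the slide; the Amdahl's-law serial
# fraction is an illustrative assumption, not a measured value.

def turnaround_hours(single_core_hours, cores, serial_fraction=0.0):
    """Estimated wall-clock hours on `cores` cores (Amdahl's law)."""
    return single_core_hours * (serial_fraction + (1.0 - serial_fraction) / cores)

if __name__ == "__main__":
    base = 8.0  # hours on a single CPU core (slide example)
    for f in (0.0, 0.05):  # ideal case, then an assumed 5% serial work
        t = turnaround_hours(base, cores=8, serial_fraction=f)
        print(f"serial fraction {f:.0%}: {t:.2f} h on 8 cores "
              f"(speedup {base / t:.1f}x)")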
High Productivity Computing
• Simplified parallel development
• Integrated desktop and HPC environment
• Combined infrastructure
• Unified development environment
• HPC and IT data centers merge, with streamlined cluster management
• Users with broad access to multiple cores and servers

Microsoft's Vision for HPC
"Provide the platform, tools and broad ecosystem to reduce the complexity of HPC by making parallelism more accessible to address future computational needs."
Reduced Complexity
• Ease deployment for larger scale clusters
• Simplify management for clusters of all scale
• Integrate with existing infrastructure

Mainstream HPC
• Address needs of traditional supercomputing
• Address emerging cross-industry computation trends
• Enable non-technical users to harness the power of HPC

Broad Ecosystem
• Increase number of parallel applications and codes
• Offer choice of parallel development tools, languages and libraries
• Drive larger universe of end-users, developers, and system administrators
Windows HPC Server 2008
www.microsoft.com/hpc
• Complete, integrated platform for HPC Clustering
• Built on top of the Windows Server 2008 64-bit operating system
• Addresses the needs of traditional and emerging HPC
Windows Server 2008 HPC Edition
• Secure, Reliable, Tested
• Support for high performance hardware (x64, high-speed interconnects)
Microsoft HPC Pack 2008
• Job Scheduler
• Resource Manager
• Cluster Management
• Message Passing Interface
Microsoft Windows HPC Server 2008
• Integrated Solution out-of-the-box
• Leverages investment in Windows administration and tools
• Makes operating the cluster as a single system easy and secure
Windows HPC Server 2008
November 2008 Top500

Windows Compute Cluster Server 2003:
• Spring 2006, NCSA, #130: 896 cores, 4.1 TF
• Spring 2007, Microsoft, #106: 2,048 cores, 9 TF, 58.8% efficiency
• Fall 2007, Microsoft, #116: 2,048 cores, 11.8 TF, 77.1% efficiency

Windows HPC Server 2008:
• Spring 2008, NCSA, #23: 9,472 cores, 68.5 TF, 77.7% efficiency
• Spring 2008, Umea, #40: 5,376 cores, 46 TF, 85.5% efficiency
• Spring 2008, Aachen, #100: 2,096 cores, 18.8 TF, 76.5% efficiency

30% efficiency improvement
Group compute nodes based on hardware, software and custom attributes; Act on groupings.
Pivoting enables correlating nodes and jobs together
Track long running operations and access operation history
Receive alerts for failures
List or Heat Map view cluster at a glance
Simulation Driven Product Development
Engineering simulation software for product development
Innovative & higher-quality products
Dramatic time-to-market improvement
Minimize development, warranty & liability costs
CAE Simulation Process
• CONCEIVE: concept design, component modeling, parameterization and description of variables and objectives
• COMPUTE: structural analysis, fluid dynamics, fluid-structure interaction, electromagnetics
• UNDERSTAND: visualization, post-processing
• OPTIMIZE: parameter variations, multiple design options
Interactive desktop or client process
Computationally intensive process (HPC Server 2008)
High Performance Computing Drivers
• Bigger Simulations
  – More geometric detail
  – More complex physics
• More Simulations
  – On-time insight
  – Multiple design options
  – Automated optimization
Memory (lots)
Capacity (more)
Speed (more)
Data Management (lots)
Microsoft – ANSYS Joint Value
To increase compute capacity so that engineers, product designers, and performance evaluators can generate high-fidelity simulation results faster and more economically
To help IT groups leverage in-house skills, existing technologies, and familiar admin/management interfaces by integrating HPC with their Windows infrastructures
Customer Success: Spraying Systems Co.
• Deployed Windows HPC with FLUENT for complex simulation of industrial spray systems
• Achieved shorter run times for simulations that had been tying up engineering workstations
  – Achieved a 12X increase in computation speed
  – Sample run time reduced from 192 hours to 16 hours
  – Enabled more detailed, accurate simulations
  – Freed up workstations for setup
• Leveraged existing Windows infrastructure and expertise
"Using a dual or quad core workstation is fine for smaller simulations, but for complex, extensive simulations even multi-core workstations couldn't complete the computations in a reasonable timeframe."
- Rudolf Schick, Vice President, Spray Analysis & Research Services
Customer Success: Petrobras
• Deployed Windows HPC with ANSYS CFX and FLUENT for upstream and downstream applications; engineers run 5-10 simultaneous simulations using 8-30 cores per job
  – Improved productivity for the research team
  – Simpler, more centralized support of clusters
"With Windows HPC, setup time has decreased from several hours – or even days for some clusters – to just a few minutes, regardless of cluster size."
-- IT Manager, Petrobras CENPES
Customers
"Ferrari is always looking for the most advanced technological solutions and, of course, the same applies for software and engineering. To achieve industry leading power-to-weight ratios, reduction in gear change times, and revolutionary aerodynamics, we can rely on Windows HPC Server 2008. It provides a fast, familiar, high performance computing platform for our users, engineers and administrators."
-- Antonio Calabrese, Responsabile Sistemi Informativi (Head of Information Systems), Ferrari
"It is important that our IT environment is easy to use and support. Windows HPC is improving our performance and manageability."
-- Dr. J.S. Hurley, Senior Manager, Head, Distributed Computing, Networked Systems Technology, The Boeing Company
“Our goal is to broaden HPC availability to a wider audience than just power users. We believe that Windows HPC will make HPC accessible to more people, including engineers, scientists, financial analysts, and others, which will help us design and test products faster and reduce costs.”
-- Kevin Wilson, HPC Architect, Procter & Gamble
Windows HPC Server 2008 Support
• ANSYS R12 supports Windows HPC Server 2008
– ANSYS Mechanical, ANSYS FLUENT, ANSYS CFX
• Builds on success with ANSYS R11 and Windows CCS
– Full support for job scheduler and MS-MPI
– Improved control: Map processes to nodes, sockets, and cores
– More performance: greatly improved parallel scaling based on NetworkDirect MPI
Job Scheduler Support – ANSYS Workbench
• Allocates the necessary resources to the simulations
• Tracks the processors associated with the job
• Deallocates the resources when the simulation finishes
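As a rough illustration of how a solver run reaches the scheduler, here is a minimal sketch of submitting a job from a script. The "job submit" and "mpiexec" commands ship with Microsoft HPC Pack 2008, but the specific options, share path, and solver executable name below are illustrative assumptions, not a documented ANSYS recipe.

# Minimal sketch: submitting a parallel solver run to the Windows HPC
# Server 2008 job scheduler from a script. "job submit" and "mpiexec"
# ship with Microsoft HPC Pack 2008; the options, share path, and solver
# executable below are illustrative assumptions, not a documented recipe.
import subprocess

def submit_simulation(cores: int, workdir: str, solver_cmd: str) -> None:
    cmd = [
        "job", "submit",              # HPC Pack CLI: create and queue a job
        f"/numcores:{cores}",         # cores the scheduler should allocate (assumed option)
        f"/workdir:{workdir}",        # shared working directory for the job
        "/stdout:solve.out",          # capture solver output in the work directory
        "mpiexec", solver_cmd,        # MS-MPI launcher starts the solver in parallel
    ]
    subprocess.run(cmd, check=True)   # raises if the submission command fails

if __name__ == "__main__":
    # Hypothetical share and executable names, for illustration only.
    submit_simulation(8, r"\\headnode\models\run01", "solver_mpi.exe")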
Performance Gain – ANSYS Mechanical
Chart: Comparison of CCS v1 with ANSYS R11 vs. CCS v2 with ANSYS R12 – elapsed-time speedup (benchmark BMD-6) vs. number of cores.
ANSYS R11 on WCCS vs. ANSYS R12 on Windows HPC Server 2008.
ANSYS FLUENT Public Benchmarks
Conclusions:For ANSYS FLUENT 12.0, Windows HPC Server 2008 is delivering the same or faster performance than Linux on the same hardware (HP BL2x220 3GHz with 16GB memory per node and InfiniBand interconnect).
Reference: Dataset is External Flow over a Truck Body, 14M cells, public benchmark. ANSYS FLUENT 12.0 data posted at http://www.fluent.com/software/fluent/fl6bench/new.htm. Windows HPC data run with pre-release ANSYS FLUENT 12.0.16. Results run by HP and submitted/approved by ANSYS.
Chart: FLUENT rating (higher is better) vs. number of CPU cores (2-64), Windows vs. Linux.
Performance Gain - FLUENT
• Combined OS and software improvements yield 24% improvement at 32-way parallel
Parallel Scaling Improvement on Windows HPC Server 2008
Chart: rating vs. number of cores (0-32) for FLUENT 12.0.5 on HPC Server 2008 and FLUENT 6.3 on Windows CCS 2003.
Exterior flow around a passenger sedan, 3.6M cells, turbulence, pressure-based coupled solver.
Performance Gain – Windows HPC Server 08
• FLUENT 6.3 shows 20% improvement at 32 core on Windows HPC Server 2008
• Conclusion: HPC Server 2008 yields significant gains!
FLUENT 6.3 - Operating System Comparison
Chart: rating vs. number of cores (0-32) for WCCS 2003 and HPC Server 2008.
Exterior flow around a passenger sedan, 3.6M cells, turbulence, pressure-based coupled solver.
Performance Scale-Out
• FLUENT 12 shows near-ideal scaling in tests to 64 cores (GigE network)
• FLUENT 12 and FLUENT 6.3 show near-ideal scaling in tests to 128 cores (IB network)
• Conclusion:
  – GigE scale-out improved with FLUENT 12
  – IB scale-out now excellent to 128 cores with FLUENT 6.3 or FLUENT 12, and 77% of ideal at 256 cores (see the efficiency sketch below)
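Figures such as "77% of ideal at 256 cores" follow directly from the rating metric, which is higher-is-better and ideally grows linearly with core count. A minimal sketch of that calculation, using placeholder rating values rather than the benchmark data:

# How a scaling-efficiency figure such as "77% of ideal at 256 cores" is
# derived from FLUENT ratings (rating = solver runs per day, so higher is
# better and the ideal rating grows linearly with core count).
# The rating values below are placeholders, not the benchmark data.

def parallel_efficiency(base_cores, base_rating, cores, rating):
    """Measured rating divided by the linearly scaled ideal rating."""
    ideal_rating = base_rating * (cores / base_cores)
    return rating / ideal_rating

if __name__ == "__main__":
    # Hypothetical ratings referenced to a 64-core run:
    eff = parallel_efficiency(base_cores=64, base_rating=3000,
                              cores=256, rating=9240)
    print(f"efficiency at 256 cores: {eff:.0%}")  # -> 77%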
Chart: Performance tuning in FLUENT 12 – Windows HPC Server 2008 (GigE): rating vs. number of cores (0-64) for FLUENT 6.3, FLUENT 12.0.5, and ideal scaling.
Chart: Performance tuning in FLUENT 12 – Windows HPC Server 2008 (IB): rating vs. number of cores (0-256) for FLUENT 6.3.33, FLUENT 12.0.5, and ideal scaling.
Exterior flow around a passenger sedan, 3.6M cells.
Windows HPC Server 2008 vs. Linux
• Direct comparison of performance on same hardware
• Windows solution is within 6% of Linux data on this cluster (at 64 cores)
• Conclusion: Windows vs. Linux performance is very comparable
Exterior flow around a passenger sedan, 3.6M cells, turbulence, pressure-based coupled solver.
Data courtesy of Dell, Inc. and X-ISS, Inc.
Chart: Windows HPC Server 2008 vs. Linux, Dell PowerEdge SC 1435 (Dual-Core Opteron, IB) – rating vs. number of cores (0-64) for RHEL AS 4 and Windows HPC Server 2008.
Chart: Windows HPC Server 2008 vs. Linux, Dell PowerEdge SC 1435 (Dual-Core Opteron, IB) – speedup vs. number of cores (0-64) for Windows HPC Server 2008 and RHEL AS 4.
Sizing a Cluster – ANSYS Mechanical
• Typical cluster sizes for ANSYS use 4-8 cores per simulation at R11 (8-16+ for R12)
  – Smaller numerical workloads typically show diminished scaling beyond this point
• Total RAM on the cluster determines the maximum model size
  – Typically, double the RAM on the head node ("core 0")
• I/O configuration can be a limiting factor
  – There is significant I/O during the computations, on the compute nodes
How many cores?
Simultaneous CFD simulations    CFD model size (cells)    Cluster size (CPU cores)
1                               Up to 2-3M                4
2                               Up to 2-3M                8
1                               Up to 4-5M                8
1                               Up to 8-10M               16
2                               Up to 4-5M                16
1                               Up to 16-20M              32
2                               Up to 8-10M               32
4                               Up to 4-5M                32
1                               Up to 30-40M              64
1                               Up to 70-100M             128
How much RAM?
• Recommended: 2GB/core (e.g., 8GB per dual processor dual-core node)
• Total memory requirement increases linearly with the CFD model size:
CFD model size     RAM requirement
Up to 2M cells     2 GB
Up to 5M cells     5 GB
Up to 50M cells    50 GB
(The linear relationship continues as the problem size increases.)
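Putting the rules of thumb together (roughly 1 GB of RAM per million cells, at least 2 GB of RAM per core, and core counts in line with the "How many cores?" table), a rough sizing helper can be sketched as below. The cells-per-core constant and the 8-core-node assumption are illustrative; the tables above remain the actual guidance.

# Rough CFD cluster-sizing helper based on the guidance above: about
# 1 GB of RAM per million cells, at least 2 GB of RAM per core, and a
# core count per job chosen roughly as in the "How many cores?" table
# (~0.8M cells per core, rounded up to a power of two). The cells-per-core
# constant and the 8-core-node assumption are illustrative.
import math

def size_cluster(model_mcells, simultaneous_jobs=1,
                 cells_per_core_m=0.8, cores_per_node=8):
    cores_per_job = max(4, 2 ** math.ceil(math.log2(model_mcells / cells_per_core_m)))
    total_cores = cores_per_job * simultaneous_jobs
    ram_gb = max(2.0, model_mcells) * simultaneous_jobs  # ~1 GB per million cells
    ram_gb = max(ram_gb, 2.0 * total_cores)              # at least 2 GB per core
    nodes = math.ceil(total_cores / cores_per_node)
    return cores_per_job, total_cores, nodes, ram_gb

if __name__ == "__main__":
    for mcells, jobs in [(3, 1), (10, 1), (5, 2)]:
        per_job, cores, nodes, ram = size_cluster(mcells, jobs)
        print(f"{jobs} job(s) x {mcells}M cells: {per_job} cores/job, "
              f"{cores} cores on {nodes} node(s), ~{ram:.0f} GB RAM")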
HP 16-core Cluster for ANSYS (DL380 G5 Woodcrest)
• Processor: Four DL380 G5 Woodcrest Xeon-based server nodes, each dual processor, dual core (2p/4c*)
  – Four cores per compute node; node 1 doubles as the head node
• Total memory for cluster: up to 80 GB RAM
  – 8 GB/core (32 GB total) on the head node; 2 or 4 GB/core (8 or 16 GB/node) on each of the 3 remaining compute nodes
• Storage: Two 72 GB SAS drives striped RAID 0 on the 3 compute nodes, plus a 5 x 72 GB SAS RAID 0 disk array on the head node
• Interconnect: GigE cluster switch; 100BT management switch
• Operating environment: 64-bit Windows HPC Server 2008
• Workloads: 56-80 GB RAM configurations will handle ANSYS "megamodels" of 50M DOF
*2 processors, 4 cores total per node (dual core)
HP 32-core BladeSystem for CFD
• Processor options: Up to 8 Xeon (BL460c) or Opteron (BL465c) compute nodes, each dual processor, dual core (2p/4c*)
  – 4 cores per compute node; 72 GB SAS drive
  – Option to configure one node for storage; node 1 can be used as the head node and/or pre/post-processing node; two 72 GB SAS drives are suitable for the head node
• Total memory for the cluster: 64-88 GB RAM
  – 2 GB/core (8 GB/node) on compute nodes and head node; 8 GB/core (32 GB on node 1) if using the head node for pre/post-processing
• Interconnect: Integrated Gigabit Ethernet or InfiniBand DDR, plus a management network
• Storage: Optional SB40c storage blade; extended direct-attached storage on the head node; up to 6 SFF SAS drives
• Operating environment: 64-bit Windows HPC Server 2008
• Ideally suited for FLUENT or CFX models up to 50M cells, or 3-4 simultaneous fluids models on the scale of 10M cells
*2 processors, 4 cores total per node (dual core)
Microsoft HPC in the Future

Personal Super Computing
• Microsoft entry into HPC
• Personal and workgroup technical computing
• End-user applications available for Windows
• Parallel and HPC development tools
• Ease of management and deployment

Broad Reaching HPC
• Support traditional & emerging HPC
• Larger cluster support & Top500 range
• Greater accessibility for Windows-based users
• Broader developer support with tools and SOA
• Improved management and deployment
• Parallel extensions

Seamless Parallelism (Futures)
• Parallel computing everywhere
• Ultra-scale/cloud computing
• Transparent user access
• Implicit parallelism for .NET developers
• Dynamic and virtualized workloads
• Mainstream management of HPC and IT infrastructure
Additional Information
• Microsoft HPC Web site – evaluate today!
  – http://www.microsoft.com/hpc
• Windows HPC Community site
  – http://www.windowshpc.net
• Windows HPC TechCenter
  – http://technet.microsoft.com/en-us/hpc/default.aspx
• HPC on MSDN
  – http://code.msdn.microsoft.com/hpc
• Windows Server Compare website
  – http://www.microsoft.com/windowsserver/compare/default.mspx
Questions?
• Let us know how Microsoft and your local ANSYS representative can help you plan and provide the computing resources you need for simulation.
Taking HPC Mainstream
© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation
as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.
MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.