Aggregate. Scale. Simplify. Save. Confidential and Proprietary
CUSTOMER USE CASES
7/26/2010
Target Environments and Applications
• Users seeking to simplify cluster complexities
• Applications that use a large memory footprint (even with one processor)
• Applications that need multiple processors and shared memory
Manufacturing
– CSM (Computational Structural Mechanics): ABAQUS/Explicit, ABAQUS/Standard, ANSYS Mechanical, LSTC LS-DYNA, ALTAIR Radioss, NASTRAN
– CFD (Computational Fluid Dynamics): FLUENT, ANSYS CFX, STAR-CD, AVL FIRE, Tgrid
– Other: inTrace OpenRT
Life Sciences
– Gaussian, VASP, AMBER, Schrödinger Jaguar, Schrödinger Glide, NAMD, DOCK, GAMESS, GOLD, mpiBLAST, GROMACS, MOLPRO, OpenEye FRED, OpenEye OMEGA, SCM ADF, HMMER
Energy
– Schlumberger ECLIPSE, Paradigm GeoDepth, 3DGEO 3DPSDM, Norsar 3D
Others
– The MathWorks MATLAB, R, Octave, Wolfram MATHEMATICA, ISC STAR-P
EDA
– Mentor, Cadence, Synopsys
Finance
– Wombat, KX
Weather Forecasting
– MM5, WRF
3.1 SIMPLICITY
Customer Use Cases
• Customer: School of Engineering – University of Alabama
• Current platform: None. Just getting into HPC
• Problems:
– Compute requirements were growing as the number of users/students grew
– No in-house skills to run an x86 InfiniBand cluster
– Limited operational budget to hire additional sys-admin resources
• Applications: Commercial applications (mostly Fluent and MATLAB)
• Solution:
– 4 full blade chassis, each aggregated as a single system with 128 cores, 384 GB RAM and 5 TB of internal storage
– Total: 64 physical nodes, 512 cores, 20 TB storage, running as a cluster of 4 ‘Fat-Nodes’
• Benefits:
– Low OPEX:
• No additional IT required for day-to-day operations
• The need to manage only 4 ‘Fat-Nodes’
• Internal storage is embedded in each ‘Fat-Node’
– Simplicity: InfiniBand performance without the complexity of managing such a solution
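The totals quoted for this deployment follow from multiplying the per-chassis figures by four. The sketch below reproduces that arithmetic. The per-node figures are derived under the assumption that the 64 nodes are split evenly across the chassis, which the slide does not state, and all names are illustrative.

```python
# Figures taken from the slide.
CHASSIS = 4
CORES_PER_CHASSIS = 128
RAM_GB_PER_CHASSIS = 384
STORAGE_TB_PER_CHASSIS = 5
TOTAL_NODES = 64


def cluster_totals():
    """Return aggregate and per-node figures for the 4 'Fat-Node' cluster."""
    # Assumption: nodes are distributed evenly across the chassis.
    nodes_per_chassis = TOTAL_NODES // CHASSIS
    return {
        "total_cores": CHASSIS * CORES_PER_CHASSIS,
        "total_ram_gb": CHASSIS * RAM_GB_PER_CHASSIS,
        "total_storage_tb": CHASSIS * STORAGE_TB_PER_CHASSIS,
        "nodes_per_chassis": nodes_per_chassis,
        "cores_per_node": CORES_PER_CHASSIS // nodes_per_chassis,
        "ram_gb_per_node": RAM_GB_PER_CHASSIS // nodes_per_chassis,
    }


totals = cluster_totals()
for name, value in totals.items():
    print(f"{name}: {value}")
```

The core and storage totals match the 512 cores and 20 TB on the slide. The total RAM figure is not given on the slide and is simply four times the per-chassis value.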
ENGINEERING FACULTY
LARGE SCALE DEPLOYMENT WITHOUT THE COMPLEXITY
Customer Use Cases
• Customer: Mid-size Engineering Services Company
• Current platform: Multiple 2-socket workstations
• Problems:
– Existing models (Abaqus) grow fast and no longer fit on the engineers’ workstations
– Interested in running apps in batch at night
– No in-house skills to run an x86 InfiniBand cluster (although the application runs well on an InfiniBand cluster). Can’t afford RISC systems
• Solution:
– 4 Intel dual-processor Xeon systems providing 128GB RAM and 8 sockets (16 cores) as a single virtual system running Linux with vSMP Foundation
• Benefits:
– Performance: Solution significantly faster than existing workstations. Performance is comparable to cluster performance (using vendor benchmarks).
– Low OPEX: No IT required for day-to-day operation
– Versatility: Batch mode at night. Daytime jobs are executed on the system while using the workstation for display only. Multi-user environment with perfect scaling, and sharing without performance degradation.
– Investment protection: Expected to expand the system by adding 4 additional nodes (to a total of 256GB RAM, 32 cores)
ENGINEERING SERVICES COMPANY
INNOVATION WITHOUT COMPLEXITY
SIMPLIFYING INTER-PROCESS COMMUNICATION
Customer Use Cases
• Customer: Hedge Fund
• Current platform: Multiple 4-Socket Servers
• Problems:
– A single 4-socket server did not provide the performance required for the customer’s business targets
– Multiple 4-socket servers required complex decomposition and introduced challenges in transferring data between processes in a short and deterministic time (low latency and low jitter)
• Ethernet-based solution could not provide this; IB solution is too complex to manage and program for
– Co-location at exchanges is complicated for a solution comprised of multiple systems
• Applications: KX, WOMBAT, home-grown code
• Solution:
– 16 Intel dual-processor Xeon systems providing 0.5TB RAM and 32 sockets (128 cores) as a single virtual system running Linux with vSMP Foundation
• Benefits:
– Reduced latency and latency variance
– Simpler solution: Deployment and management of a single system
– Better utilization: Single system reduces resource fragmentation
– Simpler programming model: No need for specific InfiniBand programming
FINANCIAL SERVICES
3.2 FLEXIBILITY
COST EFFECTIVE FLEXIBLE SOLUTION WITH HIGH UTILIZATION
Customer Use Cases
• Customer: Hosted HPC resource provider
• Current platform: Clusters and SMP machines
• Problems:
– Need to run MPI as well as OpenMP (shared memory) codes
– Large shared memory jobs require dedicated proprietary hardware
– Low utilization on shared memory systems
• Applications: A variety of commercial codes
• Solution:
– Original: 4 systems, total of 8 sockets (32 cores) and 128GB RAM
– Solution was extended to 16 nodes
• Benefits:
– Utilization: Rely on standard commodity hardware
– Flexibility: Using the same system for both shared memory and cluster benchmarks, resulting in high utilization
HOSTED HPC RESOURCE PROVIDER
ELASTIC VM SOLUTION AIMED AT DATA INTENSIVE COMPUTING
Customer Use Cases
• Customer: San Diego Supercomputer Center (SDSC)
• Current platform: 8-socket AMD systems
• Problems:
– Require an infrastructure for data intensive computing
– Need a large memory system (TBs in size), depending on job need
– Require the ability to quickly access large amounts of storage
• Applications: A variety of data intensive codes (Astronomy, Genomics, Data Mining, etc.)
• Solution:
– Initial Deployment: 4 ‘Super Nodes’, each with 768GB RAM, 128 cores, 10TB internal storage
– Complete Deployment (2011): 1,024 servers with vSMP Foundation for Cloud. Can be aggregated into up to 32 ‘Super Nodes’; each node is 32 servers, resulting in 2TB RAM and 8TB of SSDs
– On-demand allocation using web request and fast (<10 minutes) provisioning
• Benefits:
– Flexibility: Provision multiple ‘Super Nodes’ of various sizes according to need
– Performance: Extremely fast hierarchical memory solution: RAM → Aggregated RAM → Aggregated SSDs
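The sizing of the complete deployment can be checked with a few lines of arithmetic. The sketch below is illustrative only. It assumes the 2TB RAM figure applies to one full 32-server ‘Super Node’ and that 1TB = 1024GB, and `servers_needed` is a hypothetical sizing rule, not part of vSMP Foundation for Cloud.

```python
import math

TOTAL_SERVERS = 1024           # from the slide
SERVERS_PER_SUPER_NODE = 32    # from the slide
SUPER_NODE_RAM_GB = 2 * 1024   # 2TB, assumed to be per full 'Super Node'


def max_super_nodes():
    """Number of full-size 'Super Nodes' the servers can form."""
    return TOTAL_SERVERS // SERVERS_PER_SUPER_NODE


def ram_per_server_gb():
    """RAM per server implied by the assumption above."""
    return SUPER_NODE_RAM_GB // SERVERS_PER_SUPER_NODE


def servers_needed(request_gb):
    """Smallest server count whose combined RAM covers the request.

    Hypothetical rule: ignores any memory reserved by the
    aggregation layer or the operating system.
    """
    needed = math.ceil(request_gb / ram_per_server_gb())
    if needed > SERVERS_PER_SUPER_NODE:
        raise ValueError("request exceeds one full 'Super Node'")
    return needed


print(max_super_nodes(), "Super Nodes of", ram_per_server_gb(), "GB per server")
print("Servers for a 1500GB job:", servers_needed(1500))
```

Under these assumptions, a job needing less than 2TB would be given a smaller ‘Super Node’, leaving the remaining servers free for other requests.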
SUPER COMPUTER CENTER
3.3 CAPABILITY
Customer Use Cases
• Customer: Global Energy Company
• Current platform: x86 grid
• Problems:
– Using in-house single-threaded simulation tools in throughput mode. Each simulation’s memory footprint has grown over the years and sometimes (10%) exceeds 32GB.
– Application runs on x86 only
– Used to reschedule failed runs on large-memory systems
• Solution:
– 6 Intel dual-processor Xeon systems providing 192GB RAM and 12 sockets (48 cores) as a single virtual system running Linux with vSMP Foundation
• Benefits:
– Versatility: Both large and small workloads run concurrently on the same system
– Utilization: Higher utilization compared to grid due to lower infrastructure fragmentation
– Investment protection: Solution expanded by 100% since initial installation
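The fragmentation point can be illustrated with a toy placement model that considers memory only. This is a sketch under stated assumptions, not a description of how the customer’s scheduler or vSMP Foundation behaves: the grid is assumed to be six separate 32GB nodes (192GB divided evenly), the job footprints are made up, and CPU limits are ignored.

```python
def place_on_grid(jobs_gb, node_count=6, node_gb=32):
    """First-fit jobs onto separate nodes; return jobs that fit nowhere."""
    free = [node_gb] * node_count
    rejected = []
    for job in jobs_gb:
        for i, available in enumerate(free):
            if job <= available:
                free[i] -= job
                break
        else:
            # No single node has enough free memory for this job.
            rejected.append(job)
    return rejected


def place_on_single_system(jobs_gb, total_gb=192):
    """Place jobs into one shared memory pool; return jobs that do not fit."""
    free = total_gb
    rejected = []
    for job in jobs_gb:
        if job <= free:
            free -= job
        else:
            rejected.append(job)
    return rejected


# Hypothetical footprints in GB; one job exceeds 32GB.
jobs = [20, 20, 20, 20, 20, 40, 20, 12]

print("Rejected on grid:", place_on_grid(jobs))
print("Rejected on single system:", place_on_single_system(jobs))
```

In this model the 40GB job cannot run on any grid node even though the grid has enough memory in total, while the single 192GB system accepts the whole mix.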
GLOBAL ENERGY COMPANY
SINGLE INFRASTRUCTURE FOR HORIZONTAL AND VERTICAL APPLICATION SCALING – PLUG & PLAY
SCALE-UP AT SCALE-OUT PRICING
Customer Use Cases
• Customer: Formula1 team
• Current platform: Large-memory Itanium-based system
• Problems:
– Need to generate a large mesh as part of pre-processing of whole-car simulation (FLUENT TGrid)
– Mesh requirements are ~200GB in size
– Expect to grow significantly within 12 months after initial deployment
– Would like to standardize on x86 architecture due to lower costs and open standards
• Solution:
– 12 Intel dual-processor Xeon systems providing 384GB RAM as a single virtual system running Linux with vSMP Foundation
• Benefits:
– Better performance: Solution evaluated and found to be faster than alternative systems (x86 and non-x86)
– Cost: Significant savings compared to alternative system
– Versatility: Also being used to run FLUENT (MPI) as part of a large cluster
– Investment protection: Solution can grow
FORMULA1 TEAM
SIMPLE AND FLEXIBLE COST EFFECTIVE SOLUTION
Customer Use Cases
• Customer: Weather forecasting service provider
• Current platform: SGI Altix with 32 cores
• Problems:
– Need to run MPI as well as OpenMP codes
– System needs to be deployed remotely, and hence needs to be simple to manage
– Data processing flow is complex and requires transferring large amounts of data between steps
• Applications: MM5, WRF, MAWSIP, home-grown code for data transformation
• Solution:
– 4 Intel Nehalem dual-socket blades, total of 8 sockets (32 cores) and 192GB RAM
– Internal storage
– Solution was extended to 8 blades, total of 16 sockets (64 cores) and 384GB RAM
• Benefits:
– Performance: 2.5x better performance on the same number of cores (32)
– Simpler solution: Significantly reduced capital expense allowed the customer to have a higher number of cores
– Simplicity: Simple to manage by domain experts (weather forecast scientists)
– Dataflow remains within the system, leveraging internal storage
WEATHER FORECASTING SERVICE PROVIDER
Customer Use Cases
• Customer: Large European semiconductor manufacturer
• Current platform: Proprietary large shared memory system + compute grid
• Problems:
– Dedicated and proprietary systems were expensive
– Utilization and efficiency of the existing large memory system were low for throughput jobs
– The mix of throughput and large memory jobs required maintaining 2 separate environments
• Solution:
– 8 Intel dual-socket (quad-core) Xeon systems providing 300GB RAM as a single virtual system running Linux with vSMP Foundation
• Benefits:
– The ability to run large memory jobs when required
– Flexibility: Switch between large memory and throughput jobs efficiently on the fly
– Consistency: Underlying hardware is aligned with standard hardware used for the rest of the compute grid
– Better utilization: Having a single system reduces resource fragmentation
– Performance: Leverage most recent Intel CPUs for large-memory jobs
SEMICONDUCTOR MANUFACTURER
FLEXIBLE SYSTEM FOR THROUGHPUT AND LARGE MEMORY JOBS
LARGE MEMORY FOR MULTI-THREADED PROGRAMMING
Customer Use Cases
• Customer: Medical Research Institute
• Current platform: HP Superdome System
• Problems:
– Need to perform high performance image processing on very large MRI scans
– Scanned data for a single run is currently over 200GB. Memory requirements are expected to grow significantly with the introduction of full body scans with more sensors
– Would like to use and commercial tools for faster development
– Would like to standardize on x86 architecture due to lower costs and open standards
• Applications: Siemens CT processing, MATLAB, BLAS, Home-grown code, …
• Solution:
– 16 Intel dual-processor Xeon systems providing 1TB RAM and 32 sockets (128 cores) as a single virtual system running Linux with vSMP Foundation
• Benefits:
– Better performance: Solution evaluated and found to be faster than any other alternative system
– Cost: Significant savings compared to alternative system (order of magnitude)
– Versatility: Also being used for MPI jobs as part of a large cluster
MEDICAL RESEARCH INSTITUTE
SHARED MEMORY MULTI-THREADED PROGRAMMING
Customer Use Cases
• Customer: RWTH Aachen - Polytechnic University
• Current platform: 4 socket x86 server
• Problems:
– Need to run auto-parallelized code (using OpenMP) with high core count
– Other solutions were found not to work:
• Cannot afford proprietary SMP
• Cluster-OpenMP proved not to scale
– Has to use OpenMP for faster development (machine-generated Fortran code)
• Applications: Home-grown codes (SHEMAT Suite, FIRE, …)
• Solution:
– 13 Intel dual-processor Xeon systems providing 26 sockets (104 cores)
– Single virtual system running Linux with vSMP Foundation
• Benefits:
– Performance: Solution scales at over 95% efficiency to 104 cores with OpenMP codes
– Cost: Significant savings compared to alternative solutions
– System scale: Largest x86 system available
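To put the scaling figure in context, parallel efficiency is commonly defined as speedup divided by core count, where speedup is the single-core runtime divided by the parallel runtime. The slide does not say how efficiency was measured, so the sketch below assumes this standard definition, and the function names are illustrative.

```python
def speedup(serial_seconds, parallel_seconds):
    """Speedup of a parallel run relative to the single-core run."""
    return serial_seconds / parallel_seconds


def parallel_efficiency(serial_seconds, parallel_seconds, cores):
    """Fraction of ideal (linear) speedup achieved on the given core count."""
    return speedup(serial_seconds, parallel_seconds) / cores


def min_speedup(efficiency, cores):
    """Speedup implied by a given efficiency on the given core count."""
    return efficiency * cores


# 95% efficiency on 104 cores (figures from the slide).
implied = min_speedup(0.95, 104)
print(f"Implied speedup at 95% efficiency on 104 cores: {implied:.1f}x")
```

Under this definition, "over 95% efficiency to 104 cores" corresponds to a speedup of more than roughly 98.8x over a single core.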
ACADEMIC RESEARCH