Posted on 30-Dec-2015
Palacios and Kitten: New High Performance Operating Systems For
Scalable Virtualized and Native Supercomputing
John R. Lange and Kevin Pedretti
Trammell Hudson, Peter Dinda, Zheng Cui, Lei Xia, Patrick Bridges, Andy Gocke, Steven Jaconette, Mike Levenhagen and Ron Brightwell
Northwestern University, Sandia National Labs, University of New Mexico
Summary
• Palacios – First VMM for scalable HPC
– Open Source and available
• Kitten – First open source Lightweight Kernel for High Performance
Computing (HPC)
– Open Source and available
• Proved HPC virtualization is effective at scale
– Performance within 5% of native
– Largest scale study of virtualization
What is a virtual machine?
• Run an OS as an application
– Run multiple OS environments on a single machine
– Start, stop, pause
– Can easily move entire OS environments
[Figure: a native stack (Application on OS on Hardware) beside a virtualized stack (several Guest OS + Application pairs on a Host OS/VMM on Hardware). The guest sees page tables, CPU state, and hardware that the VMM emulates on top of the real hardware.]
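One common way a VMM emulates a guest's page tables is shadow paging. The sketch below is a deliberately simplified model, not Palacios code: page tables are flat Python dicts from page number to page number, 4 KiB pages are assumed, and the names `build_shadow_table` and `translate` are hypothetical. It shows the core idea of composing the guest's virtual-to-guest-physical table with the VMM's guest-physical-to-host-physical map, so the hardware can walk a single table while the guest runs.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


def build_shadow_table(guest_pt: dict[int, int], gpa_to_hpa: dict[int, int]) -> dict[int, int]:
    """Compose guest-virtual -> guest-physical with guest-physical -> host-physical.

    The result maps guest-virtual page numbers directly to host-physical page
    numbers; in this model it stands in for the table the real MMU would walk.
    """
    shadow: dict[int, int] = {}
    for gva_page, gpa_page in guest_pt.items():
        hpa_page = gpa_to_hpa.get(gpa_page)
        if hpa_page is not None:  # guest pages the VMM has not backed stay unmapped
            shadow[gva_page] = hpa_page
    return shadow


def translate(shadow: dict[int, int], gva: int) -> int:
    """Translate a guest-virtual address using the shadow table."""
    page, offset = divmod(gva, PAGE_SIZE)
    if page not in shadow:
        # A real VMM would take a shadow page fault here and consult the guest's tables.
        raise LookupError(f"shadow page fault at {gva:#x}")
    return shadow[page] * PAGE_SIZE + offset


if __name__ == "__main__":
    guest_pt = {0: 5, 1: 7}        # guest maps virtual pages 0, 1 to guest-physical pages 5, 7
    gpa_to_hpa = {5: 100, 7: 200}  # VMM backs guest-physical pages 5, 7 with host pages 100, 200
    shadow = build_shadow_table(guest_pt, gpa_to_hpa)
    print(shadow)                           # {0: 100, 1: 200}
    print(hex(translate(shadow, 0x1010)))   # 0xc8010
```

In a real VMM the shadow table must be kept in sync: when the guest edits its own page tables, the VMM has to trap the change and update or invalidate the affected shadow entries, so the cost of shadow paging depends on how often a guest modifies its address space.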
What are VMMs currently used for?
• Server consolidation
• Fault tolerance
• Legacy application support
• Debugging
• Isolation
• Virtual appliances
• Failover and disaster recovery
• Market size
– 2007: $5.5 billion
– 2011: $11.7 billion
Virtualization in HPC
• Fault tolerance
– RedStorm MTBI target: 50 hours
– RedStorm Min TTR: 30 minutes – 1 hour
• Broader usage
– Allow applications to select best OS
• Only if it doesn’t degrade performance…
– Tightly coupled parallel applications
– Very large scale
A. B. Nagarajan, F. Mueller, C. Engelmann, and S. L. Scott, "Proactive Fault Tolerance for HPC with Xen Virtualization," ICS 2007.
Palacios VMM
• OS-independent embeddable virtual machine monitor
• Developed at Northwestern and University of New Mexico
• Open source and freely available
– Downloaded over 1000 times as of July 2009
• Users:
– Kitten: Lightweight supercomputing OS from Sandia National Labs
– MINIX 3
– Modified Linux versions
• Successfully used on supercomputers, clusters (Infiniband and Ethernet), and servers
http://www.v3vee.org/palacios
Palacios as an HPC VMM
• Minimalist interface
– Suitable for an LWK
• Compile-time and runtime configurability
– Create a VMM tailored to specific environments
• Low noise
• Contiguous memory pre-allocation
• Passthrough resources and resource partitioning
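Assuming contiguous memory pre-allocation means the guest's entire physical memory is backed by one host region reserved up front, guest-physical to host-physical translation collapses to a bounds check plus one addition instead of a per-page lookup. The sketch below illustrates only that assumption; the class name, its fields, the 4 KiB page size, and the example addresses are hypothetical and not taken from Palacios.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed 4 KiB pages


@dataclass(frozen=True)
class ContiguousGuestMemory:
    """Guest-physical memory backed by one pre-allocated, contiguous host region."""

    host_base: int  # host-physical start address of the reserved region
    size: int       # bytes of guest-physical memory

    def __post_init__(self) -> None:
        if self.host_base % PAGE_SIZE or self.size % PAGE_SIZE:
            raise ValueError("region must be page-aligned")

    def gpa_to_hpa(self, gpa: int) -> int:
        """Translate with a bounds check and one addition; no per-page table."""
        if not 0 <= gpa < self.size:
            raise ValueError(f"guest-physical address {gpa:#x} outside guest memory")
        return self.host_base + gpa


if __name__ == "__main__":
    mem = ContiguousGuestMemory(host_base=0x4000_0000, size=256 * 1024 * 1024)
    print(hex(mem.gpa_to_hpa(0x1234)))  # 0x40001234
```

In this model the VMM never allocates host pages on demand while the guest runs, which keeps that work out of the application's critical path; the trade-off is that the whole region has to be reserved when the guest starts.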
Lightweight Kernel Timeline
1991 – Sandia/UNM OS (SUNMOS), nCube-2
1991 – Linux 0.02
1993 – SUNMOS ported to Intel Paragon (1800 nodes)
1993 – SUNMOS experience used to design Puma; first implementation of the Portals communication architecture
1994 – Linux 1.0
1995 – Puma ported to ASCI Red (4700 nodes); renamed Cougar, productized by Intel
1997 – Stripped-down Linux used on Cplant (2000 nodes); difficult to port Puma to COTS Alpha server; included Portals API
2002 – Cougar ported to ASC Red Storm (13000 nodes); renamed Catamount, productized by Cray; host- and NIC-based Portals implementations
2004 – IBM develops LWK (CNK) for BG/L/P (106000 nodes)
2005 – IBM & ETI develop LWK (C64) for Cyclops64 (160 cores/die)
Kitten: An Open Source LWK
• Better match for user expectations
– Provides mostly Linux-compatible user environment
• Including threading
– Supports unmodified compiler toolchains and ELF executables
• Better match for vendor expectations
– Modern code-base with familiar Linux-like organization
• Drop-in compatible with Linux
– Infiniband support
• End-goal is deployment on a future capability system
http://software.sandia.gov/trac/kitten
Complexity
• Scalable HPC performance requires minimal overhead
Component    Lines of code
Kitten       ~33,000
Palacios     ~28,000
Total        ~61,000
Xen: 580k lines (50k–80k core)
KVM: 50k–60k lines + kernel dependencies (??) + user-level devices (180k)
HPC Performance Evaluation
• Virtualization is very useful for HPC, but only if it doesn’t hurt performance
• Virtualized RedStorm with Palacios
– Evaluated with Sandia’s system evaluation benchmarks
17th fastest supercomputer
Cray XT3: 38,208 cores, ~3500 sq ft
2.5 megawatts, $90 million
Comparison of Operating Systems
[Figure: HPCCG (conjugate gradient solver), Catamount vs. Compute Node Linux, with shadow paging]
Comparison of Operating Systems
[Figure: CTH (multi-material, large deformation, strong shockwave simulation), Catamount vs. Compute Node Linux]
Large Scale Study
• Evaluation on full RedStorm system
– 12 hours of dedicated system time on full machine
– Largest virtualization performance scaling study to date
• Measured performance at exponentially increasing scales
– Up to 4096 nodes
• Publicity
– New York Times
– Slashdot
– HPCWire
– Communications of the ACM
– PC World
Scalability at Large Scale (Catamount)
[Figure: CTH (multi-material, large deformation, strong shockwave simulation) at increasing scale; annotations: "Within 3%", "Scalable"]
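Statements such as "within 3%" (and "within 5% of native" in the summary) compare a virtualized run against the same run on native hardware. Assuming the metric is run time (lower is better), the slowdown is overhead = (T_virtualized - T_native) / T_native × 100%. The helpers below compute it; the function names and sample timings are illustrative only and are not data from the RedStorm study.

```python
def overhead_percent(t_native: float, t_virtualized: float) -> float:
    """Percent slowdown of a virtualized run relative to native (run times, lower is better)."""
    if t_native <= 0:
        raise ValueError("native run time must be positive")
    return (t_virtualized - t_native) * 100.0 / t_native


def within(t_native: float, t_virtualized: float, limit_percent: float) -> bool:
    """True if the virtualized run is no more than limit_percent slower than native."""
    return overhead_percent(t_native, t_virtualized) <= limit_percent


if __name__ == "__main__":
    # Made-up timings in seconds, for illustration only.
    print(overhead_percent(400.0, 410.0))  # 2.5
    print(within(400.0, 410.0, 3.0))       # True
```

For throughput metrics such as bandwidth, where higher is better, the roles flip: the slowdown is (R_native - R_virtualized) / R_native.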
Commodity Systems
• Kitten and Palacios fully support commodity systems
– Infiniband clusters
– Ethernet servers
– Generic PC hardware
• Palacios embeddable in many OSes
– Kitten
– MINIX 3
– Linux
– GeekOS
Infiniband on Commodity Linux
[Figure: 2-node Infiniband ping-pong bandwidth measurement (Linux guest on IB cluster)]
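A two-node ping-pong test bounces a fixed-size message back and forth and, under the usual convention, treats half of each round trip as the one-way transfer time, so bandwidth = message size / (RTT / 2). The slide does not state its exact methodology; this sketch only illustrates that convention, and the function name and sample numbers are hypothetical.

```python
def pingpong_bandwidth(message_bytes: int, iterations: int, total_seconds: float) -> float:
    """One-way bandwidth in bytes/s from a ping-pong run.

    Each iteration is one round trip (A -> B -> A), so the one-way time per
    message is total_seconds / iterations / 2.
    """
    if message_bytes <= 0 or iterations <= 0 or total_seconds <= 0:
        raise ValueError("inputs must be positive")
    one_way_seconds = total_seconds / iterations / 2
    return message_bytes / one_way_seconds


if __name__ == "__main__":
    # Hypothetical run: 1 MiB messages, 1000 round trips in 1.0 s.
    bw = pingpong_bandwidth(1 << 20, 1000, 1.0)
    print(f"{bw / 1e6:.1f} MB/s")
```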
Summary
• Virtualization can scale
– Near-native performance for optimized VMM/guest (within 5%)
• VMM needs to know about guest internals
– Should modify behavior for each guest environment
– Example: paging method to use depends on guest
• Black-box inference is not desirable in an HPC environment
– Unacceptable performance overhead
– Convergence time
– Mistakes have large consequences
• Need guest cooperation
– Guest and VMM relationship should be symbiotic
– Paper forthcoming (4096 scaling results and techniques)
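The idea that the right paging method depends on the guest, together with the call for guest cooperation instead of black-box inference, suggests a policy in which the guest reports its behavior and the VMM picks a paging mode. The sketch below is a hypothetical heuristic, not the authors' rule: the `GuestProfile` fields and the threshold are invented for illustration. It encodes two general trade-offs: shadow paging traps to the VMM whenever the guest edits its page tables, while nested (two-dimensional) paging avoids those traps but makes every TLB miss walk both the guest's and the VMM's tables.

```python
from dataclasses import dataclass
from enum import Enum


class PagingMode(Enum):
    SHADOW = "shadow"
    NESTED = "nested"


@dataclass(frozen=True)
class GuestProfile:
    """Hypothetical self-description a cooperating guest could hand to the VMM."""

    cpu_has_nested_paging: bool        # hardware support (e.g. AMD NPT or Intel EPT)
    page_table_updates_per_sec: float  # how often the guest edits its page tables
    tlb_miss_heavy: bool               # workload with poor TLB locality


def choose_paging_mode(guest: GuestProfile, update_threshold: float = 1000.0) -> PagingMode:
    """Pick a paging mode from guest-supplied hints (illustrative heuristic only)."""
    if not guest.cpu_has_nested_paging:
        return PagingMode.SHADOW  # nothing else is available
    if guest.page_table_updates_per_sec > update_threshold:
        # Frequent edits would each trap to the VMM under shadow paging.
        return PagingMode.NESTED
    if guest.tlb_miss_heavy:
        # A mostly static address space with many TLB misses favors the shorter shadow walk.
        return PagingMode.SHADOW
    return PagingMode.NESTED


if __name__ == "__main__":
    # Hypothetical guest that sets up its address space once at startup.
    static_guest = GuestProfile(True, 5.0, True)
    print(choose_paging_mode(static_guest))  # PagingMode.SHADOW
```

Because the guest supplies these hints directly, the VMM does not have to infer them by observation, avoiding the convergence time and costly mistakes listed above.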
Future Work
• Continue exploring virtualization in HPC
– NU, UNM and SNL collaboration
– Granted 5 million hours on Jaguar
• Current fastest supercomputer in the world
Oak Ridge National Labs
Cray XT5: 224,256 cores, 4352 sq ft, 6.95 megawatts, $104 million