Post on 27-May-2020
QNIBTerminal plus InfiniBand: Containerized MPI Workloads
2014-11-05, Christian Kniep
Agenda
• About Me
• Docker in a Nutshell
• QNIBTerminal
• Study
  • Testbed + HPCG
  • Results
• Future Work
• Conclusion
About Me
• Iteration through L1/2/3 SysOps
• Mostly German automotive sector
• 1/2013 -> 10/2014 R&D @Bull SAS
• since then independent R&D / Freelancing
• Hot topics
  • Containerization
  • Golang
  • Log / Performance Management
  • HPC Cluster Software Stack / Interconnect
Docker in a Nutshell
• (chroot on steroids)²
• Builds on top of Linux Containers (LXC)
• Kernel namespaces (isolation)
• cgroups (resource mgmt)
• Red Hat backing
• public repositories
• intuitive build system
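The two kernel features named above, namespaces and cgroups, can be inspected from any Linux process through /proc. The following is a minimal sketch, assuming a Linux host with /proc mounted; the helper names `parse_cgroup_line` and `list_namespaces` are made up for this illustration and are not part of Docker or QNIBTerminal.

```python
import os
from typing import Dict, List, NamedTuple


class CgroupEntry(NamedTuple):
    hierarchy_id: int
    controllers: List[str]
    path: str


def parse_cgroup_line(line: str) -> CgroupEntry:
    """Parse one line of /proc/<pid>/cgroup.

    The kernel writes each line as 'hierarchy-ID:controller-list:cgroup-path'.
    On a cgroup v2 hierarchy the controller list is empty ('0::/path').
    """
    hierarchy_id, controllers, path = line.rstrip("\n").split(":", 2)
    names = [c for c in controllers.split(",") if c]
    return CgroupEntry(int(hierarchy_id), names, path)


def list_namespaces(pid: str = "self") -> Dict[str, str]:
    """Map namespace type to its identifier, e.g. 'net' -> 'net:[4026531956]'.

    Returns an empty dict when /proc/<pid>/ns is not available.
    """
    ns_dir = f"/proc/{pid}/ns"
    if not os.path.isdir(ns_dir):
        return {}
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # entry not readable for this user, skip it
    return result


if __name__ == "__main__":
    for ns_type, ident in list_namespaces().items():
        print(ns_type, ident)
    if os.path.exists("/proc/self/cgroup"):
        with open("/proc/self/cgroup") as fh:
            for raw in fh:
                print(parse_cgroup_line(raw))
```

Running the script once on the host and once inside a container typically shows different namespace identifiers and a different cgroup path, which is the isolation and resource grouping the slide refers to.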
Traditional vs. Lightweight Layers
[Figure: two stacks side by side, each with an IB (InfiniBand) label]
• Traditional Virtualisation: SERVER, HOST KERNEL, HYPERVISOR, then per guest a KERNEL, Userland (OS) and SERVICE
• Containerisation: SERVER, HOST KERNEL, then per container a Userland (OS) and SERVICE
Rocket 'Docker'
• 2013-01-18 First commit
• 2013-02-01 First online demo
• 2013-03-21 Demo at PyCon US
• 2013-03-23 Version 0.1
• 2013-03-26 Repository on github.com created
• 2013-04-23 Version 0.2
• 2013-05-06 Version 0.3
• 2013-06-03 Version 0.4
• 2013-06-25 Joining Linux Foundation
• 2013-07-18 Version 0.5 (top, mount)
• 2013-08-23 Version 0.6 (-privileged, LXC conf)
• 2013-09-09 Collaboration with Red Hat
• 2013-10-29 Foundation of Docker Inc.
• 2013-11-26 Version 0.7
• 2014-01-21 Docker Inc. raised $15M
• 2014-02-04 Version 0.8 (Mac OS X, btrfs exp., ONBUILD)
• 2014-03-10 Version 0.9 (exec driver, libcontainer)
• 2014-06-09 Version 1.0 (PROD, pause, XFS, COPY)
• 2014-06-09 official repos, public repository
• 2014-06-09 enterprise support, training, consulting
• 2014-07-03 Version 1.1 (.dockerignore, mount "/")
• 2014-08-22 Version 1.2 (-restart, caps, rw /etc/hosts)
• 2014-09-16 Docker Inc. raised $40M
• 2014-09-16 Microsoft teams up with Docker Inc.
• 2014-10-16 Version 1.3 (signed img, proc injection, …)
QNIBTerminal Motivation
• Plain Metrics
• Plain Log Events
• Overlap Metrics/Log Events
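QNIBTerminal Overview
[Figure: container stack, each box drawn as container name plus the service running inside it]
• haproxy (haproxy), dns (helixdns), etcd
• elk (elasticsearch, logstash, kibana)
• carbon (carbon), graphite-web (graphite-web), graphite-api (graphite-api), grafana (grafana)
• slurmctld (slurmctld), compute0 to compute<N> (slurmd)
• Group labels in the diagram: Log/Events, Services, Performance, Compute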
One Node Setup
• All network traffic over bridge
• Crippled MPI workload
Testbed
• 8 nodes (CentOS 7, 2x 4-core Xeon, 32 GB, Mellanox ConnectX-2)
• Multiple Open MPI versions installed
• Multiple gcc versions
• 3 containers on top (CentOS 6, CentOS 7, Ubuntu 12)
• SLURM Resource Scheduler
  • 1 native partition
  • 3 container partitions
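The partition layout above (one native partition plus one partition per container flavour) could be written down as slurm.conf partition lines. This is only a sketch: it assumes that every container runs its own slurmd and registers under a node name such as `cos7[0-7]`. The node naming and the helper `partition_lines` are hypothetical and not taken from the study. `PartitionName`, `Nodes`, `Default` and `State` are regular slurm.conf partition parameters.

```python
from typing import List


def partition_lines(flavours: List[str], node_count: int, default: str) -> List[str]:
    """Build one slurm.conf partition line per flavour.

    flavours:   partition names, e.g. the native hosts and each container type
    node_count: nodes per partition (hypothetical naming '<flavour><index>')
    default:    the partition jobs land in when none is requested
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    if default not in flavours:
        raise ValueError("default partition must be one of the flavours")
    lines = []
    for flavour in flavours:
        # Slurm hostlist expression, e.g. 'cos7[0-7]' for eight nodes
        if node_count > 1:
            nodes = f"{flavour}[0-{node_count - 1}]"
        else:
            nodes = f"{flavour}0"
        is_default = "YES" if flavour == default else "NO"
        lines.append(
            f"PartitionName={flavour} Nodes={nodes} Default={is_default} State=UP"
        )
    return lines


if __name__ == "__main__":
    for line in partition_lines(["native", "cos7", "cos6", "u12"], 8, "native"):
        print(line)
```

A job would then pick its software stack through the partition, for example `sbatch -p cos6 job.sh`; matching `NodeName=` definitions are still needed and are left out here.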
HPCG Benchmark
• mimics thermodynamic application workload
• Corrective to Linpack, or its successor in the long term?
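HPCG stands for High Performance Conjugate Gradients; its core is a preconditioned conjugate-gradient solve on a sparse linear system. The sketch below shows plain, unpreconditioned CG on a tiny 1-D toy problem to illustrate the kind of kernel involved (sparse matrix-vector products, dot products, vector updates). It is not HPCG code, and the function names are chosen for this example only.

```python
from typing import Callable, List

Vector = List[float]


def laplacian_1d(x: Vector) -> Vector:
    """Apply the tridiagonal stencil [-1, 2, -1] without storing the matrix."""
    n = len(x)
    y = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y.append(2.0 * x[i] - left - right)
    return y


def dot(a: Vector, b: Vector) -> float:
    return sum(p * q for p, q in zip(a, b))


def conjugate_gradient(
    apply_a: Callable[[Vector], Vector],
    b: Vector,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> Vector:
    """Solve A x = b for a symmetric positive definite operator A."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, with x = 0
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break        # residual norm small enough
        ap = apply_a(p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


if __name__ == "__main__":
    expected = [float(i) for i in range(1, 11)]
    rhs = laplacian_1d(expected)
    solution = conjugate_gradient(laplacian_1d, rhs)
    print(max(abs(s - e) for s, e in zip(solution, expected)))
```

Each iteration does little arithmetic per byte moved, so this kind of workload stresses memory and interconnect more than dense Linpack-style factorisation does.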
Results: partition's performance
[Figure: bar chart of HPCG performance in GFLOP/s (axis from 3 to 6) for the partitions native, cos7, cos6 and u12; the bar values are not recoverable from this transcript]
• native, cos7: CentOS 7.0, oMPI 1.6.4, gcc 4.8.2
• cos6: CentOS 6.5, oMPI 1.5.4, gcc 4.4.7
• u12: Ubuntu 12.04, oMPI 1.5.4, gcc 4.6.3
Results: multiple MPI versions
[Figure: bar chart of HPCG performance in GFLOP/s (axis from 3 to 6) for native, cos7, cos6 and u12, grouped by Open MPI version: distribution, 1.5.4, 1.6.4, 1.8.4; the bar values are not recoverable from this transcript]
• Distribution versions: native oMPI 1.6.4 / gcc 4.8.2, cos7 oMPI 1.6.4 / gcc 4.8.2, cos6 oMPI 1.5.4 / gcc 4.4.7, u12 oMPI 1.5.4 / gcc 4.6.3
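The charts compare GFLOP/s per partition against the native run. One way to express such a comparison as a single number is the relative difference to native. The helper below is a sketch; the function name is an assumption, and the values in the usage lines are placeholders, not measurements from this talk.

```python
def relative_difference(native_gflops: float, container_gflops: float) -> float:
    """Return the container result relative to native, as a fraction.

    Positive: the container run was faster than native.
    Negative: the container run was slower (overhead).
    """
    if native_gflops <= 0:
        raise ValueError("native_gflops must be positive")
    return (container_gflops - native_gflops) / native_gflops


if __name__ == "__main__":
    # Placeholder inputs only, NOT results from the study.
    print(f"{relative_difference(100.0, 110.0):+.1%}")
    print(f"{relative_difference(100.0, 95.0):+.1%}")
```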
Future Work
• Security evaluations
• Compare different frameworks to orchestrate
• Use of SV-IOR (Keynote earlier today)
• Compare with tuned bare-metal
• Tune docker installation
• Benchmark real-world applications
Conclusion
• Out-of-the-box: container beats bare-metal
• Continuous testing/deployment of containerized workloads
• Bare-metal kernel provides access to IB
• Container in charge from MPI upwards
• Bunch of tooling within docker ecosystem
• Abstraction bare-metal / application works fine
• Low performance overhead
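The split described above (the bare-metal kernel owns the InfiniBand driver, the container brings everything from MPI upwards) implies that the IB device files have to be visible inside the container. The slides do not say how this was done in the study. One possible route is the `--device` option of `docker run`, sketched here under that assumption. The image name `example/openmpi-cos7` is hypothetical; `/dev/infiniband` is the usual location of the verbs device files and `ibv_devinfo` is the libibverbs tool that lists visible adapters.

```python
import os
from typing import List


def find_ib_devices(dev_dir: str = "/dev/infiniband") -> List[str]:
    """Return the device files below dev_dir, or [] if the directory is missing."""
    if not os.path.isdir(dev_dir):
        return []
    return sorted(os.path.join(dev_dir, name) for name in os.listdir(dev_dir))


def docker_run_command(image: str, command: List[str], devices: List[str]) -> List[str]:
    """Assemble a 'docker run' argument list that passes each device through."""
    args = ["docker", "run", "--rm"]
    for path in devices:
        args += ["--device", path]  # host device file made available in the container
    return args + [image] + command


if __name__ == "__main__":
    # Hypothetical image name; the command only prints, nothing is started.
    cmd = docker_run_command("example/openmpi-cos7", ["ibv_devinfo"], find_ib_devices())
    print(" ".join(cmd))
```

The sketch only builds and prints the command line, so it can be tried on a machine without Docker or InfiniBand hardware.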
La Fin
• Interested?
  • Docker Pitch today
  • Internal Evaluations
  • Workshops / Talks
• Questions?
• Paper: http://doc.qnib.org/
• Contact
  • @CQnib / @qnibinc
  • christian@qnib.org
  • http://qnib.org
Image: https://www.flickr.com/photos/dharmabum1964/3108162671