The Next Generation of High Performance Computing Clusters...

Date post: 30-Apr-2020

The Next Generation of High Performance Computing Clusters Using Containerization

Anna R. Prins (a,b,c), Zachary D. Snoek (a,b), Aaron A. Best (a), and Brent P. Krueger (b,*)

(a) Department of Biology, (b) Department of Chemistry, (c) Computing and Information Technology, Hope College, Holland, MI 49423.

Abstract

Hope College has two supercomputer clusters, curie and mu3c, that are used regularly by faculty and students throughout the sciences. Though the clusters work well overall, managing the dependencies of a wide array of scientific computing software has created problems for both users and system administrators. To overcome these issues, we are exploring how to implement container virtualization in our clusters using Docker, an open-source container platform that provides a straightforward interface for creating and deploying containers. The knowledge gained from this project will be used to design the next generation of curie and mu3c based on containers. Using container virtualization, we have created a test cluster and have successfully run calculations with applications such as MOPAC and R. Our research has also produced thorough documentation of how to create and operate a virtualized cluster.

Future Work

• Finish documenting our work through Dockermentation

• Determine how to handle user permissions in Docker

• Run additional benchmarks

• Begin installing Docker on cluster compute nodes

• Implement cluster management that understands containers
  • Containers within a traditional cluster job scheduler
  • Docker swarm mode within an HPC cluster


Acknowledgements

• Hope College Department of Chemistry

• Dow Foundation

• Computing and Information Technology at Hope College

• Jeff Petsun, Brian Slenk, Dean Thayer, Dan Yonker

• Nathan Vance and Dan Clark

Why Docker?

• Open source: Docker Community Edition is free to use.

• User friendly: the Docker Engine can be used with hardly any prior knowledge of software development.

• Widely adopted: support from cloud computing services.

Current Cluster Design

[Figure: the curie and mu3c clusters. The head node runs the TORQUE job scheduler, sends jobs to the compute nodes, and receives their output; files are shared over NFS.]

The curie and mu3c clusters are each set up as one head node, one storage node, and multiple compute nodes. A user logs into the head node of a cluster and submits a job from there. The job scheduler on the head node (TORQUE) then decides which compute node receives the job. Once the job is complete, the output is sent back to the head node, where the user can view it.
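The submission flow described above can be sketched as a small helper that writes the batch script a user would hand to TORQUE with `qsub`. This is a minimal sketch, not the clusters' actual configuration: the function name `build_torque_script`, the job name, the resource values, and the `Rscript analysis.R` command are all assumptions made for illustration. Only the `#PBS` directives and the `PBS_O_WORKDIR` variable are standard TORQUE features.

```python
def build_torque_script(job_name, command, nodes=1, ppn=1, walltime="01:00:00"):
    """Return the text of a TORQUE/PBS batch script that runs one command.

    All argument values are supplied by the caller; nothing here reflects
    the real settings used on curie or mu3c.
    """
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",               # job name shown in the queue
        f"#PBS -l nodes={nodes}:ppn={ppn}",  # nodes and processors per node
        f"#PBS -l walltime={walltime}",      # maximum run time (HH:MM:SS)
        "cd $PBS_O_WORKDIR",                 # start in the submission directory
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical job: the script text would be saved to a file and
    # submitted from the head node with `qsub <file>`.
    print(build_torque_script("r_stats", "Rscript analysis.R"))
```

The scheduler, not the user, picks the compute node, so the script only states what resources the job needs.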

Dockermentation

Throughout our research, we have been writing detailed documentation of every procedure we have learned. This Dockermentation then undergoes a three-step editing process before it is approved.

Each Dockerment consists of a header listing the authors, the date of creation, and the date of the latest revision; a purpose statement explaining the intended use of the document; and a clear step-by-step walkthrough, with accompanying explanation, of how to complete the objective in the purpose statement. Some work-in-progress documents also include notes at the bottom describing the problems we are having and any suggestions we have for fixing them. All documents are stored in a Google Drive folder where we can edit them collaboratively with CIT staff and our research advisors.

[Figure: example of a typical “Dockerment”]

The current clusters consist of:

• One head node: schedules jobs and sends them to the compute nodes

• One storage node: stores user data and applications; serves all files to all nodes using NFS

• Many compute nodes: run jobs

Cluster Limitations

[Figure: R, MOPAC, AMBER, QCHEM, G09, and MOLPRO all installed on a single host OS]

Throughout the cluster, all of the programs are installed on the same host operating system. Every application has a unique set of dependencies which sometimes conflict with the dependencies of other applications. These conflicts make installing new software or updating old software very difficult for the system administrators.
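To make the conflict problem concrete, the following sketch reports every library that two or more applications need at different versions when they share one host OS. The application names, library names, and version numbers are hypothetical and invented purely for illustration; they are not the real dependencies of any software on curie or mu3c.

```python
def find_conflicts(requirements):
    """Return {library: {version: [apps]}} for libraries pinned to more than one version.

    `requirements` maps an application name to a dict of {library: version}.
    """
    by_library = {}
    for app, pins in requirements.items():
        for library, version in pins.items():
            by_library.setdefault(library, {}).setdefault(version, []).append(app)
    # A library is a conflict only if different apps need different versions.
    return {lib: versions for lib, versions in by_library.items() if len(versions) > 1}


# Hypothetical pins, invented for illustration only.
EXAMPLE = {
    "app_a": {"libfoo": "1.2", "libbar": "3.0"},
    "app_b": {"libfoo": "2.0", "libbar": "3.0"},
}

if __name__ == "__main__":
    print(find_conflicts(EXAMPLE))
```

On a shared host OS only one version of `libfoo` can usually be the system default, so one of the two hypothetical applications would need a workaround. Giving each application its own container removes the need to pick.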

What is Docker?

Docker is a container management platform. Containers are portable, isolated environments, each with a very minimal operating system of its own that runs on the host's kernel. That operating system includes only the packages needed to run the container's software. This way, the software sees the same environment regardless of which machine runs the container.

[Figure: compute node stack. Hardware, host OS, and Docker at the base; above them, three containers: Debian with R, Ubuntu with MOPAC, and CentOS with AMBER.]
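As a rough illustration of how a job might be launched inside one of these containers, the sketch below assembles a `docker run` command line. The helper name `docker_run_command`, the image name `example/r:3.4`, and all paths are assumptions, not part of the project. The flags themselves (`--rm`, `-v`, `-w`, `--user`) are standard Docker CLI options, and `--user` is shown only as one possible starting point for the user-permissions question, not as the project's solution.

```python
import shlex


def docker_run_command(image, command, host_dir, container_dir="/work", user=None):
    """Build the argv list for running `command` in a throwaway container.

    host_dir is bind-mounted at container_dir so output files land on the host.
    """
    argv = [
        "docker", "run",
        "--rm",                               # delete the container when it exits
        "-v", f"{host_dir}:{container_dir}",  # bind-mount the job directory
        "-w", container_dir,                  # run the command from that directory
    ]
    if user is not None:
        argv += ["--user", user]              # e.g. "1000:1000" (uid:gid), optional
    argv.append(image)
    argv += shlex.split(command)
    return argv


if __name__ == "__main__":
    # Hypothetical image and paths, for illustration only.
    argv = docker_run_command("example/r:3.4", "Rscript analysis.R", "/home/alice/job1")
    print(" ".join(shlex.quote(a) for a in argv))
```

Returning a list rather than a single string means the command can be passed directly to `subprocess.run` without invoking a shell.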

Benchmarking

• Python 2.7
  • Reads ~100 MB of data
  • Parses the data
  • Appends a subset of the data to an existing file

• R 3.4
  • Loads taxonomic database
  • Collects statistics on depth of sequencing reads
  • Generates histogram of read depth data

• BASH shell script
  • Sleep command with several time values
  • Time in excess of sleep time is plotted
  • Curie: command run directly on hardware OS
  • Docker: execution time of sleep command within container
  • Docker + SS: execution time and container start/stop time
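The sleep benchmark measures overhead as the time a run takes beyond the requested sleep. A minimal sketch of that measurement follows. It is a Python stand-in for the BASH script, not the script itself, and the names `excess_time` and `benchmark` as well as the sleep values are assumptions.

```python
import time


def excess_time(run, requested_seconds):
    """Time one call to run(requested_seconds) and return the seconds beyond the request."""
    start = time.perf_counter()
    run(requested_seconds)
    return time.perf_counter() - start - requested_seconds


def benchmark(run, sleep_values, repeats=3):
    """Return {sleep value: mean excess time in seconds} over `repeats` runs each."""
    return {
        s: sum(excess_time(run, s) for _ in range(repeats)) / repeats
        for s in sleep_values
    }


if __name__ == "__main__":
    # Baseline: sleeping in-process. The sleep values are arbitrary examples.
    for seconds, overhead in benchmark(time.sleep, [0.1, 0.5, 1.0]).items():
        print(f"sleep {seconds}s: {overhead * 1000:.2f} ms excess")
```

To approximate the "Docker + SS" case, `run` could instead be a function that launches `docker run --rm <image> sleep <seconds>` through `subprocess.run`, so that container start and stop time is included in the measurement.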

[Figure: storage node serving MOPAC, QCHEM, G09, R, and user data over NFS]
