
Post on 18-Jan-2016


Introduction to the Shared Compute Cluster

Charles Jahnke (cjahnke@bu.edu), Katia Oleinik (koleinik@bu.edu)

Research Computing Services

Topics for Today

› Overview of the Shared Compute Cluster

› Connecting to the SCC

› Files/Storage on SCC

› Using Linux

› Using Software and Modules

› Batch System Overview

› Getting Help

Introduction

Cluster overview, Architecture, and Service Models.

The Shared Compute Cluster (SCC)

› Linux compute cluster with 7000+ processors and 200+ GPUs.

› Over 2 Petabytes of disk space.

› Owned by Boston University and researchers.

› Located at the Massachusetts Green High Performance Computing Center (MGHPCC) in Holyoke, MA

› Went into production in June, 2013 for Research Computing.

MGHPCC

› Collaborations between 5 universities, MA state, and industry.

› State-of-the-art data center in Holyoke, MA.

› MGHPCC provides physical infrastructure (i.e. space, power, cooling), not computing systems.

› Individual universities or consortiums provide their own computing and support.

SCC Architecture

[Diagram: login nodes SCC1, SCC2, SCC3, and SCC4 (VPN only) sit on the public network; a private network connects them to the compute nodes (420 nodes with 7072 CPUs and 250 GPUs) and to file storage (>2 PB).]

Service Models – Shared and Buy-In

Shared: Centrally funded by BU and university-wide grants. Resources are free to the entire BU Research Computing community.

Buy-In: Purchased by individual faculty or research groups through the Buy-In program with priority access for the purchaser.

~ 60% ~ 40%

Service Models are Mutually Beneficial

Buy-In Owner Usage

Other Nodes   17,544,745   67%
Own Nodes      8,793,012   33%

Buy-In Node Usage

Owner Use   8,793,012   62%
Other Use   5,328,768   38%

Buy-In owner is able to “burst” to shared resources when personal purchases aren’t enough.

Owner has priority on own resources. However, when the Buy-In owner is not using their nodes, they are “shared” to other researchers.

Connecting to SCC

Windows, OS X, and Linux

Connection Protocols and Software

SCC supports Secure SHell (SSH) for interactive work and Secure File Transfer Protocol (SFTP) for file transfer. [Other protocols too, but let’s start with these.]

› Windows: MobaXterm, PuTTY, X-Win32, …

› Mac: Terminal (Optional: XQuartz)

› Linux: Terminal (Optional: X11)

SCC Help: http://www.bu.edu/tech/support/research/system-usage/getting-started/connect-ssh

Choose a Login Node


Login Node  Hostname     Description
SCC1        scc1.bu.edu  General purpose login node, accessible from internet
SCC2        scc2.bu.edu  General purpose login node, accessible from internet
SCC3        geo.bu.edu   Earth and Environment department node.
SCC4        scc4.bu.edu  BUMC login node. Access to /restricted/project data. Requires BU network or VPN.


SSH - Login to the SCC

Use SSH and your username to log into any login node.

› Similar for Linux, Mac, and Windows (MobaXterm)

[local_prompt]$ ssh username@scc4.bu.edu
username@scc4.bu.edu’s Password:

SCC Help: http://www.bu.edu/tech/support/research/system-usage/getting-started/connect-ssh

SSH - Login to the SCC (X-Forwarding)

The same process with an option to enable X-Forwarding.

› Linux and Windows (MobaXterm)

[local_prompt]$ ssh -X username@scc4.bu.edu
username@scc4.bu.edu’s Password:
[username@scc4 ~]$ xclock &

› Apple OS X

[local_prompt]$ ssh -Y username@scc4.bu.edu
username@scc4.bu.edu’s Password:
[username@scc4 ~]$ xclock &

Don’t see a clock?

› Make sure X-Forwarding is enabled with ssh (-X or -Y)

› Make sure X-Win32, XQuartz, or X11 is installed on your local system.

SCC Help: http://www.bu.edu/tech/support/research/system-usage/getting-started/connect-ssh


Data Transfer - Secure Copy (scp)

› Like 'ssh' + 'cp'

› Less efficient than 'rsync', but very easy to use

› From desktop to SCC:

local_prompt% scp localfile john@scc1.bu.edu:/path/on/scc

Password: ******

› From SCC to desktop:

local_prompt% scp john@scc1.bu.edu:/path/on/scc localdest

Password: ******

SCC Help: http://www.bu.edu/tech/support/research/system-usage/getting-started/get-started-file-transfer/


Data Transfer - Remote Sync (rsync)

› From desktop to SCC (run on desktop):

local_prompt% rsync -a localfile john@scc1.bu.edu:/path/on/scc

Password: ******

› From SCC to desktop (run on desktop):

local_prompt% rsync -a john@scc1.bu.edu:/path/on/scc localdest

Password: ******

› See the manual for many very useful options (‘man rsync’).

SCC Help: http://www.bu.edu/tech/support/research/system-usage/getting-started/get-started-file-transfer/


Data Transfer - Secure File Transfer Protocol (SFTP)

Available in command line tools. Popular for graphical interfaces.

› Windows: MobaXterm, FileZilla

› Mac: Fetch (BU License), FileZilla

› Linux: FileZilla, others.

SCC Help: http://www.bu.edu/tech/support/research/system-usage/getting-started/get-started-file-transfer/

File Storage on SCC

Storage Options on SCC

› Unmanaged Storage – well, you manage it

› Managed Storage – several options

Name            Redundant  Snapshots  Offsite Backup  Allocation [Max]
Home Directory  Yes        180 Days   To Campus       10
/project        Yes        180 Days   To Campus       50 [200]
/projectnb      Yes        30 Days    None            50 [800] ($$*)
Archive         Yes        --         --              $$
STASH           Yes        30 Days    Only Option     $$

* Maximum per project, with an “it’s complicated” explanation if you have multiple projects

SCC Help: http://www.bu.edu/tech/support/research/computing-resources/file-storage/

Home directories and Project(nb) space

Restricted Data

Some data requires dbGaP compliance or other restrictions.

› The policies for “project” and “projectnb” on the previous slides are replicated for the /restricted filesystem.

› Only accessible through scc4.bu.edu and compute nodes

› This is also available as a “STASH” allocation.

Restricted Space        Description
/restricted/project/    /project/ space equivalent for restricted data
/restricted/projectnb/  /projectnb/ space equivalent for restricted data

Scratch Space

› Home and project spaces are on servers accessed using very fast networks.

› Each node (login or compute) has a directory called /scratch on a local hard drive.

› In a batch job, the $TMPDIR environment variable refers to a job-specific directory in scratch space.

–This is deleted at the end of the job.

› Scratch files are kept for 30 days, with no guarantees.

SCC Help: http://www.bu.edu/tech/support/research/system-usage/running-jobs/resources-jobs/local_scratch/
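A minimal sketch of how a job might use its scratch directory, assuming a bash job script. The file names are hypothetical, and the fallbacks to /tmp and to a temporary results directory are assumptions made only so the snippet also runs outside a batch job, where $TMPDIR may not be set.

```shell
#!/bin/bash
# Use the job-specific scratch directory if the batch system set one;
# fall back to /tmp (an assumption for running outside a batch job).
scratch_root="${TMPDIR:-/tmp}"

# Create a private working directory inside scratch space.
workdir="$(mktemp -d "$scratch_root/scc_demo.XXXXXX")"

# Write intermediate data to the local disk (hypothetical file name).
printf 'intermediate result\n' > "$workdir/intermediate.txt"

# Copy anything worth keeping to longer-lived storage before the job ends,
# because the job's $TMPDIR is deleted when the job finishes.
# RESULTS_DIR is a hypothetical variable; in practice it would point
# somewhere in your project space. A temporary directory stands in here.
results_dir="${RESULTS_DIR:-$(mktemp -d)}"
cp "$workdir/intermediate.txt" "$results_dir/final.txt"

echo "saved: $results_dir/final.txt"
```

The point of the pattern is the last step: scratch is fast but temporary, so results are copied out explicitly.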


Snapshots

› Nightly copies of files stored within the file system.

› Allows for retrieval of files that are accidentally deleted.

› Every directory contains a hidden .snapshots directory.

–It is not visible in an ls -a listing, but you can cd into it:

[cjahnke@scc2 ~]$ ls -a
.Xauthority .bashrc normal_folder other_folder
[cjahnke@scc2 ~]$ ls .snapshots
150329 150328 150327 150326 ...

SCC Help: http://www.bu.edu/tech/support/research/computing-resources/file-storage/#Snapshots

These are folders representing days in YYMMDD format
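Retrieving a deleted file comes down to copying it back out of one of the dated folders. The sketch below assumes each dated folder holds that directory's files under their original names; `restore_from_snapshot` is a hypothetical helper written for this example, not an SCC command.

```shell
#!/bin/bash
# restore_from_snapshot DIR YYMMDD FILE
# Copy FILE from DIR/.snapshots/YYMMDD back into DIR.
# Hypothetical helper; the name and argument order are illustrative only.
restore_from_snapshot() {
    local dir="$1" day="$2" file="$3"
    local src="$dir/.snapshots/$day/$file"
    if [ ! -e "$src" ]; then
        echo "not found in snapshot: $src" >&2
        return 1
    fi
    # -p preserves the original timestamps and permissions.
    cp -p "$src" "$dir/$file"
}

# Example call (placeholder file name):
#   restore_from_snapshot "$HOME" 150329 notes.txt
```

If the file is missing from the chosen day, the function reports it and returns a non-zero status, so an older folder can be tried next.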

Using the System (Part 1)

Basic Linux commands and system use

Using the System: Commands

› The following slides contain snippets from the “SCC Getting Started” pdf.

› We will skim this; you have the handout.

› http://scv.bu.edu/documents/SCC_GettingStarted.pdf

Commands: User’s Info

Commands: Directory Navigation


Commands: File Management

Commands: Search


Commands: File Viewing and Editing

Commands: File Compression


Commands: Uploading and Downloading


Using the System (Part 2)

Users, Groups, and File Ownership

Users, Groups, and File Ownership

› Every file has an owner

› Every file belongs to a group

› Every file has “permissions”

› The owner (a user) can modify permissions.

› There are many users on the system.

› Users can belong to multiple groups.

File Access and Permissions

› Types of Access Rights
– Read access “r”
– Write access “w”
– Execute rights “x”

› Types of Access Levels
– User (owner) “u”
– Group “g”
– Others “o”

[cjahnke@scc4 ~]$ ls -l /project/labunskyy/
drwxr-x--- 2 vlabuns labunskyy 512 Jul 9 12:00 amyts
drwxr-x--- 2 cbeau labunskyy 512 Jul 9 12:00 beverlyn

Fields in the listing, left to right:
– Type of file (d=directory)
– “Owner” permission, first three bits
– “Group” permission, middle three bits
– “Other” permission, last three bits
– Owner username
– Group name

* The “S” you see as a group attribute is called a setgid bit. It gives special attributes to the child files/folders. In this case, you can ignore it.

Changing Ownership

›chown - Change file owner and group

–chown [OPTION]... [OWNER][:[GROUP]] FILE...

–You must own the file to change its group (to a group you belong to); changing the owner requires administrator privileges.

› Change user ownership of individual file

scc2% chown cjahnke testfile.txt

› Change user and group ownership of file

scc2% chown cjahnke:keplab testfile.txt

› Change ownership of all contents in a directory (-R = recursive)

scc2% chown -R cjahnke:keplab testdirectory

See the manual for full description (‘man chown’).

Changing Permissions

›chmod - Change file mode bits

–chmod [OPTION]... MODE[,MODE]... FILE...

–Mode has 2 formats:

› Octal: base-8 bit representation

scc2% chmod 750 testfile.txt

› Symbolic: u/g/o, r/w/x, and +/-/= define permissions

scc2% chmod u+rwx,g+rx,o-r testfile.txt

See the manual for full description (‘man chmod’).

Permission Mode Meanings

http://en.wikipedia.org/wiki/File_system_permissions

Symbolic Notation  Octal Notation  English (user, group and others have…)
---                0               no permissions
--x                1               execute
-w-                2               write
-wx                3               write & execute
r--                4               read
r-x                5               read & execute
rw-                6               read & write
rwx                7               read, write, & execute
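As a small illustration of the table above, the sketch below applies mode 750 to a throwaway file in octal form and then in symbolic form with "=", and compares the resulting permission strings. The temporary file and variable names are made up for the example.

```shell
#!/bin/bash
# Work on a throwaway file so nothing real is modified.
demo_file="$(mktemp)"

# Octal form: 7 = rwx (user), 5 = r-x (group), 0 = --- (others).
chmod 750 "$demo_file"
octal_listing="$(ls -l "$demo_file" | cut -c1-10)"

# Reset, then use the symbolic form. "=" sets each class exactly,
# so the result does not depend on the previous permissions.
chmod 000 "$demo_file"
chmod u=rwx,g=rx,o= "$demo_file"
symbolic_listing="$(ls -l "$demo_file" | cut -c1-10)"

echo "octal:    $octal_listing"
echo "symbolic: $symbolic_listing"
rm -f "$demo_file"
```

Both lines should show the same ten-character string. Note that "+" and "-" only add or remove bits, which is why "=" is used here for an exact match with the octal form.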

Access Rights Examples

Permissions  Description
-rw-r--r--   Readable and writable for file owner, only readable for others. (Standard)
-rw-r-----   Readable and writable for file owner, only readable for users belonging to the file group.
drwx------   Directory only accessible by its owner.
-------r-x   File executable by others but neither by your friends nor by yourself. Nice protections for a trap.

Access Right Constraints

› x is sufficient to execute binaries

– Both x and r are required for shell scripts.

› Both r and x permissions needed in practice for directories:

– r to list the contents

– x to access the contents.

› You cannot rename/remove/copy files in a directory without w access to directory.

› If you have w access to a directory, you CAN remove a file even if you don't have write access to this file (remember that a directory is just a file describing a list of files). This even lets you modify (remove + recreate) a file without w access to it.
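The last point can be tried safely in a temporary directory. This is only a sketch; the directory and file names are invented for the demonstration.

```shell
#!/bin/bash
# You own this directory, so you have w (and x) access to it.
demo_dir="$(mktemp -d)"

echo "original" > "$demo_dir/locked.txt"
chmod 444 "$demo_dir/locked.txt"   # read-only: no w access to the file itself

# Removing the file only needs w and x on the directory, not w on the file.
# -f suppresses the confirmation prompt rm shows for write-protected files.
rm -f "$demo_dir/locked.txt"

# Recreate it with new contents: the net effect is a modified file.
echo "replaced" > "$demo_dir/locked.txt"
result="$(cat "$demo_dir/locked.txt")"
echo "$result"

rm -rf "$demo_dir"
```

This is why protecting a file means checking the permissions on its directory as well as on the file.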

Using the System (Part 3)

Some basic tools, utilities, and methods

Text Editors

Command Interface

› nano - “Nano's ANOther” editor
› emacs - Programming Editor
› vim / vi - Visual IMproved
› Others

Graphical Interface

› gedit - Gnome EDITor
› emacs - Programming Editor
› gvim - GUI VIM
› Others

Word Count (wc)

Testfile

scc2% cat testfile.txt
0000 no permissions
0111 execute
0222 write
0333 write & execute
0444 read
0555 read & execute
0666 read & write
0777 read, write, & execute

Usage

# All Information
scc2% wc testfile.txt
 8 26 141 testfile.txt

# Number of Lines
scc2% wc -l testfile.txt
 8 testfile.txt

# Number of words
scc2% wc -w testfile.txt
 26 testfile.txt

See the manual for full description (‘man wc’).

Cut and remove portions of lines (cut)

Testfile

scc2% cat testfile.txt
0000 no permissions
0111 execute
0222 write
0333 write & execute
0444 read
0555 read & execute
0666 read & write
0777 read, write, & execute

Usage

# Cut first column
scc2% cut -f1 testfile.txt
0000
0111
0222
0333
0444
0555
0666
0777
# NOTE: The default delimiter is TAB. Use “-d” to specify another.
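To illustrate the “-d” note, here is a short sketch that cuts a comma-separated file instead of a tab-separated one. The file contents are made up for the example.

```shell
#!/bin/bash
# Build a small comma-separated file (hypothetical data for illustration).
csv_file="$(mktemp)"
printf '%s\n' '0444,read' '0666,read & write' '0777,read, write, & execute' > "$csv_file"

# -d sets the delimiter, -f selects fields.
first_column="$(cut -d, -f1 "$csv_file")"

# "-f2-" means field 2 through the end of the line, so commas inside
# the description are kept.
descriptions="$(cut -d, -f2- "$csv_file")"

echo "$first_column"
echo "$descriptions"
rm -f "$csv_file"
```

The open-ended range “-f2-” is handy whenever the last column may itself contain the delimiter.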

Sort (sort) and Unique (uniq)

Testfile (modified for example)

scc2% cat testfile.txt
read
write
write & execute
read
write & execute
read & write

Usage

scc2% sort testfile2.txt
read
read
read & write
write
write & execute
write & execute

scc2% sort testfile2.txt | uniq
read
read & write
write
write & execute
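Why sort before uniq? uniq only collapses duplicate lines that are adjacent. The sketch below rebuilds the six-line test file from this slide in a temporary location (the path is an assumption for the example) and counts the lines that survive with and without sorting.

```shell
#!/bin/bash
lines_file="$(mktemp)"
printf '%s\n' 'read' 'write' 'write & execute' 'read' 'write & execute' 'read & write' > "$lines_file"

# No duplicates are adjacent in the unsorted file, so uniq removes nothing.
# (tr strips the padding some wc implementations add.)
unsorted_count="$(uniq "$lines_file" | wc -l | tr -d ' ')"

# Sorting first brings identical lines together, so uniq can collapse them.
sorted_count="$(sort "$lines_file" | uniq | wc -l | tr -d ' ')"

# "sort -u" does the same thing in one step.
sort_u_count="$(sort -u "$lines_file" | wc -l | tr -d ' ')"

echo "uniq alone:  $unsorted_count lines"
echo "sort | uniq: $sorted_count lines"
echo "sort -u:     $sort_u_count lines"
rm -f "$lines_file"
```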

Redirection

› The “>” symbol redirects the output of a command to a file.

› Example:

scc2% sort testfile.txt > testfile_sorted.txt
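The redirection operators covered on this slide can be tried together in a short bash sketch. The temporary files and sample words are invented for the example.

```shell
#!/bin/bash
out_file="$(mktemp)"
sorted_file="$(mktemp)"

echo "first"  >  "$out_file"   # > creates the file, or overwrites it
echo "second" >> "$out_file"   # >> appends to the end
echo "third"  >  "$out_file"   # > again: "first" and "second" are gone
echo "fourth" >> "$out_file"

# < : the file becomes the command's standard input.
line_count="$(wc -l < "$out_file" | tr -d ' ')"

# << : the lines up to the closing marker (EOF) become standard input.
sort > "$sorted_file" <<EOF
pear
apple
EOF

# <<< : a single string becomes standard input (a bash feature).
word_count="$(wc -w <<< "read write execute" | tr -d ' ')"

echo "lines in out_file: $line_count"
echo "words in string:   $word_count"
cat "$sorted_file"
```

Because ">" silently replaces an existing file, ">>" is the safer choice when adding to a log or results file.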

Redirection  Description
<            Input - Directs a file
<<           Input - Directs a stream literal (here-document)
<<<          Input - Directs a string (here-string)
>            Output - Writes output to file (will “clobber”)
>>           Output - Appends output to file

Pipes

› Pipes (“|”) redirect the standard output of a command to the standard input of another command.

› Example:

scc2% cat testfile.txt | sort | uniq

Output of cat testfile.txt:
read
write
write & execute
read
write & execute
read & write

Output after sort:
read
read
read & write
write
write & execute
write & execute

Output after uniq:
read
read & write
write
write & execute

dos2unix / unix2dos

› Windows and Linux define “end of line” differently
– Windows: “\r\n” (the “\r” shows up as “^M”)
– Linux: “\n”

› dos2unix - DOS to UNIX text file format converter

scc2% dos2unix input.txt                 # convert and replace input.txt
scc2% dos2unix -n input.txt output.txt   # write output to new file

http://linuxcommand.org/man_pages/dos2unix1.html

Measuring Disk Space

› Measure disk space of specific file or folder.

[cjahnke@scc4 class1]$ ls
2015reporting testfile
[cjahnke@scc4 class1]$ du -h 2015reporting      # file
103M 2015reporting
[cjahnke@scc4 class1]$ du -hs .                 # folder
512M .

› Measure disk space of files in folder

[cjahnke@scc4 class1]$ ls -lh .
-rw-r--r-- 1 cjahnke bs859 103M Jan 8 15:12 2015reporting
-rw-r--r-- 1 cjahnke bs859 410M Jan 8 15:55 testfile

› Measure disk space of project

[cjahnke@scc4 class1]$ pquota bs859
                   quota   usage    usage
project space       (GB)    (GB)  (files)
----------------- ------ ------- --------
/project/bs859       200  174.03    14895
/projectnb/bs859     500    0.00        1


Software and Modules

Software (without modules)

› Many tools and utilities are available from the basic system environment.

› Some big-name software applications are too:
– MATLAB
– SAS
– STATA

› Others require Modules

SCC Help: http://www.bu.edu/tech/support/research/software-and-programming/software-and-applications/modules/

Modules

› Modules allow users to access non-standard tools or alternative versions of standard packages.

› This is also an alternative way to configure your environment as required for certain packages.

› Most software packages on SCC are configured this way.

SCC Help: http://www.bu.edu/tech/support/research/software-and-programming/software-and-applications/modules/

Module Usage

Command                     Description
module list                 List currently loaded modules.
module avail                List available modules.
module help [modulefile]    Displays description of specified module.
module show [modulefile]    Displays environment modifications for specified module.
module load [modulefile]    Loads specified module into environment.
module unload [modulefile]  Unloads specified module from environment.
module purge                Unloads all loaded modules.

SCC Help: http://www.bu.edu/tech/support/research/software-and-programming/software-and-applications/modules/

New Applications and Requests

› New packages are developed every day.

› Users can compile/install packages for personal use in home directories and project spaces.

› Users can request global installation of software:
– Complete form on our website (Link below)
– Send an email to help@scc.bu.edu

SCC Help: http://www.bu.edu/tech/support/research/software-and-programming/software-and-applications/request-software

The Batch System

Submitting and Monitoring Batch Jobs

http://www.bu.edu/tech/support/research/system-usage/running-jobs

Batch System Overview

› Login nodes are busy!
–Limited resource
–Limited runtime (15 min)

› Compute Nodes provide reserved resources
– Many more nodes
– Many types of resources

› “Fair Share” scheduling

SCC Help: http://www.bu.edu/tech/support/research/system-usage/running-jobs


Types of Jobs

› Interactive
–Just like the login node
–Can type, view output, open files, run commands
–“Interactive”

› Non-Interactive “Batch”
– Instructions coordinated with a script or binary
– Easy to run 1000’s at a time.
– “Blind”

SCC Help: http://www.bu.edu/tech/support/research/system-usage/running-jobs

Submitting an Interactive Batch Job

SCC Help: http://www.bu.edu/tech/support/research/system-usage/running-jobs/interactive-jobs/

Interactive jobs are submitted with one of the following commands: ‘qrsh’ or ‘qsh’ or ‘qlogin’

› The general form is:

[scc]$ qrsh [options] [ command [ command_arguments ] ]

› In Practice

[cjahnke@scc1 ~]$ qrsh -P labunskyy
********************************************************************************
This machine is governed by the University policy on ethics.
http://www.bu.edu/tech/about/policies/computing-ethics/

This machine is owned and administered by Boston University.

See the Research Computing web site for more information about our facilities.
http://www.bu.edu/tech/about/research/

Please send questions and report problems to "help@scc.bu.edu".

********************************************************************************
[cjahnke@scc-pi4 ~]$

Submitting a Non-Interactive Batch Job

SCC Help: http://www.bu.edu/tech/support/research/system-usage/running-jobs/submitting-jobs/

Non-interactive jobs are submitted with the ‘qsub’ command.

› The general form is:

scc % qsub [options] command [arguments]

› In Practice: Running a Script

scc % qsub -P labunskyy script.qsub
Your job #jobID ("jobname") has been submitted

› In Practice: Running a Binary (-b y)

scc % qsub -P labunskyy -b y printenv
Your job #jobID ("printenv") has been submitted

Submitting a Batch Job Script

SCC Help: http://www.bu.edu/tech/support/research/system-usage/running-jobs/submitting-jobs/

Often a job is submitted using a batch script:

#!/bin/bash -l
# The first line sets the language used to interpret the qsub script

# Project to run the job under
#$ -P keplab
# Name the job
#$ -N jobname
# Join stdout and stderr to a single file
#$ -j y
# Name the output file
#$ -o jobname_output.txt
# Send an email when the job begins and ends
#$ -m be
# Where to send the email
#$ -M cjahnke@bu.edu

# Run linux/application commands for job
# Load the R module
module load R
# Use linux commands as needed
cd /path/to/your/files
# For example, copy files
cp /source/path /target/path
# Run an R script
Rscript bf591_script.R

Example of submitting the script using qsub:

scc % qsub {options-if-not-specified-in-file} script.sh
Your job #jobID ("script.sh") has been submitted

Monitoring Running Jobs

› Use qstat to monitor the queue status

–Shows all users’ jobs. Usually a very long list

–The “-u [username]” option will show a single user

scc % qstat -u cjahnke
job-ID   prior    name    user     state  submit/start at      queue                   slots  ja-task-ID
--------------------------------------------------------------------------------------------------------
5186514  0.11176  tgauss  cjahnke  r      01/08/2015 16:06:58  p16@scc-ph7.scc.bu.edu  16
5226267  0.11176  tgauss  cjahnke  r      01/09/2015 12:41:42  b@scc-ba1.scc.bu.edu    16
5230108  0.11176  tgauss  cjahnke  r      01/10/2015 13:12:08  b@scc-bc1.scc.bu.edu    16

SCC Help: http://www.bu.edu/tech/support/research/system-usage/running-jobs/tracking-jobs/

Monitoring Completed Jobs

› Use qacct to query the queue accounting system

–Many options to tailor query: user, jobid, date run, etc.

All jobs for user

scc % qacct -o cjahnke
qname        ccs-pub
hostname     scc-na2.scc.bu.edu
...
All Jobs by owner (“-o”) and then a summary
...
OWNER    WALLCLOCK  UTIME  STIME  CPU    MEMORY
=================================================
cjahnke  43201      0.016  0.013  0.029  0.000

Specific Job

scc % qacct -j 5126219
qname        ccs-pub
hostname     scc-na2.scc.bu.edu
group        linga_admin
owner        cjahnke
project      adsp
department   defaultdepartment
jobname      QRLOGIN
jobnumber    5126219
...
Many Details for job (“-j”)

SCC Help: http://www.bu.edu/tech/support/research/system-usage/running-jobs/tracking-jobs/

A Standard Single-Processor Job

If no specific resources are requested, your job is allocated:

› 1 “Slot” (Processor core, any type/architecture)

› 12 Hour Runtime

› 4 GB RAM

› No GPU, MPI, or Parallelization

All of these can be modified.
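As a rough sketch of what changing those defaults might look like in a job script: the project name is a placeholder, `-l h_rt` is the Grid Engine runtime-limit option, and the parallel environment name "omp" is an assumption about how this site is configured. Check the SCC Help pages for the exact options and limits.

```shell
#!/bin/bash -l

# Project to run the job under (placeholder name).
#$ -P myproject
# Request a 24-hour runtime limit instead of the 12-hour default.
#$ -l h_rt=24:00:00
# Request 4 slots on one node; "omp" is assumed to be the name of the
# site's shared-memory parallel environment.
#$ -pe omp 4

# Grid Engine sets NSLOTS inside a job; default to 1 when it is not set,
# for example when testing the script outside the batch system.
slots="${NSLOTS:-1}"
echo "Running with $slots slot(s)"
```

Reading the slot count from NSLOTS, rather than hard-coding 4, keeps the program's thread count in step with whatever the directive requests.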

General Directives to the qsub command

SCC Help: http://www.bu.edu/tech/support/research/system-usage/running-jobs/submitting-jobs/

Resource Directives to the qsub command

SCC Help: http://www.bu.edu/tech/support/research/system-usage/running-jobs/submitting-jobs/

Getting Help

How to Get Help

› Upcoming Tutorials: http://www.bu.edu/tech/about/training/classroom/rcs-tutorials/

› Support Website: http://www.bu.edu/tech/support/research/

› Email (Submit a Ticket): help@scc.bu.edu

› Email Direct: cjahnke@bu.edu

Upcoming Tutorials (BUMC)

[ Date TBA ] - Python for Non-programmers 1
[ Date TBA ] - Python for Non-programmers 2
[ Date TBA ] - Intro to Linux
[ Date TBA ] - Intro to SCC
[ Date TBA ] - Intro to R
[ Date TBA ] - Advanced SCC Usage
[ Date TBA ] - Graphics in R
[ Date TBA ] - Programming in R
[ Date TBA ] - R Code Optimization

Full List for Both Campuses: http://www.bu.edu/tech/about/training/classroom/rcs-tutorials/

This semester’s tutorials are over. New dates will be scheduled next semester.

----

Group/Lab tutorials can be scheduled for specific topics upon request.

Questions?

Graphics Solutions

X-Windows, VNC, and OpenGL

Graphics Solutions

1. X-Window System (X11, X, X-Win)
Individual windows pushed from server to client

2. Virtual Network Computing (VNC)
Remote Desktop-like experience.

3. OpenGL
GPU enabled networked graphics

X-Window System (X11, X, X-Win)

[local_prompt]$ ssh -X username@scc4.bu.edu
username@scc4.bu.edu’s Password:
[username@scc4 ~]$ xclock &
[username@scc4 ~]$ xeyes &
[username@scc4 ~]$ gedit &

Virtual Network Computing (VNC)

VNC Server Setup

1. Set up VNC password:
- this is done only once
- the password can be reset as many times as needed

VNC Server is installed only on SCC2, SCC3, and SCC4. No VNC on SCC1!

scc2% vncpasswd
Password:
Verify:
scc2%

SCC Help: http://www.bu.edu/tech/support/research/system-usage/getting-started/remote-desktop-vnc/

VNC Server Setup

2. Start VNC server:
- this needs to be done only once (unless you killed it or the system got rebooted)
- you can also execute this command to be reminded of the display values

scc2% vncstart

To set the resolution of the window:

scc2% vncstart -geometry 1920x1200

SCC Help: http://www.bu.edu/tech/support/research/system-usage/getting-started/remote-desktop-vnc/

VNC Setup Tunneling

3. Tunneling is required on the SCC to ensure secure data transfer:
- your VNC session will be killed unless tunneling is established
- tunneling needs to be established every time you need to run VNC
- the command has to be executed on the local machine!
- PORT is the 4-digit number provided to you by the vncstart command
- Use 7777 or 7070 to avoid using a port that is in use by your local machine
- Login with your Kerberos password

your_local_machine% ssh koleinik@scc2.bu.edu -L 7777:localhost:PORT

SCC Help: http://www.bu.edu/tech/support/research/system-usage/getting-started/remote-desktop-vnc/

VNC Viewer Setup

4. VNC Viewer:

Mac has its own built-in VNC Viewer;

For Windows we recommend "RealVNC" (you need to download VNC viewer only!)

Use your VNC password

SCC Help: http://www.bu.edu/tech/support/research/system-usage/getting-started/remote-desktop-vnc/

OpenGL

Used primarily by an fMRI imaging lab. Allows for use of GPUs for specific applications