Introduction to BioHPC
New User Training

Updated for 2016-09-07

[web] portal.biohpc.swmed.edu

[email] biohpc-help@utsouthwestern.edu

Overview

Today we’re going to cover:

What is BioHPC?

How do I access BioHPC resources?

How can I be a good user? (some basic rules)

How do I get effective help?

If you remember only one thing...

If you have any questions, ask us via biohpc-help@utsouthwestern.edu

What is HPC, and why do we need it?

High-performance computing (HPC) is the use of parallel processing for running advanced application programs efficiently, reliably and quickly.

Any computing that isn’t possible on a standard system

PROBLEMS

Huge Datasets

Complex Algorithms

Difficult / inefficient software

BioHPC SOLUTIONS

Batch HPC jobs

Interactive GUI sessions

Visualization with GPUs

Windows sessions on the cluster

Wide range of software

Easy web access to services

What is BioHPC? - An Overview

BioHPC is:

• A 120-node compute cluster.
• >3.5 Petabytes (3,500 Terabytes) of storage across various systems.
• Large number of installed software packages.
• A network of thin-client and workstation machines.
• Cloud Services to access these facilities easily.
• A dedicated team to help you efficiently use these resources for your research.

Who is BioHPC?

Liqiang Wang: Director, 13 years' experience in IT infrastructure, HPC.

Yi Du: Computational Scientist, experience in parallel software design, large-scale data analysis.

David Trudgian: Computational Scientist, Ph.D. in Computer Science and 10 years' experience in bioinformatics (focus on machine learning for sequence classification & computational proteomics).

Ross Bateman: Technical Support Specialist, experienced in maintaining user systems and troubleshooting.

Wei Guo: Computational Scientist, Ph.D. in Materials Science and Engineering, experience in HPC for complex simulations.

Long Lu: Computational Scientist, MS in CS. Biology and chemistry, gene sequencing and materials science.

We are biohpc-help@utsouthwestern.edu

https://portal.biohpc.swmed.edu/content/about/staff/

What is BioHPC? – Nucleus Computer Cluster

Nucleus is our compute cluster

120 nodes – 128GB, 256GB, 384GB, GPU

CPU cores: 4700
GPU cores: 19968
Memory: 25TB
Network: 100Gb/s core, 56Gb/s per node (internal), 40Gb/s (to campus)

Login via ssh to nucleus.biohpc.swmed.edu or use web portal.

Hands on BioHPC – Work around

Run any computationally intensive work

Linux HPC Jobs

GPU Visualization

Windows with GPU Visualization

Interactive Sessions

BioHPC Storage – Standard Users

As a BioHPC user, you will have access to:

• BioHPC Cluster
  /home2/username 50 GB / user
  /project/department/group/username 5 TB / per group*
  /work/department/username 5 TB / per user

• BioHPC File Exchange (web-interface)
  https://cloud.biohpc.swmed.edu 50 GB / user, local storage

• BioHPC Lamella Cloud Storage (web-interface), on campus only, private cloud
  https://lamella.biohpc.swmed.edu 100 GB / user, local storage
  Gateway to BioHPC Cluster (via FTP, SAMBA or WebDAV*)

* Can be increased on PI request with Dept. Chair approval.

BioHPC Storage – Core Users

Some core facilities provide access to BioHPC for their users to transfer data etc.

The core decides the amount and type of storage to provide to their users, e.g.

TIBIR WBMF Core:

/project/TIBIR/WBMF_Core/<username> 250GB / core user

This is also your home directory.
No separate home2 or work space.

Storage allocation and usage is at the discretion of the core, not BioHPC.

What is BioHPC? – Lamella Storage Gateway

Lamella is our storage gateway – access your files easily

Web Interface

FTP

Windows / Mac drive mounts (SMB / WebDAV)

lamella.biohpc.swmed.edu

BioHPC Storage Backup

Since January we have performed mirror backups of:

/home2
Twice weekly (Mon & Wed)
2 copies – home2 usage counts 3x against lab storage allocation

/work
Weekly (Fri->Sat)
1 copy – work usage counts 2x against lab storage allocation
Excludes some large users

Mirror backups
Copy of your files at the time the backup runs. No old versions.

/project
Incremental backup of specific locations at request of PI.
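
To see how the backup multipliers above affect a lab allocation, here is a minimal Python sketch. The function name and the example figures are illustrative only. The 3x and 2x factors come from the slide; counting any other area (such as /project) once is an assumption, since no multiplier is given for it, and the sketch ignores the exclusion of some large /work users.

```python
# Multipliers from the backup slide: the original data plus each mirror copy.
BACKUP_MULTIPLIER = {
    "home2": 3,  # original + 2 mirror copies
    "work": 2,   # original + 1 mirror copy
}


def charged_usage_gb(usage_gb):
    """Return the storage charged against the lab allocation, in GB.

    usage_gb maps an area name ("home2", "work", ...) to raw usage in GB.
    Areas without a known multiplier are counted once (an assumption).
    """
    return sum(gb * BACKUP_MULTIPLIER.get(area, 1) for area, gb in usage_gb.items())


# Hypothetical user: 40 GB in home2 and 1000 GB in work.
example = {"home2": 40, "work": 1000}
print(charged_usage_gb(example))  # 40*3 + 1000*2 = 2120
```

In this example, 1,040 GB of files is charged as 2,120 GB, which is why cleaning up /home2 and /work frees more allocation than the raw file sizes suggest.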

What is BioHPC? - Software

Wide range of packages available as modules.

You can ask biohpc-help@utsouthwestern.edu for additions, upgrades etc.

What is BioHPC? – Thin Client & Workstation Systems

Desktop computers directly connected to the BioHPC systems.

Run same version of Linux as the cluster, but with a graphical desktop.

Login with BioHPC details, direct access to storage like on cluster.

Same software available as on cluster.

Will make up a distributed compute resource in future, using up to 50% of CPU to run distributed jobs.

Thin client is less powerful but cheaper and smaller.

Training: Thin Clients and Workstations Wed 09/14.

What is BioHPC? – Cloud Services

A big focus at BioHPC is easy access to our systems.

Our cloud services provide web-based access to resources, with only a browser.

All accessible via portal.biohpc.swmed.edu

Okay, sounds great….

But how do I use all of this?

Hands on BioHPC -- 1. Portal Website Walkthrough

portal.biohpc.swmed.edu

Hands on BioHPC – 2. Manage Files with Lamella / Cloud Storage Gateway

Cloud storage gateway – web-based.

https://lamella.biohpc.swmed.edu
100GB separate space + mount /home, /project, /work
Internal

https://cloud.biohpc.swmed.edu
50GB space
External file transfer
Accessible from internet

Hands on BioHPC – 2. Setting up Lamella to access home, project, and work space

https://lamella.biohpc.swmed.edu

lysosome username password homeproject

For home: leave blank

For private project space: department/lab/user

For lab shared project space: department/lab/shared

BioHPC Endosome/Lysosome

Hands on BioHPC – 2. Accessing BioHPC Storage Directly from Windows

Computer -> Map Network Drive

Folder is:
\\lamella.biohpc.swmed.edu\username (home dir)
\\lamella.biohpc.swmed.edu\project
\\lamella.biohpc.swmed.edu\work

Check ‘Connect using different credentials’

Enter your BioHPC username and password when prompted.

Hands on BioHPC – 2. Accessing BioHPC Storage Directly from Mac OS X

Finder -> Go -> Connect to Server

Folder is:
smb://lamella.biohpc.swmed.edu/username (home dir)
smb://lamella.biohpc.swmed.edu/project
smb://lamella.biohpc.swmed.edu/work

Enter your BioHPC username and password when prompted.
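
The Windows and Mac share addresses follow the same pattern, so they can be generated for any account. The following Python sketch shows one way to do that. The function names and the username `jdoe` are hypothetical; only the host name and the three share names come from the slides.

```python
HOST = "lamella.biohpc.swmed.edu"  # storage gateway named in the slides


def windows_shares(username):
    """Return UNC paths for Windows 'Map Network Drive'."""
    prefix = "\\\\" + HOST + "\\"  # two leading backslashes, host, one backslash
    return {
        "home": prefix + username,
        "project": prefix + "project",
        "work": prefix + "work",
    }


def mac_shares(username):
    """Return smb:// URLs for Finder's 'Connect to Server'."""
    prefix = "smb://" + HOST + "/"
    return {
        "home": prefix + username,
        "project": prefix + "project",
        "work": prefix + "work",
    }


# "jdoe" is a placeholder; substitute your own BioHPC username.
for name, path in windows_shares("jdoe").items():
    print(name, path)
for name, url in mac_shares("jdoe").items():
    print(name, url)
```

Only the home share includes the username; the project and work shares are the same for every account.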

Hands on BioHPC – 3. Web Job Script Generator

https://portal.biohpc.swmed.edu -> Cloud Services -> Web Job Submission

Hands on BioHPC – 4. Web Visualization: Graphical Interactive Session via Web Portal / VNC

https://portal.biohpc.swmed.edu -> Cloud Services -> Web Visualization

Connects to GUI running on a cluster node. WebGPU sessions have access to GPU card for 3D rendering.

Hands on BioHPC – 5. Software Modules

module list                    Show loaded modules
module avail                   Show available modules
module load <module name>      Load a module
module unload <module name>    Unload a module
module help <module name>      Help notes for a module
module -H                      Help for the module command
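
As a rough illustration of how these commands combine at the start of a job script, the Python sketch below produces the text of a setup section. It only builds the command lines and does not run them. The helper name and the module name `example/1.0` are placeholders, not real BioHPC modules; use `module avail` to find actual names.

```python
def module_setup_lines(modules):
    """Return shell lines that load each module, then list what is loaded."""
    lines = ["module load " + name for name in modules]
    lines.append("module list")  # record the loaded modules in the job output
    return lines


# "example/1.0" is a hypothetical module name used only for illustration.
print("\n".join(module_setup_lines(["example/1.0"])))
```

Ending with `module list` records the exact software versions in the job output, which helps when troubleshooting later.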

Hands on BioHPC – 6. SSH Cluster Login via the Web Portal

https://portal.biohpc.swmed.edu -> Cloud Services -> Nucleus Web Terminal

Connects to the login node, not a cluster node

Hands on BioHPC – 7. Connecting from Home

Windows – Follow the IR VPN instructions at:
http://www.utsouthwestern.net/intranet/administration/information-resources/network/vpn/

Mac – Try the IR instructions first. If they don’t work:

On Campus

Go -> Connect to Server
Server Address: smb://swnas.swmed.org/data/installs
Connect

VPN Client (Juniper) -> Juniper Mac VPN Client Installer -> JunosPulse.dmg

Install the software from the .dmg file. You cannot test it on campus.

At Home

Start Junos Pulse and add a connection to server ‘utswra.swmed.edu’

When connecting, you must enter a secondary password, which is obtained using the ‘key’ icon in the Duo Mobile two-factor authentication smartphone app. Or type ‘push’ to get a push notification on your phone.

We can help – surgery session, or NL05.136

How To Be a Good User

HPC Systems are crowded, shared resources

Co-operation is necessary.

The BioHPC team has a difficult job to do:

• Balance the requirements of a diverse group of users, running very different types of jobs.

• Make sure user actions don’t adversely affect others using the systems.

• Keep the environment secure.

• Ensure resources are being used efficiently.

Web-based Cloud-Services are designed to avoid problems.

All we ask is…

1. If you have any questions, or are unsure about something, please ask biohpc-help@utsouthwestern.edu

2. When running jobs on the cluster, request the least amount of resources you know you need.
Job times / memory limit / smallest node that will work etc.
Up to a 2x margin of safety is appropriate.

3. Make reasonable attempts to use the resources efficiently.
Run multiple small tasks on a node if you can.
Cancel / close any jobs or sessions you no longer need.

4. Keep notes in case you need our help troubleshooting.
Keep old versions of scripts and job files.
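
One way to apply the "up to a 2x margin of safety" guideline is to scale a measured requirement by a chosen factor and cap that factor at 2. The Python sketch below does this. The function name and the default factor of 1.5 are arbitrary assumptions for illustration, not BioHPC policy.

```python
import math


def request_with_margin(measured, margin=1.5, max_margin=2.0):
    """Scale a measured requirement (e.g. GB of memory or minutes of run time).

    The margin is clamped to the range [1.0, max_margin], so the request is
    never smaller than the measurement and never more than 2x by default.
    """
    if measured <= 0:
        raise ValueError("measured requirement must be positive")
    factor = min(max(margin, 1.0), max_margin)
    return math.ceil(measured * factor)


# A test run peaked at 10 GB of memory and took 7 minutes (made-up numbers).
print(request_with_margin(10))              # 10 * 1.5 -> 15
print(request_with_margin(10, margin=3.0))  # capped at 2x -> 20
print(request_with_margin(7))               # 7 * 1.5 = 10.5, rounded up -> 11
```

Measuring a small test run first and then scaling the result keeps requests close to what the job actually uses.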

Currently Enforced Policy

Don’t run complex things on the login node (web terminal or nucleus.biohpc.swmed.edu).

Maximum of 16 nodes in use concurrently by any single user. Maximum of 2 GPU nodes per user.

Interactive use of cluster nodes using the web visualization or remoteGUI/remoteGPU scripts only*.

You cannot SSH to a compute node not allocated to you.
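
Before submitting a batch of jobs, it can help to check the request against the node limits above. The Python sketch below is one way to express that check. The function and parameter names are hypothetical, and it assumes GPU nodes also count toward the 16-node total, which the slide does not state.

```python
MAX_NODES_PER_USER = 16     # from the policy slide
MAX_GPU_NODES_PER_USER = 2  # from the policy slide


def within_policy(nodes_in_use, gpu_nodes_in_use, nodes_requested, gpu_nodes_requested=0):
    """Return True if a new request keeps the user inside both limits.

    Node counts include GPU nodes (an assumption; the slide does not say
    whether GPU nodes count toward the 16-node total).
    """
    total_nodes = nodes_in_use + nodes_requested
    total_gpu_nodes = gpu_nodes_in_use + gpu_nodes_requested
    return (total_nodes <= MAX_NODES_PER_USER
            and total_gpu_nodes <= MAX_GPU_NODES_PER_USER)


print(within_policy(14, 0, 2))     # exactly 16 nodes: allowed
print(within_policy(15, 0, 2))     # 17 nodes: over the limit
print(within_policy(2, 2, 1, 1))   # third GPU node: over the limit
```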

Getting Effective Help

Email the ticket system: biohpc-help@utsouthwestern.edu

What is the problem?
Provide any error message, and diagnostic output you have.

When did it happen?
What time? Cluster or client? What job id?

How did you run it?
What did you run, what parameters, what do they mean?

Any unusual circumstances?
Have you compiled your own software? Do you customize startup scripts?

Can we look at your scripts and data?
Tell us if you are happy for us to access your scripts/data to help troubleshoot.
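
The five questions above can serve as a template for a help email. The Python sketch below assembles one. The function name, field names and sample answers (including the job id) are invented for illustration.

```python
def format_help_request(problem, when, how_run, unusual="None", may_access_files=False):
    """Build the body of a help email from the five questions on the slide."""
    access = "Yes" if may_access_files else "No"
    return "\n".join([
        "What is the problem? " + problem,
        "When did it happen? " + when,
        "How did you run it? " + how_run,
        "Any unusual circumstances? " + unusual,
        "Can you look at my scripts and data? " + access,
    ])


# All values below are made-up examples.
print(format_help_request(
    problem="Job exits with an out-of-memory error (full output attached)",
    when="2016-09-07 14:30, on the cluster, job id 12345",
    how_run="Submitted through the web job submission page, one node",
    may_access_files=True,
))
```

Answering every question in the first email usually saves a round of follow-up questions.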

Next Steps

• New users – wait for confirmation your account is activated.

• Spend some time experimenting with our systems, read the guides.

• Check the training schedule and attend relevant sessions.

• Join us for coffee – now part of 2nd Wednesday training sessions.

