
R for HPC · 3. R High Performance Programming, Aloysius Lim, et. al. 4. Mastering Scientific...

Date post: 15-Aug-2020
R for HPC High Performance Computing Center North (HPC2N) P. Ojeda-May
Transcript
Page 1:

R for HPC
High Performance Computing Center North (HPC2N)

P. Ojeda-May

Page 2:

[Figure: one node of the Kebnekaise cluster, with 28 cores (Core1 to Core28), compared with my laptop, with 2 cores (Core1 and Core2).]

Page 3:

Random Forest

Page 4:

Decision Trees
Animal Classification

Has it feathers?  (root node)
  yes: Can it fly?  (decision node)
    yes: Hawk  (leaf / terminal node)
    no: Penguin  (leaf / terminal node)
  no: Has it fins?  (decision node)
    yes: Dolphin  (leaf / terminal node)
    no: Bear  (leaf / terminal node)
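The animal-classification tree can be read as nested yes/no questions. The following is a minimal sketch in R, assuming the branch layout suggested by the slide (feathers, then flight or fins). The function name `classify_animal` and its logical arguments are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: walks the animal-classification tree from the root down.
classify_animal <- function(has_feathers, can_fly = FALSE, has_fins = FALSE) {
  if (has_feathers) {                       # root node: "Has it feathers?"
    if (can_fly) "Hawk" else "Penguin"      # decision node: "Can it fly?"
  } else {
    if (has_fins) "Dolphin" else "Bear"     # decision node: "Has it fins?"
  }
}

print(classify_animal(has_feathers = TRUE, can_fly = TRUE))
print(classify_animal(has_feathers = FALSE, has_fins = FALSE))
```

Each returned string corresponds to a leaf (terminal) node. A learned decision tree has the same shape, but the questions and their order are chosen from the training data.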

Page 5:

Decision Trees

Bagging = Bootstrap aggregating

Training Data

Bootstrapped subset 1

Bootstrapped subset 2

Bootstrapped subset 3
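The bootstrap step of bagging can be sketched in base R as follows. This is an illustration only: the `iris` data set stands in for the training data, the seed is arbitrary, and `majority_vote` is a hypothetical helper showing how class predictions from several trees could be aggregated.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility

training <- iris                 # stand-in training data (ships with R)
n <- nrow(training)

# Draw 3 bootstrapped subsets: n row indices sampled WITH replacement.
boot_idx  <- lapply(1:3, function(b) sample(n, size = n, replace = TRUE))
boot_sets <- lapply(boot_idx, function(idx) training[idx, ])

# Rows never drawn for a given subset are "out-of-bag" for that subset.
oob_frac <- sapply(boot_idx, function(idx) mean(!(seq_len(n) %in% idx)))
print(round(oob_frac, 2))

# Aggregation for classification: the most frequent predicted label wins.
majority_vote <- function(votes) names(which.max(table(votes)))
print(majority_vote(c("Hawk", "Penguin", "Hawk")))
```

In bagging, one tree would be fitted to each element of `boot_sets`, and the trees' predictions combined with a vote like the one above.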

Page 6:

Decision Trees

Random Forest (RF)

Besides bagging, RFs use random feature selection.
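Random feature selection means that each split only considers a random subset of the predictors. Below is a small sketch in base R, again using `iris` as a stand-in data set. The subset size `floor(sqrt(p))` is the usual default for classification forests and is an assumption here, not something stated on the slide.

```r
set.seed(2)                                     # arbitrary seed

features <- setdiff(names(iris), "Species")     # 4 candidate predictors
mtry <- floor(sqrt(length(features)))           # features tried per split

# For 5 hypothetical splits, draw the features each split may choose from.
candidates_per_split <- replicate(5, sample(features, mtry), simplify = FALSE)
print(candidates_per_split)
```

Because different splits see different candidates, the trees in the forest become less correlated than with bagging alone.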

Page 7:

Model evaluation
K-fold Cross Validation

k=4
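With k=4, the data are split into four folds; each fold serves once as the test set while the other three are used for training. The sketch below uses base R only. The `mtcars` data, the linear model, and the RMSE metric are stand-ins chosen for illustration.

```r
set.seed(3)                                       # arbitrary seed

k <- 4
n <- nrow(mtcars)
fold <- sample(rep(seq_len(k), length.out = n))   # random fold label per row

rmse <- sapply(seq_len(k), function(i) {
  train <- mtcars[fold != i, ]                    # k - 1 folds for training
  test  <- mtcars[fold == i, ]                    # held-out fold
  fit   <- lm(mpg ~ wt + hp, data = train)        # stand-in model
  sqrt(mean((test$mpg - predict(fit, newdata = test))^2))
})

cv_rmse <- mean(rmse)                             # cross-validated estimate
print(rmse)
print(cv_rmse)
```

The k model fits are independent of each other, which is why cross validation is a natural candidate for parallel execution.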

Page 8:

Using R on Kebnekaise
Pedro Ojeda

Page 9:

>> ml spider R
--------------------------------------------------------------------------------------------
R:
--------------------------------------------------------------------------------------------
Description:
  R is a free software environment for statistical computing and graphics.

Versions:
  R/3.3.1
  R/3.4.4-X11-20180131
  R/3.5.1-Python-2.7.15
  R/3.5.1
  R/3.6.0

Using R at HPC2N

1. Write your R script (single core):

Filename: hello.R

print("Hello World")

2. Write your batch script:

Filename: job.sh

#!/bin/bash
#SBATCH -A SNIC2019-5-156
#Asking for 3 min.
#SBATCH -t 00:03:00
#SBATCH -c 1

#ml GCC/8.2.0-2.31.1 OpenMPI/3.1.3
#ml R/3.6.0
ml GCC/7.3.0-2.30 OpenMPI/3.1.1; ml R/3.5.1
ml R-bundle-Bioconductor/3.8-R-3.5.1

Rscript --vanilla hello.R

Page 10:

Using R at HPC2N

> Rscript --help
Usage: /path/to/Rscript [--options] [-e expr [-e expr2 ...] | file] [args]

--options accepted are
  --help              Print usage and exit
  --version           Print version and exit
  --verbose           Print information on progress
  --default-packages=list
                      Where 'list' is a comma-separated set
                        of package names, or 'NULL'
or options to R, in addition to --slave --no-restore, such as
  --save              Do save workspace at the end of the session
  --no-environ        Don't read the site and user environment files
  --no-site-file      Don't read the site-wide Rprofile
  --no-init-file      Don't read the user R profile
  --restore           Do restore previously saved objects at startup
  --vanilla           Combine --no-save, --no-restore, --no-site-file
                        --no-init-file and --no-environ
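The usage line shows that anything placed after the script file is passed on as `[args]`. A minimal sketch of reading those arguments inside a script is given below. The file name `args.R`, the helper `parse_args`, and the meaning and defaults of the two arguments are all hypothetical.

```r
# args.R (hypothetical file name); run as: Rscript --vanilla args.R 100 out.csv
parse_args <- function(args) {
  # Fall back to defaults when an argument is missing.
  list(
    n       = if (length(args) >= 1) as.integer(args[1]) else 10L,
    outfile = if (length(args) >= 2) args[2] else "result.csv"
  )
}

# trailingOnly = TRUE keeps only the arguments that follow the script name.
opts <- parse_args(commandArgs(trailingOnly = TRUE))
cat("n =", opts$n, " outfile =", opts$outfile, "\n")
```

Passing parameters this way lets one batch script launch the same R script with different inputs.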

Page 11:

Using R at HPC2N

3. Transfer your files to Kebnekaise
4. Submit your job with: sbatch job.sh

In case sbatch complains about the DOS format use:

dos2unix job.sh

before submitting your job.

More information: https://www.hpc2n.umu.se/resources/software/r

Page 12:

Using R at HPC2N

1. Write your R script (independent single core jobs):

Filename: hello.R

print("Hello World")

2. Write your batch script:

Filename: job.sh

#!/bin/bash
#SBATCH -A SNIC2019-5-156
#Asking for 3 min.
#SBATCH -t 00:03:00
#SBATCH -N 1
#SBATCH -c 3

#ml GCC/8.2.0-2.31.1 OpenMPI/3.1.3
#ml R/3.6.0
ml GCC/7.3.0-2.30 OpenMPI/3.1.1; ml R/3.5.1
ml R-bundle-Bioconductor/3.8-R-3.5.1

Rscript --vanilla hello.R &
Rscript --vanilla hello.R &
Rscript --vanilla hello.R
#Wait for the background runs to finish before the batch script exits
wait

Page 13:

Using R at HPC2N

1. Write your R script (parallel jobs):

Filename: parallel.R

#R code in parallel
….
#

2. Write your batch script:

Filename: job.sh

#!/bin/bash
#SBATCH -A SNIC2019-5-156
#Asking for 3 min.
#SBATCH -t 00:03:00
#SBATCH -N 1
#SBATCH -c 3

#ml GCC/8.2.0-2.31.1 OpenMPI/3.1.3
#ml R/3.6.0
ml GCC/7.3.0-2.30 OpenMPI/3.1.1; ml R/3.5.1
ml R-bundle-Bioconductor/3.8-R-3.5.1

Rscript --vanilla parallel.R
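The slide leaves the contents of parallel.R open. One possible sketch, not the author's script, uses the `parallel` package that ships with R and takes the core count from the `SLURM_CPUS_PER_TASK` environment variable, which Slurm sets when `#SBATCH -c` is given. The workload `slow_square` is hypothetical.

```r
# parallel.R: a sketch using the 'parallel' package that ships with R.
library(parallel)

# Use the cores granted by "#SBATCH -c"; fall back to 1 core outside Slurm.
n_cores <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = "1"))

# Hypothetical workload: one independent task per input value.
slow_square <- function(x) {
  Sys.sleep(0.1)   # pretend this is expensive
  x^2
}

# mclapply forks worker processes (Linux/macOS), which suits a single node.
res <- mclapply(1:6, slow_square, mc.cores = n_cores)
print(unlist(res))
```

Reading the core count from the environment keeps the R script and the batch script consistent: changing `-c` in job.sh changes the number of workers without editing the R code.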

Page 14:

Using R at HPC2N

Packages for parallel computing:

• doParallel, backend for the foreach package
• snow
• Rmpi

Packages for Machine Learning with parallel computing:

• caret (Classification And REgression Training). This package (http://topepo.github.io/caret/index.html) can assist you with the following tasks:
  • Data partitioning and preprocessing, such as data imputation
  • Selection of important features
  • Automated tuning of model parameters
  • Simulations can be parallelized
  • Model performance evaluation tools
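As an illustration of how these packages fit together, the sketch below trains a random forest with 4-fold cross validation through caret, using doParallel as the foreach backend. It assumes that caret, doParallel and randomForest are installed and is skipped otherwise. The `iris` data, the 2 workers, the seed and `tuneLength = 2` are arbitrary choices for the example.

```r
# Assumes caret, doParallel and randomForest are installed; skipped otherwise.
pkgs <- c("caret", "doParallel", "randomForest")
have_pkgs <- all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))

fit <- NULL
if (have_pkgs) {
  library(caret)
  library(doParallel)

  cl <- makeCluster(2)                 # 2 workers, an arbitrary choice
  registerDoParallel(cl)               # register the backend used by caret

  # 4-fold cross validation; resamples may run on the registered workers.
  ctrl <- trainControl(method = "cv", number = 4, allowParallel = TRUE)

  set.seed(4)                          # arbitrary seed
  fit <- train(Species ~ ., data = iris, method = "rf",
               trControl = ctrl, tuneLength = 2)

  stopCluster(cl)
  foreach::registerDoSEQ()             # return foreach to sequential mode
  print(fit$results)
}
```

On a cluster node, the number of workers would normally match the cores requested with `#SBATCH -c` rather than a fixed value.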

Page 15:

Using R at HPC2N

Good practices:

• Use the login nodes for lightweight tasks only
• Profile your code
• Monitor your job on the fly:
  • If you run your script on multiple cores, you can monitor the CPU and memory usage in real time with the following command on the terminal:

job-usage "job_ID"

Then copy and paste the URL into your local web browser.
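For the "Profile your code" point, base R's `system.time()` is often enough for a first look. The sketch below times two hypothetical implementations of a running sum. The function `slow_cumsum` and the input size are made up for the comparison.

```r
# Hypothetical workload: growing a vector in a loop vs. a vectorised call.
slow_cumsum <- function(x) {
  out <- numeric(0)
  total <- 0
  for (v in x) {
    total <- total + v
    out <- c(out, total)   # reallocates the vector on every iteration
  }
  out
}

x <- runif(2e4)
t_loop <- system.time(r1 <- slow_cumsum(x))
t_vec  <- system.time(r2 <- cumsum(x))

cat("loop:", t_loop[["elapsed"]], "s; vectorised:", t_vec[["elapsed"]], "s\n")
```

For a per-function breakdown of a longer script, `Rprof()` and `summaryRprof()` from the utils package can be used in the same spirit.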

Page 16:

Using R at HPC2N


Page 18:

PRACE Survey:

https://events.prace-ri.eu/event/1024/surveys/668

