R for HPC
High Performance Computing Center North (HPC2N)
P. Ojeda-May
[Figure: 1 node of the Kebnekaise cluster (Core1 to Core28) compared with my laptop (Core1 and Core2)]
Random Forest
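Decision Trees
Animal Classification

Has it feathers?                 (root node)
  yes: Can it fly?               (decision node)
         yes: Hawk               (leaf, or terminal, node)
         no:  Penguin            (leaf, or terminal, node)
  no:  Has it fins?              (decision node)
         yes: Dolphin            (leaf, or terminal, node)
         no:  Bear               (leaf, or terminal, node)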
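The animal classification tree can be written by hand as nested conditions, which makes the roles of the root, decision, and leaf nodes explicit. This is only a minimal sketch: the function name `classify_animal` and its argument names are hypothetical, and a real decision tree would learn these questions from data instead of having them hard-coded.

```r
# Hand-coded version of the animal classification tree (illustrative only).
# Each argument is a single TRUE/FALSE answer to one question in the tree.
classify_animal <- function(has_feathers, can_fly, has_fins) {
  if (has_feathers) {              # root node: "Has it feathers?"
    if (can_fly) {                 # decision node: "Can it fly?"
      "Hawk"                       # leaf node
    } else {
      "Penguin"                    # leaf node
    }
  } else {
    if (has_fins) {                # decision node: "Has it fins?"
      "Dolphin"                    # leaf node
    } else {
      "Bear"                       # leaf node
    }
  }
}

print(classify_animal(has_feathers = TRUE,  can_fly = FALSE, has_fins = FALSE))
print(classify_animal(has_feathers = FALSE, can_fly = FALSE, has_fins = TRUE))
```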
Decision Trees
Bagging = Bootstrap aggregating
[Figure: the training data is resampled into bootstrapped subsets 1, 2 and 3, and a decision tree is built from each subset]
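The two halves of bagging can be sketched in base R: drawing bootstrapped subsets (sampling rows with replacement) and aggregating the answers of the individual trees by majority vote. The built-in `iris` data set stands in for the training data, and the seed, the variable names, and the example votes are all assumptions made for illustration.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
training  <- iris                # stand-in for the training data
n         <- nrow(training)
n_subsets <- 3                   # three bootstrapped subsets, as in the figure

# Bootstrap: draw n row indices WITH replacement for each subset
boot_rows <- lapply(seq_len(n_subsets),
                    function(i) sample(n, size = n, replace = TRUE))
subsets   <- lapply(boot_rows, function(rows) training[rows, ])

# Each subset has n rows, but some original rows repeat and others are left out
distinct_rows <- sapply(boot_rows, function(rows) length(unique(rows)))
print(distinct_rows)

# Aggregate: majority vote over the predictions of the individual trees
majority_vote <- function(votes) names(which.max(table(votes)))
print(majority_vote(c("Hawk", "Penguin", "Hawk")))   # hypothetical votes
```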
Random Forest (RF)
Besides bagging, RFs use random feature selection.
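Random feature selection means that only a random subset of the features is considered as split candidates. A minimal base R sketch follows, using `iris` as a stand-in data set; the subset size `floor(sqrt(p))` is a common default for classification and is an assumption here, not something stated on the slide.

```r
set.seed(2)                                   # arbitrary seed
features <- setdiff(names(iris), "Species")   # the 4 predictor columns
mtry     <- floor(sqrt(length(features)))     # number of candidate features

# Draw the candidate features for one split (without replacement)
candidates <- sample(features, mtry)
print(candidates)
```

In a random forest this draw is repeated for every split of every tree, which decorrelates the trees beyond what bagging alone achieves.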
Model evaluation
K-fold Cross Validation
k=4
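With k=4 the data is split into four folds; each fold is used once as the test set while the remaining three are used for training. The sketch below uses base R only. The `mtcars` data set, the linear model `mpg ~ wt`, and the RMSE metric are placeholders chosen for illustration; any model and metric could take their place.

```r
set.seed(3)                                  # arbitrary seed
k <- 4
n <- nrow(mtcars)

# Assign every row to one of the k folds, in random order
fold <- sample(rep(seq_len(k), length.out = n))

rmse <- numeric(k)
for (i in seq_len(k)) {
  train <- mtcars[fold != i, ]               # k - 1 folds for training
  test  <- mtcars[fold == i, ]               # held-out fold for testing
  fit   <- lm(mpg ~ wt, data = train)        # placeholder model
  pred  <- predict(fit, newdata = test)
  rmse[i] <- sqrt(mean((test$mpg - pred)^2))
}

cv_rmse <- mean(rmse)                        # average error over the k folds
print(rmse)
print(cv_rmse)
```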
Using R on Kebnekaise
Pedro Ojeda
>> ml spider R
--------------------------------------------------------------------------------------------
R:
--------------------------------------------------------------------------------------------
Description:
    R is a free software environment for statistical computing and graphics.

Versions:
    R/3.3.1
    R/3.4.4-X11-20180131
    R/3.5.1-Python-2.7.15
    R/3.5.1
    R/3.6.0
Using R at HPC2N

1. Write your R script (single core):

Filename: hello.R

    print("Hello World")

2. Write your batch script:

Filename: job.sh

    #!/bin/bash
    #SBATCH -A SNIC2019-5-156
    # Asking for 3 min.
    #SBATCH -t 00:03:00
    #SBATCH -c 1

    #ml GCC/8.2.0-2.31.1 OpenMPI/3.1.3
    #ml R/3.6.0
    ml GCC/7.3.0-2.30 OpenMPI/3.1.1; ml R/3.5.1
    ml R-bundle-Bioconductor/3.8-R-3.5.1

    Rscript --vanilla hello.R
> Rscript --help
Usage: /path/to/Rscript [--options] [-e expr [-e expr2 ...] | file] [args]

--options accepted are
  --help              Print usage and exit
  --version           Print version and exit
  --verbose           Print information on progress
  --default-packages=list
                      Where 'list' is a comma-separated set
                        of package names, or 'NULL'
or options to R, in addition to --slave --no-restore, such as
  --save              Do save workspace at the end of the session
  --no-environ        Don't read the site and user environment files
  --no-site-file      Don't read the site-wide Rprofile
  --no-init-file      Don't read the user R profile
  --restore           Do restore previously saved objects at startup
  --vanilla           Combine --no-save, --no-restore, --no-site-file
                        --no-init-file and --no-environ
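The trailing `[args]` in the usage line are passed on to the script, where they can be read with `commandArgs()`. A small sketch follows, assuming a hypothetical script file `hello_args.R` that greets the name given on the command line and falls back to "World" when no argument is supplied.

```r
# hello_args.R (hypothetical file name)
# Read only the arguments that follow the script name on the command line
args <- commandArgs(trailingOnly = TRUE)

# Use the first argument if one was given, otherwise a default
name <- if (length(args) >= 1) args[1] else "World"

msg <- paste("Hello", name)
print(msg)
```

It would be run as `Rscript --vanilla hello_args.R Kebnekaise`, for example from a batch script.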
3. Transfer your files to Kebnekaise

4. Submit your job with:

    sbatch job.sh

If sbatch complains about the DOS format, run:

    dos2unix job.sh

before submitting your job.
More information: https://www.hpc2n.umu.se/resources/software/r
1. Write your R script (independent single core jobs):

Filename: hello.R

    print("Hello World")

2. Write your batch script:

Filename: job.sh

    #!/bin/bash
    #SBATCH -A SNIC2019-5-156
    # Asking for 3 min.
    #SBATCH -t 00:03:00
    #SBATCH -N 1
    #SBATCH -c 3

    #ml GCC/8.2.0-2.31.1 OpenMPI/3.1.3
    #ml R/3.6.0
    ml GCC/7.3.0-2.30 OpenMPI/3.1.1; ml R/3.5.1
    ml R-bundle-Bioconductor/3.8-R-3.5.1

    # Start three independent runs; the first two go to the background
    Rscript --vanilla hello.R &
    Rscript --vanilla hello.R &
    Rscript --vanilla hello.R
    # Make sure the background runs have finished before the job ends
    wait
1. Write your R script (parallel jobs):

Filename: parallel.R

    #R code in parallel
    ….
    #

2. Write your batch script:

Filename: job.sh

    #!/bin/bash
    #SBATCH -A SNIC2019-5-156
    # Asking for 3 min.
    #SBATCH -t 00:03:00
    #SBATCH -N 1
    #SBATCH -c 3

    #ml GCC/8.2.0-2.31.1 OpenMPI/3.1.3
    #ml R/3.6.0
    ml GCC/7.3.0-2.30 OpenMPI/3.1.1; ml R/3.5.1
    ml R-bundle-Bioconductor/3.8-R-3.5.1

    Rscript --vanilla parallel.R
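The slide leaves the contents of parallel.R open. One possible shape for such a script is sketched here using `mclapply()` from the `parallel` package that ships with R. The function `slow_square` is a made-up workload, and reading the core count from the `SLURM_CPUS_PER_TASK` environment variable (set by Slurm when `-c` is given) is an assumption about how the script is launched; outside a batch job it falls back to one core.

```r
library(parallel)

# Number of cores granted by the batch system (#SBATCH -c); default to 1
n_cores <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = "1"))
if (is.na(n_cores) || n_cores < 1) n_cores <- 1L

# Made-up workload: each call is independent of the others
slow_square <- function(x) {
  Sys.sleep(0.1)   # pretend to do some work
  x^2
}

# Distribute the calls over the available cores (forking, Linux/macOS only)
results <- mclapply(1:6, slow_square, mc.cores = n_cores)
print(unlist(results))
```

Matching `mc.cores` to the number of cores requested in job.sh avoids oversubscribing the allocation.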
Packages for parallel computing:
• doParallel, a backend for the foreach package
• snow
• Rmpi

Packages for Machine Learning with parallel computing:
• caret (Classification And REgression Training). This package (http://topepo.github.io/caret/index.html) can assist you with the following tasks:
    • Data partitioning and preprocessing, such as data imputation
    • Selection of important features
    • Automated tuning of model parameters
    • Simulations can be parallelized
    • Model performance evaluation tools
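The packages above can be combined: caret can run its resampling loop on a doParallel backend. The following is a sketch under several assumptions: the caret, doParallel, and randomForest packages are installed, `iris` stands in for real data, and the 80/20 split, the 4-fold cross validation, and the seed are arbitrary choices.

```r
library(caret)
library(doParallel)

# Register a parallel backend with the cores granted by Slurm (default 1)
n_cores <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = "1"))
if (is.na(n_cores) || n_cores < 1) n_cores <- 1L
registerDoParallel(cores = n_cores)

set.seed(4)                                        # arbitrary seed
# Data partitioning: 80% of the rows for training, the rest for testing
in_train  <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
train_set <- iris[in_train, ]
test_set  <- iris[-in_train, ]

# 4-fold cross validation, with the resampling iterations run in parallel
ctrl <- trainControl(method = "cv", number = 4, allowParallel = TRUE)

# Random forest (method "rf" relies on the randomForest package)
fit <- train(Species ~ ., data = train_set, method = "rf", trControl = ctrl)

# Model performance evaluation on the held-out rows
pred <- predict(fit, newdata = test_set)
print(confusionMatrix(pred, test_set$Species))

stopImplicitCluster()
```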
Good practices:

• Use the login nodes for lightweight tasks
• Profile your code
• Monitor your job on the fly:
    • If you run your script on multiple cores, you can monitor the CPU and memory usage in real time with the following command on the terminal:

        job-usage "job_ID"

      Then copy and paste the URL into your local web browser.
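For the "Profile your code" item, a first step that needs no extra packages is timing alternative implementations with `system.time()`. The two functions below are made-up examples that compute the same sum with an explicit loop and with vectorized code; the actual timings depend on the machine.

```r
# Made-up example: the same computation written two ways
loop_sum <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + sqrt(i)
  total
}
vector_sum <- function(n) sum(sqrt(seq_len(n)))

n <- 1e6
t_loop   <- system.time(r_loop   <- loop_sum(n))
t_vector <- system.time(r_vector <- vector_sum(n))

# Both versions agree on the result; compare the elapsed times
stopifnot(isTRUE(all.equal(r_loop, r_vector)))
print(t_loop["elapsed"])
print(t_vector["elapsed"])
```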
References:
1. Introduction to Machine Learning, 3rd ed., Ethem Alpaydin.
2. Machine Learning with R, Brett Lantz.
3. R High Performance Programming, Aloysius Lim et al.
4. Mastering Scientific Computing with R, Paul Gerrard et al.
5. Bioinformatics with R Cookbook, Paurush Praveen Sinha.
6. An Introduction to Statistical Learning with Applications in R, Gareth James et al.
7. The Art of R Programming: A Tour of Statistical Software Design, Norman Matloff.
PRACE Survey:
https://events.prace-ri.eu/event/1024/surveys/668