Advanced Research Computing
Before We Start
• Sign in
• If you would like to play with the examples (not required), see the SNOW Exercises sheet.
  – Follow the instructions from Page 1
  – See also the Basic Linux Commands handout
Parallelizing R with the Snow Library
Advanced Research Computing
April 15, 2014
Outline
• Introduction
• Snow Basics
• Examples
• Conclusions
INTRODUCTION
R
• Programming language and environment for statistical computing
• Free
• Intrinsic support for a wide array of statistical functionality
• Huge number of user-created packages that add or improve functionality
An Aside: Optimizing R
• Pre-allocate variables
• Vectorize (or perhaps use the apply functions)
  – Yes: z = x * y
  – No: for (i in 1:length(x)) { z[i] = x[i] * y[i] }
• Reference: The R Inferno, http://www.burns-stat.com/documents/books/the-r-inferno/
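A minimal sketch of the three styles above, using hypothetical random vectors x and y: growing z inside a loop, filling a pre-allocated z, and the vectorized form. All three produce the same z; the system.time() calls show the speed difference on your own machine (no timings are quoted here because they vary).

```r
# Hypothetical inputs: two random vectors of the same length
n <- 1e5
x <- runif(n)
y <- runif(n)

# No: grow z one element at a time (R may copy z as it grows)
z.grow <- c()
for (i in seq_along(x)) z.grow[i] <- x[i] * y[i]

# Better: pre-allocate z, then fill it inside the loop
z.pre <- numeric(n)
for (i in seq_along(x)) z.pre[i] <- x[i] * y[i]

# Yes: vectorized, no explicit loop
z.vec <- x * y

# Compare run times (machine dependent)
print(system.time(for (i in seq_along(x)) z.pre[i] <- x[i] * y[i]))
print(system.time(z.vec <- x * y))
```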
An Aside: Optimizing R (continued)
• Many R operations use the Basic Linear Algebra Subroutines (BLAS)
• Build R with an optimized BLAS → optimized R
0"
50"
100"
150"
200"
250"
300"
gcc" Intel"
Run$Time$(s)$
Run$Time$for$R$2.5$Benchmark$by$Build$Type$
Standard"
Op4mized"BLAS"
The Need for Parallelism
SNOW
Snow Basics
• Simple Network of Workstations (SNOW)
• For embarrassingly parallel tasks
• Master/slave model: one master R process runs the script and starts the worker ("slave") R processes, e.g.
    $ ps -u jkrometi -o cmd | grep R
    R -f time_mh.r --restore --no-save
    R --slave <etc>
    R --slave <etc>
Snow: Start/Stop Cluster
• Load libraries:
    library(snow)
    library(Rmpi)
• Start a cluster with ncores cores:
    cl <- makeCluster(ncores, type = 'MPI')
• Initialize the random number generator (independent streams per worker; requires the rlecuyer package):
    clusterSetupRNG(cl, type = 'RNGstream')
• Stop the cluster (important):
    stopCluster(cl)
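The snow calls above need Rmpi and an MPI launcher. To try the same start/initialize/stop pattern on a laptop, here is a sketch that swaps in R's bundled parallel package (R 2.14 and later), which includes snow-derived makeCluster(), clusterCall(), and stopCluster(). Assumptions: a local socket ("PSOCK") cluster replaces the MPI one, parallel's clusterSetRNGStream() stands in for snow's clusterSetupRNG(), and ncores = 2. Asking every worker for its process ID shows the master/worker model: each worker is a separate R process.

```r
library(parallel)  # ships with R; provides snow-style cluster functions

ncores <- 2  # assumed number of workers for a local test

# Start a socket cluster on this machine (with snow on ARC: type = 'MPI')
cl <- makeCluster(ncores, type = "PSOCK")

# Independent L'Ecuyer random number streams on each worker
clusterSetRNGStream(cl, iseed = 123)

# Each worker is its own R process: ask each one for its process ID
worker.pids <- unlist(clusterCall(cl, Sys.getpid))
master.pid <- Sys.getpid()
print(worker.pids)

# Stop the cluster (otherwise the worker processes keep running)
stopCluster(cl)
```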
Snow: Computing
• Call the same function once on every worker (ncores times):
    clusterCall(cl, fun, ...)
• Parallel versions of apply:
    clusterApply(cl, x, fun, ...)
    parApply(cl, X, MARGIN, FUN, ...)
    parLapply(cl, x, fun, ...)
    parRapply(cl, x, fun, ...)
    parCapply(cl, x, fun, ...)
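A small sketch of each call listed above, checked against its serial counterpart. It assumes the bundled parallel package (whose functions of the same names derive from snow) and a local 2-worker socket cluster; the matrix m is hypothetical. Note that parRapply() and parCapply() return a single vector, like apply() with a scalar-valued function, while clusterApply() and parLapply() return lists.

```r
library(parallel)

cl <- makeCluster(2, type = "PSOCK")  # assumed: 2 local workers

# clusterCall: run the same function once on every worker
n.workers <- length(clusterCall(cl, function() 1))

# clusterApply: element i of x goes to a worker; results come back as a list
sq.list <- clusterApply(cl, 1:4, function(i) i^2)

# parLapply: like lapply, but x is split into one chunk per worker
sqrt.list <- parLapply(cl, 1:4, sqrt)

# Row/column versions of apply on a small hypothetical matrix
m <- matrix(1:12, nrow = 4)
row.sums <- parRapply(cl, m, sum)    # like apply(m, 1, sum)
col.sums <- parCapply(cl, m, sum)    # like apply(m, 2, sum)
row.max  <- parApply(cl, m, 1, max)  # like apply(m, 1, max)

stopCluster(cl)
```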
EXAMPLES
Monte Carlo: Calculating π
• The ratio of the area of the quarter unit circle ($x^2 + y^2 \le 1$, $x, y \ge 0$) to the area of the unit square $[0,1]^2$ is $\pi/4$
• So:
  – Randomly pick S points in the unit square
  – Count the number C that fall in the unit circle
  – Then $\pi \approx 4C/S$
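How many points are needed? Assuming the S points are drawn independently and uniformly, C is binomial with success probability π/4, which gives the standard error of the estimate below. The error shrinks only as 1/√S, so each extra correct digit costs about 100 times more points. The value for S = 500 000 matches the problem size used in the optimization example later.

```latex
% C = number of the S points that land in the quarter circle
C \sim \operatorname{Binomial}\!\left(S,\ \tfrac{\pi}{4}\right),
\qquad \hat{\pi} = \frac{4C}{S}

\operatorname{SE}(\hat{\pi})
  = 4\sqrt{\frac{\frac{\pi}{4}\left(1 - \frac{\pi}{4}\right)}{S}}
  \approx \frac{1.64}{\sqrt{S}},
\qquad S = 500\,000 \;\Rightarrow\; \operatorname{SE}(\hat{\pi}) \approx 0.0023
```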
MC π: Code
mcpi <- function(n.pts) {
  # generate n.pts (x,y) points in the unit square
  m = matrix(runif(2*n.pts), n.pts, 2)
  # determine if they are in the unit circle
  in.ucir = function(x) { as.integer((x[1]^2 + x[2]^2) <= 1) }
  cir = apply(m, 1, in.ucir)
  # return the proportion of points in the unit circle * 4
  return(4*mean(cir))
}
MC π: Parallelize
# assumes m and in.ucir from mcpi() exist at the top level
# start up and initialize the cluster
cl <- makeCluster(ncores, type = 'MPI')
clusterSetupRNG(cl, type = 'RNGstream')
# determine if points are in the unit circle (rows of m are split across the workers)
cir = parRapply(cl, m, in.ucir)
# calculate pi
pi.approx = 4*mean(cir)
# stop the cluster
stopCluster(cl)
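A self-contained, runnable version of the parallel step above, as a sketch. Assumptions: R's bundled parallel package and a local socket cluster replace snow with MPI, and ncores = 2 and n.pts = 20000 are arbitrary. Because m is generated on the master, the workers never draw random numbers here, so no worker RNG setup is needed in this particular example.

```r
library(parallel)

ncores <- 2       # assumed number of local workers
n.pts  <- 20000   # assumed number of points

# 1 if the point x = (x1, x2) lies inside the unit circle, else 0
in.ucir <- function(x) { as.integer((x[1]^2 + x[2]^2) <= 1) }

# points are generated on the master, as on the slide
set.seed(2014)
m <- matrix(runif(2 * n.pts), n.pts, 2)

# start up the cluster (local sockets instead of MPI)
cl <- makeCluster(ncores, type = "PSOCK")

# rows of m are split across the workers; the results are concatenated
cir <- parRapply(cl, m, in.ucir)

# calculate pi
pi.approx <- 4 * mean(cir)

# stop the cluster
stopCluster(cl)
print(pi.approx)
```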
MC π: An Optimization Example
> n.pts <- 500000
> m = matrix(runif(2*n.pts),n.pts,2)
> in.ucir <- function(x) { as.integer((x[1]^2 + x[2]^2) <= 1) }
> system.time( apply(m, 1, in.ucir ) )
   user  system elapsed
  5.037   0.025   5.069
> system.time( as.integer(m[,1]^2 + m[,2]^2 <= 1) )
   user  system elapsed
   0.02    0.00    0.02
• The vectorized version is about 250× faster (5.069 s vs. 0.02 s elapsed)
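Vectorization and parallelism can be combined: instead of shipping the matrix to the workers row by row, each worker can draw its own points and test them with the vectorized expression. A sketch, again assuming R's bundled parallel package with a local socket cluster; mcpi.vec() is a hypothetical helper, and ncores and n.pts are assumed values. Here the workers do draw random numbers, so each gets its own stream via clusterSetRNGStream().

```r
library(parallel)

ncores <- 2     # assumed number of local workers
n.pts  <- 1e6   # assumed total number of points

# Hypothetical helper: vectorized estimate of pi from n points drawn on the worker
mcpi.vec <- function(n) {
  x <- runif(n)
  y <- runif(n)
  4 * mean(x^2 + y^2 <= 1)
}

cl <- makeCluster(ncores, type = "PSOCK")
clusterSetRNGStream(cl, iseed = 2014)  # independent RNG streams per worker

# one equal share of the points per worker
n.per.core <- ceiling(n.pts / ncores)
est <- unlist(clusterApply(cl, rep(n.per.core, ncores), mcpi.vec))

stopCluster(cl)

# equal shares, so the overall estimate is the mean of the per-worker estimates
pi.approx <- mean(est)
print(pi.approx)
```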
MCMC: Metropolis-Hastings
• Goal: draw random samples whose distribution approximates a given target distribution
• Used to model stochastic inputs
• Do not need to know the normalizing factor
  – Useful for functions in high dimensions, where that factor is hard to compute
MCMC: Metropolis-Hastings
• Given a:
  – Target distribution
  – Jumping distribution
  – Initial sample
• Choose a candidate sample from the jumping distribution centered at the initial sample
• Accept the candidate as the new sample:
  – Always, if the candidate is a better fit (per the target distribution)
  – With probability < 1, if the candidate is a worse fit; otherwise keep the current sample
• Repeat with the new sample as the initial sample
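The acceptance rule makes the "no normalizing factor" claim concrete. Below, f is the target known only up to an unknown constant Z, J is the jumping distribution, and α is the acceptance probability (this notation is introduced here, not taken from the slides). For a symmetric J, which is what the ratio samp(theta.can)/samp(theta.cur) in the code that follows assumes, Z cancels. For a non-symmetric J, the Hastings version multiplies the ratio by J(θ_cur | θ_can) / J(θ_can | θ_cur), and Z still cancels.

```latex
% Target density known only up to its normalizing factor Z: p(\theta) = f(\theta) / Z
% Symmetric jumping distribution: J(\theta_{can} \mid \theta_{cur}) = J(\theta_{cur} \mid \theta_{can})
\alpha
  = \min\!\left(1,\ \frac{p(\theta_{\mathrm{can}})}{p(\theta_{\mathrm{cur}})}\right)
  = \min\!\left(1,\ \frac{f(\theta_{\mathrm{can}})/Z}{f(\theta_{\mathrm{cur}})/Z}\right)
  = \min\!\left(1,\ \frac{f(\theta_{\mathrm{can}})}{f(\theta_{\mathrm{cur}})}\right)
```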
M-H: Code (Markov Chain Part)
# function to calculate the next sample
# (samp and jump must be visible here; defining theta.update() inside mh() makes them mh's arguments)
theta.update <- function(theta.cur) {
  # candidate sample
  theta.can <- jump(theta.cur)
  # acceptance probability (assumes a symmetric jumping distribution)
  accept.prob <- samp(theta.can)/samp(theta.cur)
  # compare with a sample from the uniform dist (0 to 1)
  if (runif(1) <= accept.prob) theta.can else theta.cur
}
Reference: Lam, Patrick. "MCMC Methods: Gibbs Sampling and the Metropolis-Hastings Algorithm."
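Metropolis-Hastings: Code
# function to generate (n.sims - burnin) samples
mh <- function(n.sims, start, burnin, samp, jump) {
  # theta.update() from the previous slide, defined here so that it uses
  # mh's samp and jump arguments (and is sent to the workers along with mh)
  theta.update <- function(theta.cur) {
    theta.can <- jump(theta.cur)
    accept.prob <- samp(theta.can)/samp(theta.cur)
    if (runif(1) <= accept.prob) theta.can else theta.cur
  }
  theta.cur <- start
  draws <- numeric(n.sims)  # pre-allocate
  # call theta.update() n.sims times
  for (i in 1:n.sims) {
    draws[i] <- theta.cur <- theta.update(theta.cur)
  }
  # return the samples after the burn in
  return(draws[(burnin + 1):n.sims])
}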
Reference: Lam, Patrick. "MCMC Methods: Gibbs Sampling and the Metropolis-Hastings Algorithm."
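To try the sampler serially, here is a self-contained sketch with a hypothetical target, a standard normal known only up to its normalizing factor (samp.fcn(x) = exp(-x^2/2)), and a hypothetical normal jumping distribution with sd 1; the iteration and burn-in counts are also assumptions. Because candidates are continuous draws, the fraction of steps where the chain moved estimates the acceptance rate, a quick health check for the choice of jumping distribution.

```r
set.seed(1)

# Metropolis-Hastings sampler; theta.update() is nested so it uses mh's samp and jump
mh <- function(n.sims, start, burnin, samp, jump) {
  theta.update <- function(theta.cur) {
    theta.can <- jump(theta.cur)                      # candidate sample
    accept.prob <- samp(theta.can) / samp(theta.cur)  # symmetric-jump acceptance ratio
    if (runif(1) <= accept.prob) theta.can else theta.cur
  }
  theta.cur <- start
  draws <- numeric(n.sims)                            # pre-allocated
  for (i in 1:n.sims) {
    draws[i] <- theta.cur <- theta.update(theta.cur)
  }
  draws[(burnin + 1):n.sims]                          # drop the burn in
}

# Hypothetical target: standard normal without its 1/sqrt(2*pi) factor
samp.fcn <- function(x) exp(-x^2 / 2)
# Hypothetical symmetric jumping distribution: normal step with sd 1
jump.fcn <- function(x) rnorm(1, mean = x, sd = 1)

draws <- mh(n.sims = 20000, start = 1, burnin = 1000,
            samp = samp.fcn, jump = jump.fcn)
accept.rate <- mean(diff(draws) != 0)
print(c(mean = mean(draws), sd = sd(draws), accept.rate = accept.rate))
```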
Metropolis-Hastings: Parallelize
# start up and initialize the cluster
cl <- makeCluster(ncores, type = 'MPI')
clusterSetupRNG(cl, type = 'RNGstream')
# samples per core (each core runs its own chain, including its own burn in)
mh.n.sims.cl <- ceiling(mh.n.sims / ncores)
# call mh on each core
mh.draws.cl <- clusterCall(cl, mh, mh.n.sims.cl, start = 1, burnin = mh.burnin,
                           samp = samp.fcn, jump = jump.fcn)
# reduce the list of per-core chains to one vector
mh.draws <- unlist(mh.draws.cl)
# stop the cluster
stopCluster(cl)
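A runnable sketch of the parallel step, assuming R's bundled parallel package with a local socket cluster in place of snow with MPI, clusterSetRNGStream() in place of clusterSetupRNG(), and the same hypothetical target and jumping distribution as the serial sketch. Since every core discards its own burn in, the total is ncores × (mh.n.sims.cl − mh.burnin) samples: with the assumed values, 2 × (10000 − 1000) = 18000 rather than 20000 − 1000.

```r
library(parallel)  # ships with R; snow-style cluster functions

ncores    <- 2      # assumed number of local workers
mh.n.sims <- 20000  # assumed total number of iterations
mh.burnin <- 1000   # assumed burn in, applied to each chain

# Sampler with theta.update() nested, so mh() carries what it needs to the workers
mh <- function(n.sims, start, burnin, samp, jump) {
  theta.update <- function(theta.cur) {
    theta.can <- jump(theta.cur)
    accept.prob <- samp(theta.can) / samp(theta.cur)
    if (runif(1) <= accept.prob) theta.can else theta.cur
  }
  theta.cur <- start
  draws <- numeric(n.sims)
  for (i in 1:n.sims) draws[i] <- theta.cur <- theta.update(theta.cur)
  draws[(burnin + 1):n.sims]
}

samp.fcn <- function(x) exp(-x^2 / 2)               # hypothetical unnormalized N(0, 1)
jump.fcn <- function(x) rnorm(1, mean = x, sd = 1)  # hypothetical symmetric jump

# start up the cluster; each chain gets its own random number stream
cl <- makeCluster(ncores, type = "PSOCK")
clusterSetRNGStream(cl, iseed = 2014)

# iterations per core; every core runs (and burns in) its own chain
mh.n.sims.cl <- ceiling(mh.n.sims / ncores)
mh.draws.cl <- clusterCall(cl, mh, mh.n.sims.cl, start = 1, burnin = mh.burnin,
                           samp = samp.fcn, jump = jump.fcn)
stopCluster(cl)

# combine the per-core chains into one vector of samples
mh.draws <- unlist(mh.draws.cl)
print(c(n = length(mh.draws), mean = mean(mh.draws), sd = sd(mh.draws)))
```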
CONCLUSIONS
R on ARC’s Systems
• R 3.0.x, 2.14.1
• Plotting via cairo
• Each R build comes with Rmpi and Snow
  – For OpenMPI only
• To use:
    module load intel   (or: module load gcc)
    module load openmpi
    module load R
Getting Started on ARC’s Systems
• Request an account (anyone with a VT PID): http://www.arc.vt.edu/forms/account_request.php
  – Can also request for external collaborators
• Request a system unit allocation: http://www.arc.vt.edu/userinfo/allocations.php
  – MIC nodes are “charged” the same as normal nodes
References
• Snow Manual: http://cran.r-project.org/web/packages/snow/snow.pdf
• Snow Functions: http://www.sfu.ca/~sblay/R/snow.html
• ARC’s R page: http://www.arc.vt.edu/resources/software/r
• Course Slides: http://www.arc.vt.edu/userinfo/training/2014Spring_NLI.php#parallelr
Questions?