© 2015 The MathWorks, Inc.
Dr. Roland Michaely
MathWorks
June 9, 2015
Parallel Computing
Who, What, Why, When, Where?
Why use Parallel Computing?
Reduce computation time
Overcome memory limitations
Free up your desktop
Where is Parallel Computing Applied?
Flight Test Data Analysis
– 16x faster
Heart Transplant Studies
– 3-4 weeks reduced to 5 days
Mobile Communications Technology
– Simulation time reduced from weeks to hours, 5x more scenarios
Hedge Fund Portfolio Management
– Simulation time reduced from 6 hours to 1.2 hours
Optimizing JIT Steel Manufacturing Schedule
– Cut simulation time from 1 hour to 5 minutes
Who should use Parallel Computing?
You!
… if you are a MATLAB or Simulink user
… if you have multiple cores in your computer
… if you have a Graphics Processing Unit
… if you can access a cluster or grid
“I spend a lot of time in the clinic, and don’t have the time or the
technical expertise to learn, configure, and maintain software.
MATLAB makes it easy for physicians like me to get work done
and produce meaningful results.”
Dr. Johan Nilsson
Skåne University Hospital
Lund University
Why use Parallel Computing?
Reduce computation time
Overcome memory limitations
Free up your desktop
Speed-up using Multiple Cores
[Diagram: MATLAB Desktop (Client) connected to six workers]
Built-in Capabilities
Optimization
Global Optimization
Statistics and Machine Learning
Neural Network
Signal Processing
Image Processing
…
www.mathworks.com/builtin-parallel-support
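As a minimal sketch of what built-in parallel support can look like, the example below switches on the UseParallel option of an Optimization Toolbox solver. The objective function, bounds, and start point are hypothetical and chosen only for illustration; the sketch assumes Optimization Toolbox and Parallel Computing Toolbox are installed and that the release accepts UseParallel as true/false.

```matlab
% Hypothetical smooth objective with its minimum at [1; -2].
objective = @(x) (x(1) - 1)^2 + (x(2) + 2)^2;
x0 = [0; 0];          % assumed start point
lb = [-5; -5];        % assumed lower bounds
ub = [5; 5];          % assumed upper bounds

% Ask fmincon to estimate gradients in parallel when a pool is available.
options = optimoptions('fmincon', 'UseParallel', true, 'Display', 'off');

% No linear or nonlinear constraints, only bounds.
[xOpt, fOpt] = fmincon(objective, x0, [], [], [], [], lb, ub, [], options);
```

For an objective this cheap the parallel overhead likely outweighs any gain; the option tends to pay off when each function evaluation is expensive.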
Speed-up using Multiple Cores: Parallel for loops
Independent tasks or iterations
No dependencies or communications between tasks
Examples: parameter sweeps, Monte Carlo simulations
[Diagram: timelines showing loop iterations run on the MATLAB Desktop (Client) versus spread across three workers]
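A Monte Carlo estimate is one way to picture independent iterations. The sketch below is an illustration, not taken from the slides: it estimates pi by sampling random points, and the variable names and trial count are assumptions. Each iteration is independent, and hits is accumulated as a parfor reduction variable. Without an open pool, parfor may start one automatically or run the loop serially, depending on preferences.

```matlab
numTrials = 1e5;      % assumed number of random samples
hits = 0;             % reduction variable accumulated across iterations

parfor k = 1:numTrials
    p = rand(1, 2);                               % random point in the unit square
    hits = hits + (p(1)^2 + p(2)^2 <= 1);         % count points inside the quarter circle
end

% The quarter-circle area fraction is pi/4.
piEstimate = 4 * hits / numTrials;
```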
Speed-up using Multiple Cores: Parameter Sweep, Serial for loop
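The code shown on this slide is not preserved in the text, so the following is only a sketch of what a serial parameter sweep might look like. It assumes a hypothetical damped oscillator m*x'' + b*x' + k*x = 0 and records the peak displacement for each (b, k) pair; the parameter ranges, mass, and variable names are illustrative.

```matlab
bVals = linspace(0.1, 5, 20);          % assumed damping values
kVals = linspace(1.5, 5, 20);          % assumed stiffness values
[kGrid, bGrid] = meshgrid(kVals, bVals);
peakVals = nan(size(kGrid));
m = 5;                                 % assumed mass

tic
for idx = 1:numel(kGrid)
    b = bGrid(idx);
    k = kGrid(idx);
    % First-order form of the oscillator: y(1) = position, y(2) = velocity.
    odeFcn = @(t, y) [y(2); -(b*y(2) + k*y(1))/m];
    [~, y] = ode45(odeFcn, [0 25], [0; 1]);
    peakVals(idx) = max(y(:, 1));      % peak displacement for this (b, k)
end
serialTime = toc;
```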
Speed-up using Multiple Cores: Parameter Sweep, Parallel for loop
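A parallel sweep can often be obtained by changing for to parfor, provided the iterations are independent. The sketch below assumes a hypothetical damped oscillator m*x'' + b*x' + k*x = 0 as the model being swept; the parameter ranges and names are illustrative and not taken from the slide. peakVals is indexed by the loop variable, so parfor treats it as a sliced output.

```matlab
bVals = linspace(0.1, 5, 20);          % assumed damping values
kVals = linspace(1.5, 5, 20);          % assumed stiffness values
[kGrid, bGrid] = meshgrid(kVals, bVals);
peakVals = nan(size(kGrid));
m = 5;                                 % assumed mass, broadcast to all workers

parfor idx = 1:numel(kGrid)
    b = bGrid(idx);                    % sliced inputs copied to temporaries
    k = kGrid(idx);
    odeFcn = @(t, y) [y(2); -(b*y(2) + k*y(1))/m];
    [~, y] = ode45(odeFcn, [0 25], [0; 1]);
    peakVals(idx) = max(y(:, 1));      % sliced output, one element per iteration
end
```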
Speed-up using Multiple Cores: Start Parallel Pool
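A pool can also be started from code rather than from the desktop. This sketch assumes Parallel Computing Toolbox is installed and uses the default cluster profile; it reuses an existing pool if one is open.

```matlab
pool = gcp('nocreate');        % returns [] when no pool is currently open
if isempty(pool)
    pool = parpool;            % default profile and default number of workers
end

numWorkers = pool.NumWorkers;
fprintf('Parallel pool running with %d workers.\n', numWorkers);

% ... parfor loops placed here would run on the pool ...

delete(pool);                  % shut the pool down when finished
```

A specific profile and size can be requested with a call such as parpool('local', 4), assuming the machine has at least that many cores available.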
Speed-up using Multiple Cores: Parallel for loop Benchmark
[Figure: computation time (s), 0 to 600, versus number of workers, 0 to 32, in two panels]
Cluster (Intel(R) Xeon(R) E5-2660 2.20GHz, 8 cores per node): 17.1x faster
Local (Intel(R) Core(TM) i7-3520M 2.90GHz, 2 cores): 1.5x faster
What is a Graphics Processing Unit (GPU)?
Originally for graphics acceleration, now also used
for scientific calculations
Massively parallel array of integer and
floating point processors
Dedicated high-speed memory
Parallel Computing Toolbox requires NVIDIA GPUs with Compute Capability 2.0 or
higher, including NVIDIA Tesla 20-series products. See a complete listing at
www.nvidia.com/object/cuda_gpus.html
Criteria for Good Problems to Run on a GPU
Massively parallel:
– Calculations can be broken into 1000s of independent units of work
– Problem size takes advantage of many GPU cores
Computationally intensive:
– Computation time significantly exceeds CPU/GPU data transfer time
Algorithm consists of supported functions:
– 200+ MATLAB functions supported on the GPU
– Additional support in Image Processing, Communications, and Signal Processing
Run Same Code on CPU or GPU: Solving 2D Wave Equation
[Figure: computation time (s), 0 to 80, versus grid size, 0 to 2048, for CPU and GPU; annotated speedups of 18x, 23x, and 20x]
GPU: NVIDIA Tesla K20c, 706MHz, 2496 cores, memory bandwidth 208 GB/s
CPU: Intel(R) Xeon(R) W3550 3.06GHz, 4 cores, memory bandwidth 25.6 GB/s
Run Same Code on CPU or GPU: Pass Data to GPU
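The slide's code is not preserved in the text, so this is only a sketch of the general pattern: copy data to the device with gpuArray, call ordinary MATLAB functions on it, and copy the result back with gather. The array size and the FFT computation are assumptions for illustration. The sketch falls back to the CPU when no supported GPU is detected.

```matlab
A = rand(1024, 'single');          % assumed test data on the host

if gpuDeviceCount > 0
    G = gpuArray(A);               % transfer data to GPU memory
    F = abs(fft2(G));              % same function calls, executed on the GPU
    result = gather(F);            % transfer the result back to host memory
else
    result = abs(fft2(A));         % identical computation on the CPU
end
```

Transfers in both directions cost time, so the pattern tends to pay off when the computation between gpuArray and gather is substantial.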
“Our legacy code took up to 40 minutes to analyze a single wind
tunnel test; by using MATLAB and a GPU, computation time is
now under a minute. It took 30 minutes to get our MATLAB
algorithm working on the GPU—no low-level CUDA programming
was needed.”
Christopher Bahr
NASA
Why use Parallel Computing?
Reduce computation time
Overcome memory limitations
Free up your desktop
Overcoming Memory Limitations: Distribute Memory
A = rand(5e5,2);
A = distributed.rand(5e5,2);
[Diagram: the 1e6 elements of A (indices 1 … 1e6) partitioned across four workers attached to the MATLAB Desktop (Client)]
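A short sketch of working with a distributed array follows. It assumes an open parallel pool (or that one is created automatically) and uses the rand(…, 'distributed') form, which may not be available in every release; the slide's distributed.rand call expresses the same idea. The column-mean computation is an illustrative choice.

```matlab
% Create the array directly on the workers so the client never holds all of it.
A = rand(5e5, 2, 'distributed');

% Ordinary functions operate on the distributed array; work runs on the workers.
colMeansDist = mean(A, 1);

% Only the small 1-by-2 result is brought back to the client.
colMeans = gather(colMeansDist);
```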
Overcoming Memory Limitations: Using MATLAB mapreduce
Develop your mapreduce application once, run in multiple environments
Local hardware:
– mapreducer(0)
– mapreducer(parpool('local'))
Cluster:
– mapreducer(parpool('cluster'))
Integrated with Hadoop:
– mapreducer(parallel.cluster.Hadoop)
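To make the "develop once" idea concrete, here is a minimal sketch of a mapreduce application that finds the maximum of one column. The function names, the generated CSV file, and the keys are all hypothetical. It runs serially with mapreducer(0); in principle only that line would change to target a pool or cluster.

```matlab
function maxVal = mapreduceMaxSketch()
    % Write a small, made-up data set to a temporary CSV file.
    T = table((1:1000)', 'VariableNames', {'Value'});
    csvFile = [tempname '.csv'];
    writetable(T, csvFile);

    ds = datastore(csvFile);       % data is read chunk by chunk
    mr = mapreducer(0);            % serial execution in the MATLAB session

    result = mapreduce(ds, @maxMapper, @maxReducer, mr, ...
        'OutputFolder', tempname);
    out = readall(result);         % table with Key and Value variables
    maxVal = out.Value{1};
end

function maxMapper(data, ~, intermKVStore)
    % Emit the maximum of each chunk under a single key.
    add(intermKVStore, 'PartialMax', max(data.Value));
end

function maxReducer(~, intermValIter, outKVStore)
    % Combine the per-chunk maxima into one overall maximum.
    m = -inf;
    while hasnext(intermValIter)
        m = max(m, getnext(intermValIter));
    end
    add(outKVStore, 'Max', m);
end
```

Saved as mapreduceMaxSketch.m, the function can be called as maxVal = mapreduceMaxSketch().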
Why use Parallel Computing?
Reduce computation time
Overcome memory limitations
Free up your desktop
Offload Computations with batch
[Diagram: the MATLAB Desktop (Client) sends work to four workers with batch(…) and later retrieves the result]
Offload and Scale Computations with batch
[Diagram: the MATLAB Desktop (Client) sends work to four workers with batch(…, 'Pool',…) and later retrieves the result]
Offload and Scale Computations with batch
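The code on this slide is not preserved in the text. As a sketch of the workflow, the example below offloads a built-in function call to the default cluster profile, then collects the result. The choice of svd and the matrix size are assumptions for illustration.

```matlab
% Submit the job; the call returns immediately and the desktop stays free.
job = batch(@svd, 1, {rand(500)});    % 1 output argument, one input

% ... continue working in the MATLAB desktop while the job runs ...

wait(job);                            % block until the job has finished
out = fetchOutputs(job);              % cell array of output arguments
s = out{1};                           % singular values computed on the worker

delete(job);                          % remove the job and its data from the cluster

% To scale a function that contains parfor loops, a pool can be requested too,
% for example batch(@mySweep, 1, {}, 'Pool', 3), where mySweep is a
% hypothetical user function on the MATLAB path.
```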
Scale Compute Power
[Diagram: MATLAB scaling from local hardware to a cluster]
Where can you use it?
On your laptop
On your desktop
On a cluster
Why use Parallel Computing?
Reduce computation time by
– Using more cores
– Accessing Graphics Processing Units
Overcome memory limitations by
– Distributing data to available hardware
– Using MATLAB mapreduce
Free up your desktop by
– Offloading computations to a cluster