Tools for Data Science – Introduction to Parallel Computing
Transcript
Page 1: Tools for Data Science – Introduction to Parallel Computing

Page 2: Overview

❏ Parallel Computing

❏ The name of the game

❏ Programming models

❏ Parallel architectures

❏ Multi-core everywhere

❏ Hardware examples

Page 3: Parallelism is everywhere

❏ In today's computer installations one has many levels of parallelism:

❏ Instruction level (ILP)

❏ Chip level (multi-core, multi-threading)

❏ System level (SMP)

❏ GP-GPUs

❏ Grid/Cluster

Page 4: What is Parallelization?

The Name of the Game

Page 5: What is Parallelization?

An attempt at a definition:

“Something” is parallel if there is a certain level of independence in the order of operations

“Something” can be:

► A collection of program statements

► An algorithm

► A part of your program

► The problem you are trying to solve

granularity: the items above range from fine-grained (program statements) to coarse-grained (the whole problem); a small illustration follows.
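
A minimal illustration of this independence (not from the slides): the first pair of statements can execute in any order, the second cannot:

x = a + b;
y = c + d;   /* independent of the line above: any order gives the same result */

x = a + b;
y = x + c;   /* depends on x from the line above: the order is fixed */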

Page 6: Parallelism – when?

Something that does not follow this rule is not parallel!

Page 7: Parallelism – Example 1

Page 8: Parallelism – Example 2

Page 9: Parallelism – Results example 2

❏ The parallel version of example 2 was run 4 times each on 1, 8, 32, and 64 threads/processors

❏ Output: sum over all elements of vector a

❏ Except for P=1, the results are:

❏ Wrong

❏ Inconsistent

❏ NOT reproducible

❏ This is called a “data race” (see the sketch below)
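
The code of example 2 did not survive extraction; the following is a minimal sketch, assuming an OpenMP-style parallel sum over a vector a, of the kind of unprotected update that produces such a data race:

#include <stdio.h>

#define N 1000000

int main(void)
{
    static double a[N];
    double sum = 0.0;

    for (int i = 0; i < N; i++)
        a[i] = 1.0;

    /* Racy: all threads read and update the shared 'sum' without any */
    /* protection, so updates get lost and the result varies per run. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %.0f (expected %d)\n", sum, N);
    return 0;
}

Compiled with OpenMP enabled and run with more than one thread, the printed sum is typically below N and differs between runs, matching the behaviour above.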

Page 10: Parallelism

Fundamental problem:

for (i = 0; i < n; i++)
    a[i] = a[i+M] + b[i];

M = 0  : parallel
M >= 1 : not parallel

With M >= 1, iteration i reads a[i+M], which iteration i+M overwrites; run in parallel, the read and the write can occur in either order, so the result is not deterministic.

Page 11: What is a Thread?

❏ Loosely said, a thread consists of a series of instructions with its own program counter (“PC”) and state

❏ A parallel program will execute threads in parallel

❏ These threads are then scheduled onto processors

Page 12: Single- vs. multi-threaded

[Diagram] single-thread: the process has code, data, and I/O handles, plus a single set of registers and a single stack.

[Diagram] multi-thread: code, data, and I/O handles are shared by all threads, while each thread has its own registers, stack, and thread-private data.

Page 13: Parallelism vs Concurrency

❏ Concurrent and parallel execution

❏ Concurrent, non-parallel execution, e.g. multiple threads on a single-core CPU

Page 14: Parallelism vs Concurrency

[Diagram] Nested sets: parallel programs form a subset of concurrent programs, which form a subset of all programs.

Page 15: Numerical Results

Page 16: Basic concepts

❏ Consider the following code with two loops

❏ Running this in parallel over i might give the wrong answer: a thread may start the second loop and read an a[i] that another thread has not yet written in the first loop

for (i = 0; i < n; i++)
    a[i] = b[i] + c[i];

for (i = 0; i < n; i++)
    d[i] = a[i] + e[i];

Page 17: Basic concepts – the barrier

❏ The problem can be fixed:

❏ The barrier ensures that no thread starts working on the second loop before the work on loop one is finished (see the OpenMP sketch below)

for (i = 0; i < n; i++)
    a[i] = b[i] + c[i];

        /* barrier: wait! */

for (i = 0; i < n; i++)
    d[i] = a[i] + e[i];
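
A minimal OpenMP sketch of this fix (not from the slides; assumes arrays a, b, c, d, e of length n): each “#pragma omp for” ends in an implicit barrier, so no thread enters the second loop until all of a[] has been written:

#pragma omp parallel
{
    #pragma omp for                  /* implicit barrier at the end of this loop */
    for (int i = 0; i < n; i++)
        a[i] = b[i] + c[i];

    #pragma omp for                  /* safe: every a[i] is complete at this point */
    for (int i = 0; i < n; i++)
        d[i] = a[i] + e[i];
}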

Page 18: Basic concepts – the barrier

When to use barriers?

❏ To ensure data integrity, e.g.

❏ after one iteration in a solver

❏ between parts of the code that read and write the same variables (see the sketch below)

❏ Barriers are expensive and don't scale to a large number of threads
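
Where no implicit barrier is available, OpenMP also offers an explicit one; a minimal sketch (not from the slides; write_phase and read_phase are hypothetical helpers standing in for the two code parts):

#pragma omp parallel
{
    write_phase();          /* hypothetical: threads write shared variables  */

    #pragma omp barrier     /* no thread continues until all writes are done */

    read_phase();           /* hypothetical: threads read those variables    */
}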

Page 19: Basic concepts – reduction

❏ A typical code fragment (☞ serial code):

for (i = 0; i < n; i++) {
    ...
    sum += a[i];
    ...
}

❏ This loop cannot run in parallel unless the update of sum is protected.

❏ An operation like the above is called a “reduction” operation, and there are ways to handle this issue (more later; one sketch follows below).
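
One of those ways, as a minimal OpenMP sketch (not from the slides): the reduction clause gives every thread a private partial sum and combines the partial results safely at the end:

double sum = 0.0;

#pragma omp parallel for reduction(+:sum)
for (int i = 0; i < n; i++)
    sum += a[i];    /* each thread accumulates its own copy; no data race */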

Page 20: Parallel Overhead

❏ The total CPU time may exceed the serial CPU time:

✔ The newly introduced parallel portions in your program need to be executed

✔ Threads need time for sending data to each other and for synchronizing (“communication”)

❏ Typically, this overhead also gets worse as the number of threads increases

❏ Efficient parallelization is about minimizing the communication overhead

Page 21: Communication

Page 22: Load Balancing

Page 23: Dilemma – Where to parallelize?

Page 24: Scalability – speed-up & efficiency
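
The slide body did not survive extraction; for reference, the standard definitions: speed-up S(P) = T(1) / T(P) and parallel efficiency E(P) = S(P) / P, where T(P) is the elapsed time on P processors. Ideal (linear) scaling means S(P) = P and E(P) = 1.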

Page 25: Amdahl's Law
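
The slide body did not survive extraction; for reference, Amdahl's Law: if a fraction f of the work parallelizes perfectly over P processors and the remainder stays serial, then

S(P) = 1 / ((1 - f) + f / P)  <=  1 / (1 - f)

For example, with f = 0.95 the speed-up can never exceed 20, however many processors are used.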

Page 26: Amdahl's Law

Page 27: Amdahl's Law in Practice

Page 28: Code scalability in practice – II

❏ Ideally, HPC codes would be able to scale to the theoretical limit, but ...

❏ This is never the case in reality

❏ All codes eventually reach a real upper limit on speedup

❏ At some point codes become “bound” to one or more limiting hardware factors (memory, network, I/O)

Page 29: What is Parallelization? – Summary

❏ Parallelization is simply another optimization technique to get your results sooner

❏ To this end, more than one processor is used to solve the problem

❏ The “Elapsed Time” (also called wallclock time) will come down, but total CPU time will probably go up

❏ The latter is a difference from serial optimization, where one makes better use of existing resources, i.e. the cost comes down

Page 30: Parallel Programming Models

Page 31: Parallel Programming Models

Two “classic” parallel programming models:

❏ Distributed memory (clusters, SMPs)

❏ PVM (standardized)

❏ MPI (de-facto standard, widely used)

❏ http://mpi-forum.org or http://open-mpi.org/

❏ Shared memory (SMP only)

❏ Pthreads (standardized)

❏ OpenMP (de-facto standard)

❏ http://openmp.org/

❏ Automatic parallelization (depends on compiler)

Page 32: Parallel Programming Models

“Upcoming” programming models

❏ PGAS (Partitioned Global Address Space):

❏ UPC (Unified Parallel C)

❏ Co-Array Fortran

❏ GPUs: massively parallel & shared memory

❏ CUDA

❏ OpenCL

Page 33: Parallel Programming Models

Distributed memory programming model, e.g. MPI:

❏ all data is private to the threads

❏ data is shared by exchanging buffers (see the sketch below)

❏ explicit synchronization
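
A minimal sketch of such a buffer exchange (not from the slides; assumes MPI has been initialized and myrank holds the rank, as in the “Hello world” example later):

double buf[4] = { 1.0, 2.0, 3.0, 4.0 };

if (myrank == 0)        /* rank 0 sends its private buffer ...          */
    MPI_Send(buf, 4, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
else if (myrank == 1)   /* ... rank 1 receives it into its private copy */
    MPI_Recv(buf, 4, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);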

Page 34: Parallel Programming Models

MPI:

❏ An MPI application is a set of independent processes (threads)

❏ on different machines

❏ on the same machine

❏ communication over the interconnect

❏ network (network of workstations, cluster, grid)

❏ memory (SMP)

❏ communication is under control of the programmer

Page 35: Parallel Programming Models

Shared memory model, e.g. OpenMP:

❏ all threads have access to the same global memory

❏ data transfer is transparent to the programmer

❏ synchronization is (mostly) implicit

❏ there is private data as well (see the sketch below)
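
A minimal sketch (not from the slides) of shared vs. private data in OpenMP:

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int total = 0;                       /* shared: one copy, visible to all threads */

    #pragma omp parallel
    {
        int tid = omp_get_thread_num();  /* private: each thread has its own copy */

        #pragma omp atomic               /* updates of shared data still need protection */
        total += tid;
    }

    printf("total = %d\n", total);       /* deterministic thanks to the atomic update */
    return 0;
}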

Page 36: Parallel Programming Models

OpenMP:

❏ needs an SMP

❏ but ... with newer CPU designs, there is an SMP in (almost) every computer

❏ multi-core CPUs (CMP)

❏ chip multi-threading (CMT)

❏ or a combination of both, e.g. the Sun UltraSPARC-T series

❏ or ... (whatever we'll see in the future)

Page 37: MPI vs OpenMP

OpenMP version of “Hello world”:

#include <stdio.h>

int main(int argc, char *argv[])
{
    #pragma omp parallel
    {
        printf("Hello world!\n");
    } /* end parallel */

    return(0);
}

Page 38: MPI vs OpenMP

no. of threads: OMP_NUM_THREADS

% cc -o hello -xopenmp hello.c
% ./hello
Hello world!

% OMP_NUM_THREADS=2 ./hello
Hello world!
Hello world!

% OMP_NUM_THREADS=8 ./hello
Hello world!
Hello world!
Hello world!
Hello world!

Page 39: MPI vs OpenMP

MPI version of “Hello world”:

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

int main(int argc, char **argv)
{
    int myrank, p;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    printf("Hello world from %d!\n", myrank);
    MPI_Finalize();
    return 0;
}

Page 40: MPI vs OpenMP

MPI version: compile and run

$ cc -I/.../MPI/include -R/.../MPI/lib \
     -L/.../MPI/lib -o hello_mpi hello_mpi.c -lmpi

$ ./hello_mpi
ERROR in MPI_Init: unclassified error:
RTE_Init_lib: Cannot initialize: the program must
be started by mprun.: Invalid request
Fatal error, aborting.

$ mprun -np 4 ./hello_mpi
Hello world from 1!
Hello world from 3!
Hello world from 0!
Hello world from 2!
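
The launcher shown (mprun) is specific to the MPI installation used on the slides; most current installations, e.g. Open MPI, provide wrapper scripts for compiling and launching instead:

$ mpicc -o hello_mpi hello_mpi.c
$ mpirun -np 4 ./hello_mpi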

Page 41: Multi-core – everywhere!

Welcome to a “threaded” world

Page 42: Today’s Multicores

99% of Top500 Systems Are Based on Multicore

❏ Sun Niagara2 (8 cores)

❏ Intel Polaris [experimental] (80 cores)

❏ AMD Istanbul (6 cores)

❏ IBM Cell (9 cores)

❏ Intel Nehalem (4 cores)

❏ Fujitsu Venus (8 cores)

❏ IBM Power 7 (8 cores)

282 use Quad-Core, 204 use Dual-Core, 3 use Nona-core

Page 43: What is a multi-core chip?

❏ A “core” is not well-defined – let us assume it covers the processing units and the L1 caches (a very simplified CPU).

❏ Different implementations are possible – and available (examples follow), e.g. multi-threaded cores

❏ Cache hierarchy of private and shared caches

❏ For software developers it matters that there is parallelism in the hardware that they can take advantage of

Page 44: A generic multi-core design

Page 45: AMD Opteron – quad-core

❏ dedicated L2 caches

❏ shared L3 cache

Page 46: AMD Opteron – quad-core

Page 47: UltraSPARC-T2

System on a chip:

❏ 8 cores with 8 threads each = 64 threads

❏ integrated multi-threaded 10 Gb/s Ethernet

❏ integrated crypto-unit per core

❏ low power (< 95W)

❏ < 1.5 W/thread (95 W over 64 threads)

Page 48: UltraSPARC-T2

Page 49: Why add threads to a core?

[Diagram] Execution of two threads

[Diagram] Interleaving the work – better utilization

Keyword: “Throughput Computing”

Page 50: The future

“Prediction is very difficult, especially if it is about the future.”

-- Niels Bohr (1885-1962)

Page 51: The future ... of multi-core

[Diagram] Design options: All Large Core; Many Small Cores; All Small Core; Mixed Large and Small Core; Many Floating-Point Cores

Different classes of chips:

● Home

● Games / Graphics

● Business

● Scientific

Page 52: Many-core challenges I

Page 53: Many-core challenges II

Page 54: Many-core challenges III

Page 55: A 2014 multi-core chip: Intel Haswell

Page 56: Summary

❏ You have heard about:

❏ Parallel programming models and basic concepts

❏ Parallel architectures (shared / distributed memory)

❏ Multi-core architectures

❏ GPUs/accelerators (more next week)

Page 57: The future ... of software

❏ Challenges:

❏ how to handle millions of cores/threads ... reliably

❏ re-design of algorithms: from coarse-grained parallelism to fine-grained parallelism

❏ more development tools needed to achieve this

❏ we need standards to ensure portability!

❏ long-term perspectives

