Date post: 21-Dec-2015
Cores, cores, everywhere Based on joint work with Martín Abadi, Andrew Baumann, Paul Barham, Richard Black, Vladimir Gajinov, Orion Hodson, Rebecca Isaacs, Ross McIlroy, Simon Peter, Vijayan Prabhakaran, Timothy Roscoe, Adrian Schüpbach, Akhilesh Singhania
Transcript
Page 1

Cores, cores, everywhere

Based on joint work with Martín Abadi, Andrew Baumann, Paul Barham, Richard Black, Vladimir Gajinov, Orion Hodson, Rebecca Isaacs, Ross McIlroy, Simon Peter, Vijayan Prabhakaran, Timothy Roscoe, Adrian Schüpbach, Akhilesh Singhania

Page 2

• Two hardware trends
• Barrelfish operating system
• Message-passing software
• Managing parallel work

Page 3

Amdahl’s law

“Sorting takes 70% of the execution time of a sequential program. You replace the sorting algorithm with one that scales perfectly on multi-core hardware. On a machine with 128 cores, how many cores do you need to use to get a 4x speed-up on the overall program?”

Page 4

Amdahl’s law, f=70%

[Figure: speedup against #cores (1 to 16). Series: desired 4x speedup; speedup achieved (perfect scaling on 70%).]

Limit as c→∞ = 1/(1-f) = 3.33
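The question on page 3 can be checked numerically. The sketch below is not from the slides: `amdahl_speedup` and `cores_needed` are hypothetical helper names, and the code assumes the parallel fraction f scales perfectly, so that speedup = 1 / ((1 - f) + f / c) on c cores.

```c
#include <assert.h>

/* Amdahl's law: speedup on `cores` cores when a fraction `f` of the
 * sequential run time scales perfectly and the rest stays serial. */
static double amdahl_speedup(double f, int cores)
{
    assert(f >= 0.0 && f <= 1.0 && cores >= 1);
    return 1.0 / ((1.0 - f) + f / cores);
}

/* Smallest core count in [1, max_cores] that reaches `target` speedup,
 * or -1 if even max_cores falls short. */
static int cores_needed(double f, double target, int max_cores)
{
    for (int c = 1; c <= max_cores; c++) {
        if (amdahl_speedup(f, c) >= target)
            return c;
    }
    return -1;
}
```

With f = 0.70, `cores_needed(0.70, 4.0, 128)` returns -1: the speedup is bounded by 1/(1 - 0.7) ≈ 3.33, so no number of cores reaches 4x.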

Page 5

Amdahl’s law, f=10%

[Figure: speedup against #cores (1 to 16). Series: speedup achieved with perfect scaling.]

Amdahl’s law limit, just 1.11x

Page 6

Amdahl’s law, f=98%

[Figure: speedup against #cores (1 to 127).]

Page 7

Amdahl’s law & multi-core

Suppose that the same h/w budget (space or power) can make us:

[Diagram: three alternative chips built from the same budget: 16 small cores, 1 big core, or 4 medium cores.]

(analysis from Hill & Marty “Amdahl’s law in the multicore era”)

Page 8

Perf of big & small cores

[Figure: core perf (relative to 1 big core) against resources dedicated to core (1/16, 1/8, 1/4, 1/2, 1).]

Assumption: perf = α √resource

Total perf, 16 small cores: 16 * 1/4 = 4

Total perf, 1 big core: 1 * 1 = 1

(analysis from Hill & Marty “Amdahl’s law in the multicore era”)
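Combining the square-root assumption with Amdahl's law gives a closed form for a chip split into k equal cores. This is a sketch in the slide's normalisation (spending the whole budget on one big core gives perf 1, and α is taken as 1); the symbols k and p are chosen here for illustration and do not appear in the slides.

```latex
p(k) = \sqrt{1/k}, \qquad
\mathrm{Perf}(f, k)
  = \frac{1}{\dfrac{1-f}{p(k)} + \dfrac{f}{k\,p(k)}}
  = \frac{1}{\sqrt{k}\,\bigl((1-f) + f/k\bigr)}
```

For f = 0.98 this gives about 3.08 for 16 small cores and about 1.89 for 4 medium cores. For f = 0.75 it gives about 0.84 and 1.14 respectively, so under these assumptions the 16-core chip is slower than the single big core.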

Page 9

Amdahl’s law, f=98%

[Figure: perf (relative to 1 big core) against #cores (1 to 16). Series: 1 big; 4 medium; 16 small.]

(analysis from Hill & Marty “Amdahl’s law in the multicore era”)

Page 10

Amdahl’s law, f=75%

[Figure: perf (relative to 1 big core) against #cores (1 to 16). Series: 1 big; 4 medium; 16 small.]

(analysis from Hill & Marty “Amdahl’s law in the multicore era”)

Page 11

Asymmetric chips

[Diagram: a 16-tile grid in which one larger core (labelled 1) sits alongside 12 small cores.]

Page 12

Amdahl’s law, f=75%

[Figure: perf (relative to 1 big core) against #cores (1 to 16). Series: 1 big; 4 medium; 16 small; 1+12.]

(analysis from Hill & Marty “Amdahl’s law in the multicore era”)
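The 1+12 configuration can be worked through with Amdahl's law under the square-root assumption from page 8. The slides do not spell out the details, so the following are assumptions: a core using the whole budget has perf 1, the larger core uses 1/4 of the budget (perf 1/2), each of the twelve small cores uses 1/16 (perf 1/4), the serial part runs on the larger core alone, and all thirteen cores contribute to the parallel part.

```latex
\mathrm{Perf}_{1+12}(f)
  = \frac{1}{\dfrac{1-f}{\tfrac{1}{2}} + \dfrac{f}{\tfrac{1}{2} + 12\cdot\tfrac{1}{4}}}
\quad\Rightarrow\quad
\mathrm{Perf}_{1+12}(0.75) = \frac{1}{0.5 + 0.75/3.5} = 1.4
```

Under these assumptions the asymmetric chip beats both the single big core (1.0) and the 16 small cores at f = 75%, because the serial quarter of the work no longer runs on a small core.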

Page 13

Two hardware trends

Traditional multi-processor machines

Asymmetric performance and/or instruction sets

Page 14

Cache-coherent multicore

AMD Istanbul: 6 cores, per-core L2, per-package L3

[Diagram: four packages, each with six cores (each with its own L2) and one L3; eight RAM banks.]

Page 15

Single-chip cloud computer (SCC)

• 24 * 2-core tiles
• On-chip mesh n/w
• Non-coherent caches
• Hardware supported messaging

[Diagram: a tile with two cores, their L2 caches, a router and an MPB; memory controllers (MC) attached to RAM; VRC; system interface.]

Page 16

MSR Beehive

• Ring interconnect
• Message passing in h/w
• No cache coherence
• Split-phase memory access

[Diagram: RISC cores (Core 1, Core 2, Core 3 … Core N, each a Module RISCN) on a ring carrying RingIn[31:0], SlotTypeIn[3:0], SrcDestIn[3:0] for messages and locks; Module MemMux with MQ and DDR controller in front of RAM; read data returned on a pipelined bus to all cores; a display controller.]

Page 17

Two hardware trends

Traditional multi-processor machines

Asymmetric performance and/or instruction sets

Non-cache-coherent access to memory

Page 18

• Two hardware trends
• Barrelfish operating system
• Message-passing software
• Managing parallel work

Page 19

Messaging vs shared data as default

• Fundamental model is message based
• “It’s better to have shared memory and not need it than to need shared memory and not have it”

[Diagram: a spectrum running from traditional operating systems to the Barrelfish multikernel: shared state, one-big-lock; fine-grained locking; clustered objects, partitioning; distributed state, replica maintenance.]

Page 20

The Barrelfish multi-kernel OS

[Diagram: four cores (x64, x64, ARM, accelerator core), each running its own OS node with a state replica and hosting apps; the OS nodes communicate by message passing over the hardware interconnect.]

Page 21

The Barrelfish multi-kernel OS

System runs on heterogeneous hardware, currently supporting ARM, Beehive, SCC, x86 & x64

Page 22

The Barrelfish multi-kernel OS

System components, each local to a specific core, and using message passing

Page 23

The Barrelfish multi-kernel OS

User-mode programs: several models supported, including conventional shared-memory OpenMP & pthreads

Page 24

• Two hardware trends
• Barrelfish operating system
• Message-passing software
• Managing parallel work

Page 25

Shared Resource Database Consensus

Two-Phase Commit

    bool updatePermissions(page_t page, flags_t flags) {
        bool ok = true;

        // Voting phase: each RPC blocks before the request to the next core is sent
        for (core in cores)
            ok &= permUpdateRequest_rpc(core, page, flags);

        // Commit phase
        if (ok) {
            localUpdatePermissions(page, flags);
            for (core in cores)
                permUpdateCommit_send(core, page, flags);
        } else {
            for (core in cores)
                permUpdateAbort_send(core, page, flags);
        }
        return ok;
    }

Blocking RPC before sending to next core

~400 cycles, assuming process is scheduled on other core!

Page 26

Shared Resource Database Consensus

Stack-Ripped

    void updatePermissions(page_t page, flags_t flags) {
        // State must outlive this function, so it moves from the stack to the heap
        state_t *st = malloc(sizeof(state_t));
        st->ok = true;
        st->page = page;
        st->flags = flags;
        st->count = 0;
        for (core in cores) {
            permUpdateRequest_send(core, page, flags, st);
            st->count++;
        }
        // The outcome is no longer returned here; recvReply decides it
    }

    void recvReply(state_t *st, bool ok) {
        st->ok &= ok;
        if (--st->count == 0) {   // last outstanding reply
            if (st->ok) {
                localUpdatePermissions(st->page, st->flags);
                for (core in cores)
                    permUpdateCommit_send(core, st->page, st->flags);
            } else {
                for (core in cores)
                    permUpdateAbort_send(core, st->page, st->flags);
            }
            free(st);             // released on both the commit and the abort path
        }
    }

Sends can fail to complete immediately (e.g., due to a full channel), so the code needs to be stack-ripped at each send site as well.
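The reply-counting pattern behind stack-ripping can be exercised in plain C. This is a minimal single-threaded sketch, not Barrelfish code: `vote_state_t`, `recv_reply` and `run_vote` are hypothetical names, and an ordinary loop stands in for the message channel's event dispatch.

```c
#include <assert.h>
#include <stdbool.h>

/* State that must outlive the code that sent the requests. */
typedef struct {
    bool ok;        /* conjunction of the votes received so far */
    int  pending;   /* replies still outstanding */
    bool done;      /* true once the last reply has arrived */
} vote_state_t;

/* Continuation run for each reply; the last one settles the outcome. */
static void recv_reply(vote_state_t *st, bool vote)
{
    assert(st->pending > 0);
    st->ok = st->ok && vote;
    if (--st->pending == 0)
        st->done = true;    /* a real system would now commit or abort */
}

/* Send phase followed by reply delivery; returns true on commit. */
static bool run_vote(const bool *votes, int ncores)
{
    assert(ncores > 0);
    vote_state_t st = { .ok = true, .pending = 0, .done = false };

    for (int i = 0; i < ncores; i++)
        st.pending++;               /* one request "sent" per core */

    for (int i = 0; i < ncores; i++)
        recv_reply(&st, votes[i]);  /* replies handled one at a time */

    return st.done && st.ok;
}
```

Decrementing before the comparison (`--st->pending == 0`) matters: the decision is taken exactly when the last outstanding reply arrives, and a single "no" vote turns the result into an abort.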

Page 27

AC: Asynchronous C

• Synchronous: easy to program, poor performance
• Event-driven: difficult to program, good performance
• AC: similar programming model to sync, similar performance to event-driven

Page 28

Shared Resource Database Consensus

    bool updatePermissions(page_t page, flags_t flags) {
        bool ok = true;
        do {
            for (core in cores)
                async { ok &= permUpdateRequest_AC(core, page, flags); }
        } finish;
        if (ok) {
            localUpdatePermissions(page, flags);
            for (core in cores)
                permUpdateCommit_send(core, page, flags);
        } else {
            for (core in cores)
                permUpdateAbort_send(core, page, flags);
        }
        return ok;
    }

• async: identify code that can block; execution can continue after the async
• permUpdateRequest_AC: AC versions of message RPCs
• finish: don’t pass finish until all async work created in the do {} finish block has completed

Page 29

Shared Resource Database Consensus

[Figure: time per operation / cycles against # cores (2 to 16). Series: Event-Driven; Synchronous; AC.]

Page 30

Performance

• Ping-pong test
• Minimum-sized messages
• AMD 4 * 4-core machine
• Using cores sharing L3 cache

Ping-pong latency (cycles):

Using UMP channel directly                     931
Using event-based stubs                       1134
Synchronous model (client only)               1266
Synchronous model (client and server)         1405
MPI (Visual Studio 2008 + HPC-Pack 2008 SDK)  2780

Page 31

Performance

Function call latency (cycles):

Direct (normal function call)         8
async foo() (foo does not block)     12
async foo() (foo blocks)           1692

• “Do not fear async”
  – Think about correctness: if the callee doesn’t block then perf is basically unchanged

Page 32

• Two hardware trends
• Barrelfish operating system
• Message-passing software
• Managing parallel work

Page 33

Adding Parallelism

    do {
        async msg_send(core_1, "Computing Forces");
        par fluidAnimate (computeForces, cells, range);
    } finish;

• par: spawn a bunch of parallel tasks that can be run across multiple cores
• finish: wait for parallel and async tasks to complete before continuing

Page 34

FluidAnimate

• for each frame
  – move particles to correct cell
  – calculate cell density
  – calculate particle forces
  – calculate particle positions
  – render frame

Page 35

Static Partitioning

• for each frame
  – move particles to correct cell
  – calculate cell density
  – calculate particle forces
  – calculate particle positions
  – render frame

Page 37

Static Partitioning

Problem: Uneven workload

Page 38

Static Partitioning

Problem: Barrier Synchronization

Page 39

Static Partitioning

Problem: Thread Preemption

Approach taken by (e.g.) OpenMP and Intel Parallel Building Blocks

They assume you own the machine and know your workload

Page 40

Dynamic Partitioning (Work-Stealing)

• for each frame
  – move particles to correct cell
  – calculate cell density
  – calculate particle forces
  – calculate particle positions
  – render frame

Page 46

Dynamic Partitioning (Work-Stealing)

Problem: Spawn / Sync Overhead

Cilk-5: 218 cycles per task

Wool (old version): 97 cycles per task

Density calculation task: ~10 cycles per particle
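These figures imply a minimum task size before spawning pays off. The sketch below is an illustration rather than anything from the slides: `min_items_per_task` is a hypothetical helper, and the overhead budget passed to it is an arbitrary choice. It solves spawn / (spawn + n * work) <= budget for the smallest integer n.

```c
#include <assert.h>

/* Smallest number of work items per task such that the spawn/sync cost is
 * at most max_overhead_pct percent of the task's total cycles. */
static long min_items_per_task(long spawn_cycles, long cycles_per_item,
                               long max_overhead_pct)
{
    assert(spawn_cycles >= 0 && cycles_per_item > 0);
    assert(max_overhead_pct > 0 && max_overhead_pct < 100);
    long num = spawn_cycles * (100 - max_overhead_pct);
    long den = max_overhead_pct * cycles_per_item;
    long items = (num + den - 1) / den;   /* ceiling division */
    return items > 0 ? items : 1;
}
```

With a 10% budget and ~10 cycles per particle, this gives 197 particles per task at 218 cycles of overhead (Cilk-5) and 88 at 97 cycles (Wool, old version). A task that handles a single particle would spend most of its cycles on spawn and sync.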

Page 47

Dynamic Partitioning (Work-Stealing)

Problem: Cache Locality

Page 52

Dynamic Partitioning (Work-Stealing)

Problem: Data Synchronization

Page 53

Space-Time Continuum

• Controlled partitioning programming model
  – Flexible enough to enable movement on this spectrum
  – Runtime system controls re-partitioning
  – Application controls how
    • Parameterise how data is partitioned
    • Decide whether data-synchronisation is necessary

[Diagram: a spectrum from dynamic partitioning to static partitioning, with Workload 1 and Workload 2 placed along it for a 64 Core Server and for a 4 Core Laptop.]

Page 54

Controlled Partitioning

Page 60

FluidAnimate

    void computeForces(cell_t [][][] cells, dimentions_t d) {
        range_t range = { .x_start=0, .x_curr=0, .x_end=d.x_len, ...};
        do {
            par fluidAnimate (computeForces, cells, range);
        } finish;
    }

    par_task fluidAnimate {
        task computeForces(cell_t cell) {
            for (particle in cell) {
                struct cell_t [] ncells = getNeighbours(cell);
                particle.force = calcForce(particle, ncells);
            }
        }
        range_t [] subdivide(range_t curr_cells, int num) {
            // subdivide curr into num equal cubes, and add to new
        }
        cell_t getNext(cells_t [][][] cells, range_t range) {
            // return next cell in cells, or NULL if finished
        }
    }

Page 61

FluidAnimate

    void __computeForces_task(range_t my_range, cells_t [][][] cells) {
        cell_t cell = __fluidAnimate_getNext(cells, my_range);
        do {
            // Several task iterations run inside one scheduled task
            for (particle in cell) {
                struct cell_t [] ncells = getNeighbours(cell);
                particle.force = calcForce(particle, ncells);
            }
            // Ask the runtime whether the remaining range should be split
            int num = calico_should_subdivide();
            if (num > 0) {
                range_t[] new_ranges = __fluidAnimate_subdivide(my_range, num);
                calico_schedule_par(__computeForces_task, new_ranges, cells);
                return;
            }
        } while ((cell = __fluidAnimate_getNext(cells, my_range)) != NULL);
    }

    range_t[] __fluidAnimate_subdivide(range_t curr_cells, int num) {
        // subdivide curr into num equal cubes, and add to new
    }

    cell_t __fluidAnimate_getNext(cells_t [][][] cells, range_t range) {
        // return next cell in cells, or NULL if finished
    }

Aggregation of multiple task iterations

Automatic Repartitioning when necessary
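The slides describe the subdivide hook only by a comment. As a hedged illustration of what such a hook might compute, here is a 1-D analogue in plain C; the slides split a 3-D range into cubes, and `range1d_t` and `subdivide_1d` are hypothetical names that are not part of Calico.

```c
#include <assert.h>

/* Half-open 1-D range [start, end) of cell indices. */
typedef struct { int start; int end; } range1d_t;

/* Split r into at most num contiguous sub-ranges whose sizes differ by at
 * most one. Writes them to out (capacity >= num) and returns how many. */
static int subdivide_1d(range1d_t r, int num, range1d_t *out)
{
    assert(num > 0 && r.end >= r.start);
    int len = r.end - r.start;
    int base = len / num, extra = len % num;
    int pos = r.start, count = 0;
    for (int i = 0; i < num; i++) {
        int size = base + (i < extra ? 1 : 0);
        if (size == 0)
            break;      /* fewer cells than requested parts */
        out[count].start = pos;
        out[count].end = pos + size;
        pos += size;
        count++;
    }
    return count;
}
```

Splitting [0, 10) four ways yields [0, 3), [3, 6), [6, 8) and [8, 10). Returning fewer parts when the range is short keeps the runtime from scheduling empty tasks.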

Page 62

FluidAnimate

    par_task fluidAnimate {
        task moveParticles(cell_t cell) { ... }
        task computeDensities(cell_t cell) { ... }
        task computeForces(cell_t cell) { ... }
        task renderCell(cell_t cell) { ... }

        range_t [] subdivide(range_t curr_cells, int num) {
            // subdivide curr into num equal cubes, and add to new
        }
        cell_t getNext(cells_t [][][] cells, range_t range) {
            // return next cell in cells, or NULL if finished
        }
        bool calcOnDifferentCore(cell_t cell, range_t range) {
            // return true if cell is not within range
        }
    }

Page 63

FluidAnimate

    par_task fluidAnimate {
        task moveParticles(cell_t cell) {
            for (particle in cell) {
                cell_t new_cell = calculateParticlesCell(particle);
                if (new_cell == cell) continue;
                if (onDifferentCore(new_cell)) {
                    lockAndUpdate(new_cell, particle);
                } else {
                    updateNoLock(new_cell, particle);
                }
            }
        }
        ...
        bool calcOnDifferentCore(cell_t cell, range_t range) {
            // return true if cell is not within range
        }
        ...
    }

Page 64

FluidAnimate: the same moveParticles code as on page 63, with the call onDifferentCore(new_cell) annotated as:

calcOnDifferentCore(new_cell, my_range);

Page 65

FluidAnimate Results

No competition for CPU-time

[Figure: wall-clock execution time (normalised to sequential) against number of cores (0 to 8). Series: Parsec Native; Calico. An arrow marks the “Better” direction.]

Page 66

FluidAnimate Results

Competition for CPU-time

[Figure: wall-clock execution time (normalised to sequential) against number of cores (0 to 8). Series: Parsec Native; Calico. An arrow marks the “Better” direction.]

Page 67

• Two hardware trends
• Barrelfish operating system
• Message-passing software
• Managing parallel work

http://www.barrelfish.org

Page 68

©2010 Microsoft Corporation. All rights reserved. This material is provided for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY. Microsoft is a registered trademark or trademark of Microsoft Corporation in the United States and/or other countries.

