
Scalable Performance for Scala Message-Passing Concurrency

Date post: 12-Oct-2019
cso.io Scalable Performance for Scala Message-Passing Concurrency Andrew Bate Department of Computer Science University of Oxford
Transcript
Page 1: Scalable Performance for Scala Message-Passing Concurrency · cso.io Scalable Performance for Scala Message-Passing Concurrency . Andrew Bate . Department of Computer Science University

cso.io

Scalable Performance for Scala Message-Passing Concurrency

Andrew Bate

Department of Computer Science University of Oxford

Page 2: Motivation

Multi-core commodity hardware

Non-uniform shared memory

Expose potential parallelism

Correctness and formal verification

Compatibility

int arr[x][y];

Page 3: EMBEDDED DOMAIN-SPECIFIC LANGUAGE

Outline:

1. Embedded DSL
2. Bytecode rewriting
3. Channels
4. Scheduler
5. Deadlock detection

Page 4: Why an Embedded DSL?

Ease of implementation

Leverage existing tools

Leverage known syntax

Higher-order functions

Rich type system

Lightweight syntax

Compile-time macros

Page 5: Examples

def map[I, O](f: I => O)(in: ?[I], out: ![O]) = proc {
  repeat { out ! (f(in?)) }
  run (proc { in.closein } || proc { out.closeout })
}

[Diagram: the component map f reads a value v from in and writes f(v) to out.]

Page 6: Examples

def tee[@specialized T](in: ?[T], outs: Seq[![T]]) = proc {
  var v = null.asInstanceOf[T]  // placeholder until the first value is read
  val outputs = (|| (out <- outs) proc { out ! v })
  repeat { v = in?; run outputs }
  run (proc { in.closein } || (|| (out <- outs) proc { out.closeout }))
}

[Diagram: the component tee reads a value v from in and writes v to each of out1, out2, ..., outn.]

Page 7: SCALABLE PERFORMANCE through bytecode rewriting

Page 8: CPS Transformation

[Diagram: control flow of a method before and after the transformation. The original method has the blocks Init, Call n() and Return. The transformed method has the blocks Prelude, Init, Pre-call, Call n(f), Post-call and Return, with additional edges labelled pausing and rewinding.]
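
To make the pausing and rewinding paths concrete, here is a hand-written sketch of what a rewritten method might look like. It is an illustration only: the `Fiber` class, its fields, and the methods `m`, `n` and `runOnce` are hypothetical names invented for this example, not the CSO2 API and not the actual output of its bytecode rewriter.

```scala
object CpsSketch {
  // Hypothetical per-process state threaded through every rewritten method.
  final class Fiber {
    var pausing = false          // set by a callee that wants to suspend
    var rewinding = false        // set by the scheduler when resuming
    var saved: List[Int] = Nil   // stack of saved local variables
  }

  // Stand-in for a blocking operation: it asks to pause on the first call
  // and completes with 32 once the process is resumed.
  def n(f: Fiber): Int =
    if (f.rewinding) { f.rewinding = false; 32 }
    else { f.pausing = true; 0 }

  // Original method:  def m(): Int = { val x = 10; x + n() }
  def m(f: Fiber): Int = {
    var x = 0
    if (f.rewinding) {           // Prelude: restore locals and skip Init
      x = f.saved.head
      f.saved = f.saved.tail
    } else {
      x = 10                     // Init
    }
    val r = n(f)                 // Call n(f)
    if (f.pausing) {             // Post-call: save locals and unwind the stack
      f.saved = x :: f.saved
      return 0                   // dummy value, ignored while pausing
    }
    x + r                        // Return
  }

  // Plays the role of a scheduler for one pause/resume cycle.
  def runOnce(): Int = {
    val f = new Fiber
    m(f)                         // unwinds because n asked to pause
    f.pausing = false
    f.rewinding = true
    m(f)                         // rewinds to the call site and finishes
  }
}
```

Calling `CpsSketch.runOnce()` returns 42. The point of the design is that the thread's stack is empty between the two calls to `m`, so the thread is free to run other processes in the meantime.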

Page 9: Analysing the call graph

[Diagram: call graph over the methods do(), x(), y() and z() and the channel read ?(). The methods do() and y() are marked "Transform these methods".]
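
One way to select the methods to transform is to mark every method that can reach a pausing operation in the call graph. The sketch below assumes that do() calls x(), y() and z() and that only y() performs a channel read; those edges are a guess, since the slide's diagram did not survive extraction. The names `calls` and `needsTransform` are hypothetical.

```scala
import scala.annotation.tailrec

object CallGraphSketch {
  // Assumed call graph: caller -> callees. "?" stands for a channel read.
  val calls: Map[String, Set[String]] = Map(
    "do" -> Set("x", "y", "z"),
    "x"  -> Set.empty[String],
    "y"  -> Set("?"),
    "z"  -> Set.empty[String]
  )

  // Returns every method that calls a pausable operation directly or
  // transitively, by growing the marked set until it stops changing.
  def needsTransform(calls: Map[String, Set[String]],
                     pausable: Set[String]): Set[String] = {
    @tailrec
    def loop(marked: Set[String]): Set[String] = {
      val callers = calls.filter { case (_, callees) => callees.exists(marked) }.keySet
      val next = marked ++ callers
      if (next == marked) marked else loop(next)
    }
    loop(pausable) -- pausable
  }
}
```

With the assumed graph, `needsTransform(calls, Set("?"))` yields `Set("do", "y")`, so x() and z() keep their original bytecode and pay no overhead.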

Page 10: Engineering

Live variable analysis

Lazy load and store

Constant inlining

Page 11: Functional Expressions

for (i <- 0 until n; j <- i until n) println(i)

Compiles to:

intWrapper(0).until(n).foreach(
  (i: Int) => intWrapper(i).until(n).foreach((j: Int) => println(i))
)

Transforms to:

var i = 0
while (i < n) {
  var j = i
  while (j < n) { println(i); j += 1 }
  i += 1
}
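
The equivalence of the for-comprehension and the while-loop form can be checked directly. This is a small sketch in which `println(i)` is replaced by appending to a buffer so the two traces can be compared; `viaFor` and `viaWhile` are names made up for the example.

```scala
import scala.collection.mutable.ArrayBuffer

object LoopEquivalence {
  // The for-comprehension from the slide, recording i instead of printing it.
  def viaFor(n: Int): Vector[Int] = {
    val out = ArrayBuffer.empty[Int]
    for (i <- 0 until n; j <- i until n) out += i
    out.toVector
  }

  // The while-loop form, which allocates no closures and no Range objects.
  def viaWhile(n: Int): Vector[Int] = {
    val out = ArrayBuffer.empty[Int]
    var i = 0
    while (i < n) {
      var j = i
      while (j < n) { out += i; j += 1 }
      i += 1
    }
    out.toVector
  }
}
```

For n = 3 both functions return `Vector(0, 0, 0, 1, 1, 2)`.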

Page 12: More Features

Tail call optimisations

Shared memory

SBT plugin support

Page 13: CHANNELS

Page 14: More Features

Generalised alt

Specialization for primitives

Optimised extended rendezvous

Page 15: SCHEDULER

Page 16: Scheduler States

Created

Waiting

Terminated

Paused

Running

Page 17: Scheduling: Central FIFO

[Diagram: threads 1, 2, ..., m all take work from a single scheduler that holds one FIFO queue of processes P1, P2, P3, ..., Pn.]

Page 18: Scheduling: FIFO per thread

[Diagram: threads 1, 2, ..., m each have their own scheduler with its own FIFO queue; the processes P1, P3, ..., Pn are shown spread across the schedulers.]

Page 19: Scheduling: Batches per thread

[Diagram: threads 1, 2, ..., m each have their own scheduler; the contents of the schedulers' queues are shown as question marks.]

Page 20: Scheduling: Batches per thread

[Diagram: a scheduler whose queue holds batches of processes: P1, P2, ..., Pn; Q1, ..., Qm; R1, ..., Rk.]

$\text{Dispatch Count} = \max(\text{const} \times \text{Batch Length},\ \text{Dispatch Limit})$
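
The dispatch-count rule can be tried out on a toy run queue. The sketch below takes the formula as printed on the slide; the values of `const` and `dispatchLimit` are not given there, and the round-robin dispatch within a batch is an assumption made for the example. `BatchSketch`, `dispatchCount` and `onePass` are hypothetical names, not part of CSO2.

```scala
object BatchSketch {
  // The formula from the slide, with the tuning constants as parameters.
  def dispatchCount(const: Int, batchLength: Int, dispatchLimit: Int): Int =
    math.max(const * batchLength, dispatchLimit)

  // One pass over the run queue: each non-empty batch is given its dispatch
  // count, and its processes (modelled as labels) are dispatched round-robin
  // until the count is used up. The result is the order of dispatches.
  def onePass(batches: Seq[Vector[String]], const: Int,
              dispatchLimit: Int): Vector[String] =
    batches.toVector.filter(_.nonEmpty).flatMap { batch =>
      val count = dispatchCount(const, batch.length, dispatchLimit)
      Vector.tabulate(count)(k => batch(k % batch.length))
    }
}
```

For example, `onePass(Seq(Vector("P1", "P2"), Vector("Q1")), 2, 3)` dispatches P1, P2, P1, P2 from the first batch (count 4) and then Q1 three times (count 3). Keeping a batch on one thread for several dispatches is what lets processes that communicate with each other stay on the same core.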

Page 21: DEADLOCK DETECTION

Page 22: Example

[Diagram: process network built from Prefix 1, three Tee processes, the multipliers x2, x3 and x5, two Merge processes, and Console.]

Page 23: Example

[Diagram: the same network, with the blocked processes annotated with the pending output (!) or input (?) they are waiting on.]

Deadlock detected! The cycle of ungranted requests is:
Prefix1 -!-> Tee1
Tee1 -!-> Tee2
Tee2 -!-> Tee3
Tee3 -!-> x5
x5 -!-> Merge2
Merge2 -!-> Prefix1
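
A cycle of ungranted requests like the one reported above can be found by following wait-for edges until a process repeats. The sketch below is a generic illustration of that idea, not the detection algorithm CSO2 uses (the slides do not describe it); it assumes each blocked process waits on exactly one peer, and `waitsFor` and `findCycle` are hypothetical names.

```scala
import scala.annotation.tailrec

object DeadlockSketch {
  // The ungranted requests from the report: blocked process -> peer it waits on.
  val waitsFor: Map[String, String] = Map(
    "Prefix1" -> "Tee1", "Tee1" -> "Tee2", "Tee2" -> "Tee3",
    "Tee3" -> "x5", "x5" -> "Merge2", "Merge2" -> "Prefix1"
  )

  // Follows requests from `start`. Returns the cycle if a process is reached
  // twice, or None if the chain ends at a process that is not blocked.
  def findCycle(waitsFor: Map[String, String], start: String): Option[List[String]] = {
    @tailrec
    def walk(current: String, path: Vector[String]): Option[List[String]] = {
      val idx = path.indexOf(current)
      if (idx >= 0) Some(path.drop(idx).toList)
      else waitsFor.get(current) match {
        case Some(next) => walk(next, path :+ current)
        case None       => None
      }
    }
    walk(start, Vector.empty)
  }
}
```

Starting from Prefix1, `findCycle` returns the six processes in the order printed in the report. A chain that ends at a runnable process, such as `Map("A" -> "B")` starting from A, yields `None`.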

Page 24: PERFORMANCE EVALUATION

Page 25: Ring topology

[Chart: time to pass a message 300 times around an n-process ring (ms) against the number n of processes spawned. Axis ticks shown: 100, 1000, 10000, 100000. Series: CSO2 FIFO Scheduler, Java primitives, CSO2 Batch Scheduler.]
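
For readers who want to reproduce the shape of the ring benchmark, here is a minimal sketch using one JVM thread per process and `java.util.concurrent.SynchronousQueue` as a synchronous channel. It is an assumption that this resembles the "Java primitives" baseline; the slides do not say how that baseline was written, and `RingSketch` is a name invented here.

```scala
import java.util.concurrent.SynchronousQueue

object RingSketch {
  // Passes a hop counter `laps` times around a ring of n >= 2 threads.
  // Returns the final hop count and the elapsed time in milliseconds.
  def run(n: Int, laps: Int): (Int, Long) = {
    require(n >= 2 && laps >= 0)
    // chans(i) is the input of node i; node i writes to chans((i + 1) % n).
    val chans = Vector.fill(n)(new SynchronousQueue[Int]())
    val forwarders = (1 until n).map { i =>
      new Thread(new Runnable {
        def run(): Unit = {
          var lap = 0
          while (lap < laps) {
            chans((i + 1) % n).put(chans(i).take() + 1)  // forward, counting the hop
            lap += 1
          }
        }
      })
    }
    val start = System.nanoTime()
    forwarders.foreach(_.start())
    // Node 0 runs on the calling thread: it injects the token and closes each lap.
    var hops = 0
    var lap = 0
    while (lap < laps) {
      chans(1).put(hops)
      hops = chans(0).take() + 1
      lap += 1
    }
    forwarders.foreach(_.join())
    (hops, (System.nanoTime() - start) / 1000000L)
  }
}
```

`RingSketch.run(4, 300)` returns a hop count of 1200 (n × laps) together with the elapsed time. Because every process is a kernel thread here, large values of n exhaust memory or spend most of the time in context switches, which is the cost a user-level scheduler is meant to avoid.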

Page 26: Ring topology

[Chart: time to pass a message 300 times around an n-process ring (ms) against the number n of processes spawned. Axis ticks shown: 0 to 300000 in steps of 50000. Series: CSO2 FIFO Scheduler, Java primitives, CSO2 Batch Scheduler.]

Page 27: Fully connected topology

[Chart: time to pass n^2 messages (ms) against the number n of processes / actors spawned. Axis ticks shown: 10, 100, 1000, 10000, 100000, 1000000. Series: Erlang, Scala Actors, JCSP, Java Primitives, Occam, CSO2 FIFO Scheduler, CSO2 Batch Scheduler, Go. The two CSO2 series are called out on the chart.]

Page 28: Fully connected topology

[Chart: time to pass n^2 messages (ms) against the number n of processes / actors spawned. Axis ticks shown: 0 to 2000000 in steps of 200000. Series: Erlang, Scala Actors, JCSP, Java Primitives, Occam, CSO2 FIFO Scheduler, CSO2 Batch Scheduler, Go. The two CSO2 series are called out on the chart.]

Page 29: Fully connected topology

[Chart: time to pass n^2 messages (ms) against the number n of processes / actors spawned. Axis ticks shown: 0 to 60000 in steps of 10000. Series: JCSP, Occam, CSO2 FIFO Scheduler, CSO2 Batch Scheduler, Go. The two CSO2 series are called out on the chart.]

Page 30: Fully connected topology

[Chart: time to pass n^2 messages (ms) against the number n of processes / actors spawned. Axis ticks shown: 0 to 16000 in steps of 2000. Series: CSO2 Batch Scheduler, CSO2 FIFO Scheduler, Go.]

Page 31: Summary

• High-performance library for building massively concurrent systems on the JVM

• Deadlock detection

• Outperforms Java primitives, JCSP, Scala Actors and Occam, and comes very close to Go

