
Safe and Efficient Cluster Communication in Java using Explicit Memory Management

Date post: 08-Jan-2016
Page 1: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

Safe and Efficient Cluster Communication in Java using Explicit Memory Management

Chi-Chao Chang
Dept. of Computer Science
Cornell University

Page 2: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

Goal

High-performance cluster computing with safe languages
  parallel and distributed applications

Use off-the-shelf technologies
  Java
    safe: “better C++”
    “write once run everywhere”
    growing interest for high-performance applications (Java Grande)
  User-level network interfaces (UNIs)
    direct, protected access to network devices
    prototypes: U-Net (Cornell), Shrimp (Princeton), FM (UIUC)
    industry standard: Virtual Interface Architecture (VIA)
    cost-effective clusters: new 256-processor cluster @ Cornell TC

Page 3: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

Java Networking

Traditional “front-end” approach
  pick favorite abstraction (sockets, RMI, MPI) and Java VM
  write a Java front-end to custom or existing native libraries
  good performance, re-use proven code
  magic in native code, no common solution

Interface Java with Network Devices
  bottom-up approach
  minimizes amount of unverified code
  focus on fundamental data transfer inefficiencies due to:
    1. Storage safety
    2. Type safety

[Figure: software layers, top to bottom: Apps; RMI, RPC; Sockets; Active Messages, MPI, FM; UNI; Networking Devices. A boundary separates the Java layers from the C layers.]

Page 4: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

Outline

Thesis Overview
  GC/Native heap separation, object serialization
Experimental Setup: VI Architecture and Marmot
Part I: Array Transfers
  (1) Javia-I: Java Interface to VI Architecture
    respects heap separation
  (2) Jbufs: Safe and Explicit Management of Buffers
    Javia-II, matrix multiplication, Active Messages
Part II: Object Transfers
  (3) A Case For Specialization
    micro-benchmarks, RMI using Javia-I/II, impact on application suite
  (4) Jstreams: in-place de-serialization
    micro-benchmarks, RMI using Javia-III, impact on application suite
Conclusions

Page 5: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

(1) Storage Safety

Java programs are garbage-collected
  no explicit de-allocation: GC tracks and frees garbage objects
  programs are oblivious to the GC scheme used: non-copying (e.g. conservative) or copying
  no control over location of objects

Modern Network and I/O Devices
  direct DMA from/into user buffers
  native code is necessary to interface with hardware devices

Page 6: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

(1) Storage Safety

[Figure: two panels, each showing application memory split into a GC heap and a native heap, with the NI doing DMA to and from RAM. (a) Hard Separation: Copy-on-demand. (b) Optimization: Pin-on-demand. Labels: copy, pin, GC ON/OFF.]

Pin-on-demand only works for send/write operations
For receive/read operations, GC must be disabled indefinitely...

Result: Hard Separation between GC and native heaps

Page 7: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

(1) Storage Safety: Effect

[Figure: Throughput (MB/s, 0 to 80) vs. message size (0 to 32 Kbytes). Series: C raw, Java copy, Java pin.]

Best case scenario: 10-40% hit in throughput
  pick your favorite JVM, your fastest network interface, and a pair of 450MHz P-II with commodity OS
  pinning on demand is expensive...

Page 8: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

(2) Type Safety

Cannot forge a reference to a Java object

b is an array of bytes

in C:

double *data = (double *)b;

in Java:

double[] data = new double[1024/8];
for (int i = 0, off = 0; i < 1024/8; i++, off += 8) {
    // Assemble the upper and lower 32 bits from big-endian bytes.
    int upper = ((b[off]   & 0xff) << 24) +
                ((b[off+1] & 0xff) << 16) +
                ((b[off+2] & 0xff) << 8) +
                 (b[off+3] & 0xff);
    int lower = ((b[off+4] & 0xff) << 24) +
                ((b[off+5] & 0xff) << 16) +
                ((b[off+6] & 0xff) << 8) +
                 (b[off+7] & 0xff);
    // Reinterpret the 64-bit pattern as a double.
    data[i] = Double.longBitsToDouble((((long) upper) << 32) +
                                      (lower & 0xffffffffL));
}
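The byte-by-byte loop above can also be written with `java.nio` (introduced in J2SE 1.4). The following is a minimal sketch, assuming the bytes are in big-endian order as in the loop; the class and method names (`BytesToDoubles`, `viaLoop`, `viaBuffer`) are illustrative and not from the slides. Both versions still copy the data into a fresh `double[]`, which is the cost the slide is pointing at.

```java
import java.nio.ByteBuffer;

public class BytesToDoubles {
    // Manual reassembly, equivalent to the slide's loop: 8 big-endian bytes per double.
    static double[] viaLoop(byte[] b) {
        double[] data = new double[b.length / 8];
        for (int i = 0, off = 0; i < data.length; i++, off += 8) {
            long bits = 0;
            for (int k = 0; k < 8; k++) {
                bits = (bits << 8) | (b[off + k] & 0xffL);
            }
            data[i] = Double.longBitsToDouble(bits);
        }
        return data;
    }

    // Same conversion through a ByteBuffer view (big-endian by default).
    // The bulk get() copies the values into a new double[].
    static double[] viaBuffer(byte[] b) {
        double[] data = new double[b.length / 8];
        ByteBuffer.wrap(b).asDoubleBuffer().get(data);
        return data;
    }

    public static void main(String[] args) {
        byte[] b = new byte[1024];
        ByteBuffer.wrap(b).putDouble(0, 3.5).putDouble(8, -1.25);
        double[] x = viaLoop(b);
        double[] y = viaBuffer(b);
        System.out.println(x[0] + " " + x[1] + " " + y[0] + " " + y[1]);
    }
}
```

Neither form avoids the copy; the C cast on the slide reinterprets the same memory, which safe Java cannot do on an ordinary `byte[]`.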

Page 9: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

(2) Type Safety

Objects have meta-data
  runtime safety checks (array-bounds, array-store, casts)

In C:

struct Buffer {
    int len;
    char data[1];
};

struct Buffer *b = malloc(sizeof(struct Buffer) + 1024);
b->len = 1024;

In Java:

class Buffer {
    int len;
    byte[] data;

    Buffer(int n) {
        data = new byte[n];
        len = n;
    }
}

Buffer b = new Buffer(1024);

[Figure: memory layouts of b in C and in Java. Labels: b, 1024, Buffer vtable, lock obj, byte[] vtable, lock obj, 1024.]

Page 10: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

(2) Type Safety

Result: Java objects need to be serialized and de-serialized across the network

[Figure: application memory split into a GC heap and a native heap, with the NI doing DMA to and from RAM. Labels: serial, copy, pin, GC ON/OFF.]

Page 11: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

(2) Type Safety: Effect

Performance hit of one order of magnitude:
  pick your favorite high-level communication abstraction (e.g. Remote Method Invocation)
  pick your favorite JVM, your fastest network interface, and a pair of 450MHz P-II

[Figure: Round-Trip Latency (us, 0 to 1600) vs. message size (0 to 8 Kbytes). Series: C raw, Java copy, Java RMI copy.]

Page 12: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

Thesis

Use explicit memory management to improve Java communication performance
  Jbufs: safe and explicit management of Java buffers
    softens the GC/Native heap separation
    preserves type and storage safety
    “zero-copy” array transfers
  Jstreams: extends Jbufs for optimizing serialization in clusters
    “zero-copy” de-serialization of arbitrary objects

[Figure: application memory split into a GC heap and a native heap, with the NI doing DMA to and from RAM. The boundary between the heaps is marked user-controlled. Labels: pin, GC ON/OFF.]

Page 13: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

Outline

Thesis Overview
  GC/Native heap separation, object serialization
Experimental Setup: Giganet cluster and Marmot
Part I: Array Transfers
  (1) Javia-I: Java Interface to VI Architecture
    respects heap separation
  (2) Jbufs: Safe and Explicit Management of Buffers
    Javia-II, matrix multiplication, Active Messages
Part II: Object Transfers
  (3) A Case For Specialization
    micro-benchmarks, RMI using Javia-I/II, impact on application suite
  (4) Jstreams: in-place de-serialization
    micro-benchmarks, RMI using Javia-III, impact on application suite
Conclusions

Page 14: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

Giganet Cluster

Configuration
  8 P-II 450MHz, 128MB RAM
  8 1.25 Gbps Giganet GNN-1000 adapters
  one Giganet switch

GNN-1000 Adapter: User-Level Network Interface
  Virtual Interface Architecture implemented as a library (Win32 dll)

Base-line pt-2-pt Performance
  14us r/t latency, 16us with switch
  over 100MBytes/s peak, 85MBytes/s with switch

Page 15: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

Marmot

Java System from Microsoft Research
  not a VM
  static compiler: bytecode (.class) to x86 (.asm)
  linker: asm files + runtime libraries -> executable (.exe)
  no dynamic loading of classes
  most Dragon book opts, some OO and Java-specific opts

Advantages
  source code
  good performance
  two types of non-concurrent GC (copying, conservative)
  native interface “close enough” to JNI


Page 17: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

Javia-I

Basic Architecture
  respects heap separation
    buffer mgmt in native code
  Marmot as an “off-the-shelf” system
    copying GC disabled in native code
  primitive array transfers only

Send/Recv API
  non-blocking
  blocking
    bypass ring accesses
  pin-on-demand
  alloc-recv: allocates new array on-demand
  cannot eliminate copying during recv

[Figure: Javia-I architecture across the Java/C boundary. Labels: send/recv ticket ring, byte array ref, GC heap, Vi, send/recv queue, descriptor, buffer, VIA.]

Page 18: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

Javia-I: Performance

[Figure: latency (us, 0 to 400) vs. message size (0 to 8 Kbytes). Series: raw, copy(s), pin(s), copy(s)+alloc(r), pin(s)+alloc(r).]

[Figure: bandwidth (MB/s, 0 to 80) vs. message size (0 to 32 Kbytes). Series: raw, copy(s), pin(s), copy(s)+alloc(r), pin(s)+alloc(r).]

Basic Costs (PII-450, Windows2000b3):
  pin + unpin = (10 + 10)us, or ~5000 machine cycles
  Marmot: native call = 0.28us, locks = 0.25us, array alloc = 0.75us

Latency (N = transfer size in bytes):
  16.5us + (25ns) * N    raw
  38.0us + (38ns) * N    pin(s)
  21.5us + (42ns) * N    copy(s)
  18.0us + (55ns) * N    copy(s)+alloc(r)

BW: 75% to 85% of raw for 16Kbytes

Page 19: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

jbufs

Goal
  provide buffer management capabilities to Java without violating its safety properties
  re-use is important: amortizes high pinning costs

jbuf: exposes communication buffers to Java programmers
  1. lifetime control: explicit allocation and de-allocation
  2. efficient access: direct access as primitive-typed arrays
  3. location control: safe de-allocation and re-use by controlling whether or not a jbuf is part of the GC heap

heap separation becomes soft and user-controlled

Page 20: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

jbufs: Lifetime Control

public class jbuf {
    public static jbuf alloc(int bytes);            /* allocates jbuf outside of GC heap */
    public void free() throws CannotFreeException;  /* frees jbuf if it can */
}

1. jbuf allocation does not result in a Java reference to it
     cannot access the jbuf from the wrapper object
2. jbuf is not automatically freed if there are no Java references to it
     free has to be explicitly called

[Figure: a handle object in the GC heap pointing to a jbuf outside the GC heap.]

Page 21: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

jbufs: Efficient Access

public class jbuf {
    /* alloc and free omitted */
    public byte[] toByteArray() throws TypedException;  /* hands out byte[] ref */
    public int[] toIntArray() throws TypedException;    /* hands out int[] ref */
    . . .
}

3. (Storage Safety) jbuf remains allocated as long as there are array references to it
     when can we ever free it?
4. (Type Safety) jbuf cannot have two differently typed references to it at any given time
     when can we ever re-use it (e.g. change its reference type)?

[Figure: a Java byte[] reference in the GC heap pointing into a jbuf outside the GC heap.]

Page 22: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

jbufs: Location Control

public class jbuf {
    /* alloc, free, toArrays omitted */
    public void unRef(CallBack cb);  /* app intends to free/re-use jbuf */
}

Idea: Use GC to track references

unRef: application claims it has no references into the jbuf
  jbuf is added to the GC heap
  GC verifies the claim and notifies application through callback
  application can now free or re-use the jbuf

Required GC support: change scope of GC heap dynamically

[Figure: three stages of a jbuf with a Java byte[] ref. Before unRef the jbuf is outside the GC heap; after unRef it is inside the GC heap; after the callBack it is outside again.]

Page 23: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

jbufs: Runtime Checks

[Figure: jbuf state diagram. States: Unref, ref<p>, to-be-unref<p>. Transition labels: alloc, free, to<p>Array, to<p>Array + GC, unRef, to<p>Array + unRef, GC*.]

Type safety: ref and to-be-unref states parameterized by primitive type

GC* transition depends on the type of garbage collector
  non-copying: transition only if all refs to array are dropped before GC
  copying: transition occurs after every GC
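The runtime checks above can be sketched as a small state machine. This is a toy model, not the Marmot implementation: `JbufModel`, its `toArray(Class)` method, and the `gc(...)` hook are hypothetical names introduced here, the edges are one reading of the slide's state diagram, and rejecting `unRef` in the Unref state is an assumption. The garbage collector is simulated by an explicit call.

```java
public class JbufModel {
    enum State { UNREF, REF, TO_BE_UNREF, FREED }

    // Exception names follow the slides; their definitions here are assumptions.
    static class TypedException extends Exception {}
    static class CannotFreeException extends Exception {}

    private State state = State.UNREF;
    private Class<?> type;      // the <p> in ref<p>; null while UNREF
    private Runnable callback;  // run when the GC* transition fires

    private JbufModel() {}

    // The model ignores the size; a real jbuf would allocate outside the GC heap.
    static JbufModel alloc(int bytes) { return new JbufModel(); }

    State state() { return state; }

    // to<p>Array: hand out a reference typed as p (e.g. byte.class, int.class).
    void toArray(Class<?> p) throws TypedException {
        if (state == State.FREED) throw new IllegalStateException("jbuf was freed");
        if (state == State.UNREF) { state = State.REF; type = p; return; }
        if (type != p) throw new TypedException();  // two differently typed refs are not allowed
    }

    // unRef: the application claims it holds no more references into the jbuf.
    void unRef(Runnable cb) {
        if (state != State.REF && state != State.TO_BE_UNREF) {
            throw new IllegalStateException("no outstanding array reference");
        }
        state = State.TO_BE_UNREF;
        callback = cb;
    }

    // Simulated collection. A copying collector always completes the transition;
    // a non-copying one only if all array references were dropped before the GC.
    void gc(boolean copyingCollector, boolean refsDropped) {
        if (state == State.TO_BE_UNREF && (copyingCollector || refsDropped)) {
            state = State.UNREF;
            type = null;
            Runnable cb = callback;
            callback = null;
            if (cb != null) cb.run();
        }
    }

    void free() throws CannotFreeException {
        if (state != State.UNREF) throw new CannotFreeException();
        state = State.FREED;
    }

    public static void main(String[] args) throws Exception {
        JbufModel buf = JbufModel.alloc(1024);
        buf.toArray(byte.class);
        try { buf.toArray(int.class); } catch (TypedException e) { System.out.println("retyping rejected"); }
        try { buf.free(); } catch (CannotFreeException e) { System.out.println("free rejected"); }
        buf.unRef(() -> System.out.println("callback: safe to free or re-use"));
        buf.gc(true, false);
        buf.toArray(int.class);  // re-use under a new type is now allowed
        System.out.println(buf.state());
    }
}
```

The point of the model is that `free` and retyping succeed only in the Unref state, and the only way back to Unref after handing out an array is through `unRef` followed by a collection.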

Page 24: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

Javia-II

Exploiting jbufs
  explicit pinning/unpinning of jbufs
  only non-blocking send/recvs

[Figure: Javia-II architecture across the Java/C boundary. Labels: send/recv ticket ring, state, array refs, GC heap, jbuf, Vi, send/recv queue, descriptor, VIA.]

Page 25: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

Javia-II: Performance

Basic Jbuf Costs
  allocation = 1.2us, to*Array = 0.8us, unRefs = 2.3us, GC degradation = 1.2us/jbuf

Latency (n = xfer size in bytes)
  16.5us + (0.025us) * n    raw
  20.5us + (0.025us) * n    jbufs
  38.0us + (0.038us) * n    pin(s)
  21.5us + (0.042us) * n    copy(s)

BW within 1% of raw

[Figure: latency (us, 0 to 400) vs. message size (0 to 8 Kbytes). Series: raw, jbufs, copy, pin.]

[Figure: bandwidth (MB/s, 0 to 80) vs. message size (0 to 32 Kbytes). Series: raw, jbufs, copy, pin.]
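To make the linear latency fits above concrete, the sketch below evaluates them for a few message sizes and computes where the pin(s) and copy(s) lines cross. The coefficients are taken from the slide; the class and method names (`LatencyModel`, `latencyUs`, `crossoverBytes`) and the chosen message sizes are illustrative assumptions.

```java
public class LatencyModel {
    // Fitted latency in microseconds: fixedUs + perByteUs * n, with n in bytes.
    static double latencyUs(double fixedUs, double perByteUs, int n) {
        return fixedUs + perByteUs * n;
    }

    // Message size (bytes) at which two linear fits A and B give the same latency.
    static double crossoverBytes(double fixedA, double perByteA,
                                 double fixedB, double perByteB) {
        return (fixedA - fixedB) / (perByteB - perByteA);
    }

    public static void main(String[] args) {
        String[] names = {"raw", "jbufs", "pin(s)", "copy(s)"};
        double[] fixedUs = {16.5, 20.5, 38.0, 21.5};
        double[] perByteUs = {0.025, 0.025, 0.038, 0.042};
        for (int n : new int[] {4, 1024, 8192}) {
            for (int i = 0; i < names.length; i++) {
                System.out.printf("%-8s n=%5d  %.1f us%n",
                        names[i], n, latencyUs(fixedUs[i], perByteUs[i], n));
            }
        }
        System.out.printf("pin(s) = copy(s) at about %.0f bytes%n",
                crossoverBytes(38.0, 0.038, 21.5, 0.042));
    }
}
```

Under these fits, jbufs add a fixed 4us over raw at every size, and pinning on demand only beats copying for messages larger than about 4125 bytes.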

Page 26: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

MM: Communication

[Figure: pMM Comm Time (64x64, 8 procs), in msecs (0 to 10), split into comm and barrier. Configurations: copy-alloc, copy-async, pin-alloc, pin-async, jbufs, jdk copy-alloc, jdk copy-async. Bar labels as extracted: 67%, 70%, 78%, 85%, 56%, 78%, 73%.]

[Figure: pMM Comm Time (256x256, 8 procs), in msecs (0 to 50), split into comm and barrier. Same configurations. Bar labels as extracted: 19%, 16%, 24%, 22%, 13%, 29%, 18%.]

pMM over Javia-II/jbufs spends at least 25% less in communication for 256x256 matrices on 8 processors

Page 27: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

MM: Overall

[Figure: pMM MFLOPS (64x64), axis 0 to 200, for 2, 4, and 8 procs. Configurations: copy-alloc, copy-async, pin-alloc, pin-async, jbufs, jdk copy-alloc, jdk copy-async.]

[Figure: pMM MFLOPS (256x256), axis 0 to 350, for 2, 4, and 8 procs. Same configurations.]

Cache effects: better communication performance does not always translate to better overall performance

Page 28: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

Active Messages

Exercising Jbufs:
  user supplies a list of jbufs
  upon message arrival:
    jbuf passed to handler
    unRef is invoked after handler invocation
  if pool is empty, reclaim existing ones
  copying deferred to GC-time only if needed

class First extends AMHandler {
    private int first;

    void handler(AMJbuf buf, …) {
        int[] tmp = buf.toIntArray();
        first = tmp[0];
    }
}

class Enqueue extends AMHandler {
    private Queue q;

    void handler(AMJbuf buf, …) {
        int[] tmp = buf.toIntArray();
        q.enq(tmp);
    }
}

Page 29: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

AM: Performance

Latency about 15us higher than Javia
  synch access to buffer pool, endpoint header, flow control checks, handler id lookup

BW within 10% of peak for 16KByte messages

[Figure: latency (us, 0 to 600) vs. message size (0 to 8 Kbytes). Series: raw, jbufs, AM jbuf, AM copy, AM copy-alloc.]

[Figure: bandwidth (MB/s, 0 to 80) vs. message size (0 to 32 Kbytes). Series: raw, jbufs, AM jbuf, AM copy, AM pin, AM copy-alloc.]

Page 30: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

Jbufs: Experience

Efficient access through arrays is useful:
  no indirect access via method invocation
  promotes code re-use of large numerical kernels
  leverages compiler infrastructure for eliminating safety checks

Limitations
  still not as flexible as C buffers
  stale references may confuse programmers

Discussed in thesis:
  the necessity of explicit de-allocation
  implementation of Jbufs in Marmot’s copying collector
  impact on conservative and generational collector
  extension to JNI to allow “portable” implementations of Jbufs

Page 31: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

Outline

Thesis Overview
  GC/Native heap separation, object serialization
Experimental Setup: VI Architecture and Marmot
Part I: Array Transfers
  (1) Javia-I: Java Interface to VI Architecture
    respects heap separation
  (2) Jbufs: Safe and Explicit Management of Buffers
    Javia-II, matrix multiplication, Active Messages
Part II: Object Transfers
  (3) A Case For Specialization on Homogeneous Clusters
    micro-benchmarks, RMI using Javia-I/II, impact on application suite
  (4) Jstreams: in-place de-serialization
    micro-benchmarks, RMI using Javia-III, impact on application suite
Conclusions

Page 32: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

Object Serialization and RMI

Standard JOS Protocol
  “heavy-weight” class descriptors are serialized along with objects
  type-checking: classes need not be “equal”, just “compatible.”
  protocol allows for user extensions

Remote Method Invocation
  object-oriented version of Remote Procedure Call
  relies on JOS for argument passing
  actual parameter object can be a sub-class of the formal parameter class.

[Figure: writeObject copies objects out of the sender's GC heap onto the NETWORK; readObject rebuilds them in the receiver's GC heap.]
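For reference, a standard JOS round trip through `java.io` looks like the sketch below. It serializes a `double[62]` (one of the payloads measured later in the talk) into a byte array standing in for the network and reads it back. The class and method names (`JosRoundTrip`, `write`, `read`) are illustrative; the stream classes are the standard `java.io` ones.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Arrays;

public class JosRoundTrip {
    // Sender side: writeObject emits a stream header and class descriptor, then the data.
    static byte[] write(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    // Receiver side: readObject allocates a new object and copies the data into it.
    static Object read(byte[] wire) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(wire))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        double[] payload = new double[62];
        Arrays.fill(payload, 1.5);
        byte[] wire = write(payload);
        double[] copy = (double[]) read(wire);
        System.out.println("payload bytes: " + payload.length * 8);
        System.out.println("wire bytes:    " + wire.length);
        System.out.println("equal contents: " + Arrays.equals(payload, copy)
                + ", same object: " + (payload == copy));
    }
}
```

The wire form is larger than the raw 496 bytes of data because of the header and class descriptor, and the receiver ends up with a freshly allocated copy. Those are the two costs that the following slides measure.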

Page 33: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

JOS Costs

1. overheads in tens or hundreds of us:
     send/recv overheads ~3us, memcpy of 500 bytes ~0.8us
2. double[] 50% more expensive than byte[] of similar size
3. overheads grow as object sizes grow

[Figure: writeObject cost (us, axis 0 to 70) for jview, jdk, and marmot. Objects: byte[] 100, byte[] 500, double[] 12, double[] 62, complex[] (p/elem), list 4 (p/elem). Off-scale bar labels as extracted: 120 27593.]

[Figure: readObject cost (us, axis 0 to 70) for jview, jdk, and marmot. Objects: byte[] 100, byte[] 500, double[] 12, double[] 62, complex[] (p/elem), list 4 (p/elem), list 160 (p/elem). Off-scale bar labels: 117, 271, 86.]

Page 34: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

Impact of Marmot

Impact of Marmot’s optimizations:
  Method inlining: up to 66% improvement (already deployed)
  No synchronization whatsoever: up to 21% improvement
  No safety checks whatsoever: up to 15% combined

Better compilation technology unlikely to reduce overheads substantially

Page 35: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

Impact on RMI

Order of magnitude worse than Javia-I/II
  round-trip latency drops to about 30us in a null RMI: no JOS!
  peak bandwidth of 22MBytes/s, about 25% of raw

[Figure: latency (us, 0 to 2000) vs. message size (0 to 8 Kbytes). Series as extracted: raw, jbufs, RMI jbufs, RMI copy+alloc, RMI pin, RMI copy+alloc, jdk RMI copy.]

[Table: 4-byte latency (us). Values as extracted: 150.4, 161.9, 164.5, 211.8, 271.0, 482.3, 520.1. Row labels as extracted: RMI (jbufs, pin, copy, jdk copy), sockets (copy+alloc, jdk sockets).]

Page 36: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

Impact on Applications

Applications as listed: SOR, FFT arrays, FFT complex, EM3D arrays, pMM

  % comm time (est.)    % total time (est.)
  11.76%                2.73%
  10.90%                5.22%
  14.28%                13.73%
  1.42%                 1.37%
  7.64%                 5.20%                  (pMM)

A Case for Specializing Serialization for Cluster applications:
  overheads an order of magnitude higher than send/recv and memcpy
  RMI performance degraded by one order of magnitude
  5-15% “estimated” impact on applications
  old adage: “specialize for the common case”

Page 37: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

Optimizing De-serialization

“in-place” object de-serialization
  specialization for homogeneous cluster and JVMs

Goal
  eliminate copying and allocation of objects

Challenges
  preserve the integrity of the receiving JVM
  permit de-serialization of arbitrary Java objects with unrestricted usage and without special annotations
  independent of a particular GC scheme

[Figure: writeObject moves objects from the sender's GC heap across the NETWORK directly into the receiver's GC heap.]

Page 38: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

Jstreams: write

writeObject
  deep-copy of objects: maintains in-memory layout
  deals with cyclic data structures
  swizzle pointers: offsets to a base address
  replace object meta-data with 64-bit class descriptor
  optimization: primitive-typed arrays in jbufs are not copied

public class Jstream extends Jbuf {
    public void writeObject(Object o)  /* serializes o onto the stream */
        throws TypedException, ReferencedException;
    public void writeClear()           /* clears the stream for writing */
        throws TypedException, ReferencedException;
}

Page 39: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

Jstreams: read

readObject
  replace class descriptors with meta-data
  unswizzle pointers, array-bounds checking
  after first readObject, add jstream to GC heap
    tracks references coming out of read objects
  unRef: user is willing to free or re-use

public class Jstream extends Jbuf {
    public Object readObject() throws TypedException;  /* de-serialization */
    public boolean isJstream(Object o);  /* checks if o resides in the stream */
}

[Figure: three stages of a jstream relative to the GC heap: outside before reading, inside after the first readObject, and outside again after unRef and the callBack.]
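Pointer swizzling on write and unswizzling on read can be illustrated with a toy example. The sketch below flattens a possibly cyclic linked list into an `int[]`, replacing each `next` reference with the offset of the target record from the start of the buffer, and then rebuilds the list with bounds-checked offsets. Everything here (`SwizzleSketch`, `Node`, the two-slot record layout) is hypothetical. Unlike Jstreams, it allocates new objects on read instead of de-serializing in place, since in-place reuse needs support from the runtime.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class SwizzleSketch {
    static final class Node {
        int value;
        Node next;
        Node(int value) { this.value = value; }
    }

    static final int NULL_OFFSET = -1;
    static final int RECORD_SIZE = 2;  // record layout: [value, offset of next record]

    // Deep-copies the list reachable from root; references become offsets from index 0.
    // The identity map assigns one offset per object, so cycles terminate.
    static int[] write(Node root) {
        Map<Node, Integer> offsets = new IdentityHashMap<>();
        List<Node> order = new ArrayList<>();
        for (Node n = root; n != null && !offsets.containsKey(n); n = n.next) {
            offsets.put(n, order.size() * RECORD_SIZE);
            order.add(n);
        }
        int[] buf = new int[order.size() * RECORD_SIZE];
        for (int i = 0; i < order.size(); i++) {
            Node n = order.get(i);
            buf[i * RECORD_SIZE] = n.value;
            buf[i * RECORD_SIZE + 1] = (n.next == null) ? NULL_OFFSET : offsets.get(n.next);
        }
        return buf;
    }

    // Rebuilds the objects and turns offsets back into references, checking bounds.
    static Node read(int[] buf) {
        if (buf.length == 0) return null;
        if (buf.length % RECORD_SIZE != 0) throw new IllegalArgumentException("truncated record");
        Node[] nodes = new Node[buf.length / RECORD_SIZE];
        for (int i = 0; i < nodes.length; i++) {
            nodes[i] = new Node(buf[i * RECORD_SIZE]);
        }
        for (int i = 0; i < nodes.length; i++) {
            int off = buf[i * RECORD_SIZE + 1];
            if (off == NULL_OFFSET) continue;
            if (off < 0 || off >= buf.length || off % RECORD_SIZE != 0) {
                throw new IllegalArgumentException("bad offset " + off);
            }
            nodes[i].next = nodes[off / RECORD_SIZE];
        }
        return nodes[0];
    }

    public static void main(String[] args) {
        Node a = new Node(1), b = new Node(2), c = new Node(3);
        a.next = b; b.next = c; c.next = a;  // cyclic list
        Node r = read(write(a));
        System.out.println(r.value + " " + r.next.value + " " + r.next.next.value
                + " cycle preserved: " + (r.next.next.next == r));
    }
}
```

The offset check in `read` plays the role of the slide's safety checks during unswizzling: a corrupt or malicious buffer must not be able to produce a reference outside the stream.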

Page 40: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

jstreams: Runtime Checks

[Figure: jstream state diagram. States: Unref, Write Mode, Read Mode, to-be unref. Transition labels: alloc, free, writeObject, writeObject + GC, writeClear, readObject, readObject + GC, unRef, GC*.]

Modification to Javia-II: prevent DMA from clobbering de-serialized objects
  receive posts not allowed if jstream is in read mode
  no changes to Javia-II architecture

Page 41: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

jstream: Performance

De-serialization costs constant w.r.t. object size
  2.6us for arrays, 3.3us per list element.

[Figure: readObject cost (us, axis 0 to 30). Series: JOS jdk, JOS marmot, jstreams marmot, jstreams (C). Objects: byte[] 100, byte[] 500, double[] 62, list 4 p/e, list 160 p/e. Off-scale bar labels: 39, 55, 86.]

Page 42: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

jstream: Impact on RMI

4-byte round-trip latency of 45us (25us higher than Javia-II)

52MBytes/s for 16KBytes arguments

[Figure: bandwidth (MB/s, 0 to 80) vs. message size (0 to 32 Kbytes). Series: raw, javia-II, AM javia-II, RMI javia-III, RMI javia-I.]

Page 43: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

jstream: Impact on Applications

3-10% improvement in SOR, EM3D, FFT

10% hit in pMM performance
  over 22,000 incoming RMIs, 1000 jstreams in receive pool, ~26 garbage collections: 15% of total execution time in GC
  generational collection will alleviate GC costs substantially
  receive pool size is hard to tune: tradeoffs between GC and locality

Applications as listed: SOR, FFT arrays, FFT complex, EM3D arrays, pMM

  JOS comm  JOS total  jstreams comm  jstreams total  % improv.  % improv.  % improv.    % improv.
  (secs)    (secs)     (secs)         (secs)          comm       total      comm (est.)  total (est.)
  4.59      19.78      3.99           19.08           13.20%     3.52%      11.76%       2.73%
  2.20      4.60       1.99           4.37            9.50%      4.85%      10.90%       5.22%
  18.30     19.03      16.16          17.26           11.70%     9.30%      14.28%       13.73%
  14.82     15.36      14.29          14.83           3.57%      3.40%      1.42%        1.37%
  190.58    280.00     170.91         307.80          10.32%     -9.93%     7.64%        5.20%     (pMM)

Page 44: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

Jstreams: Experience

Implementation of readObject and writeObject integrated into JVM
  protocol is JVM-specific
  native implementation is faster

Limitations
  not as flexible as Java streams: cannot read and write at the same time
  no “extensible” wire protocols

Discussed in thesis:
  implementation of Jstreams in Marmot’s copying collector
  support for polymorphic RMI: minor changes to the stub compiler
  JNI extensions to allow “portable” implementations of Jstreams

Page 45: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

Related Work

Microsoft J-Direct
  “pinned” arrays defined using source-level annotations
  JIT produces code to “redirect” array access: expensive

Berkeley’s Jaguar: efficient code generation with JIT extensions
  security concern: JIT “hacks” may break Java or byte-code

Custom JVMs
  many “tricks” are possible (e.g. pinned array factories, pinned and non-pinned heaps, etc): depend on a particular GC scheme
  Jbufs: isolates minimal support needed from GC

Memory Management
  Safe Regions (Gay and Aiken): reference counting, no GC

Fast Serialization and RMI
  KaRMI (Karlsruhe): fixed JOS, ground-up RMI implementation
  Manta (Vrije U): fast RMI but a Java dialect

Page 46: Safe and Efficient Cluster Communication in Java using Explicit Memory Management

Summary

Use of explicit memory management to improve Java communication performance in clusters
  softens the GC/Native heap separation
  preserves type and storage safety
  independent of GC scheme
  jbufs: zero-copy array transfers
  jstreams: zero-copy de-serialization of arbitrary objects

Framework for building communication software and applications in Java
  Javia-I/II
  parallel matrix multiplication
  Jam: active messages
  Java RMI
  cluster applications: TSP, IDA, SOR, EM3D, FFT, and MM

