Date post: 20-Sep-2020
Skyway: Connecting Managed Heaps in Distributed Big Data Systems

Khanh Nguyen, Lu Fang, Christian Navasca, Harry Xu, Brian Demsky
University of California, Irvine

Shan Lu
University of Chicago

[Figure: MR and Spark apps on the JVM]

Sender: serialization of outDataset

    OutputStream out = Shuffler.GetOutputStream(receiver_id);
    for (Object o : outDataset) {
        out.writeObject(o);
    }

Receiver: deserialization into inDataset

    InputStream in = Shuffler.GetInputStream(sender_id);
    while (in.hasData()) {
        Object o = in.readObject();
        inDataset.store(o);
    }
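
The two shuffle loops above can be tried end to end with the standard Java serializer. This is a minimal sketch, not the code of any real system: `ShuffleSketch`, `send`, and `receive` are hypothetical names, and an in-memory byte array stands in for the network connection that `Shuffler` would provide.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the shuffle loops on the slides: an in-memory
// byte array plays the role of the network connection.
class ShuffleSketch {

    // Sender side: serialize every object in outDataset.
    static byte[] send(List<Object> outDataset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeInt(outDataset.size());   // tells the receiver when to stop
            for (Object o : outDataset) {
                out.writeObject(o);
            }
        }
        return bytes.toByteArray();
    }

    // Receiver side: deserialize each object and store it in inDataset.
    static List<Object> receive(byte[] wire) throws IOException, ClassNotFoundException {
        List<Object> inDataset = new ArrayList<>();
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(wire))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                inDataset.add(in.readObject());
            }
        }
        return inDataset;
    }

    public static void main(String[] args) throws Exception {
        List<Object> outDataset = new ArrayList<>();
        outDataset.add("edge");
        outDataset.add(42);
        outDataset.add(new int[] {1, 2, 3});
        List<Object> inDataset = receive(send(outDataset));
        System.out.println(inDataset.size() + " objects received");
    }
}
```

Every object crosses the wire as a freshly encoded byte sequence and is rebuilt from scratch on the other side, which is the per-object cost the following slides measure.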

Data transfer costs

[Figure: bar chart for TriangleCounting over LiveJournal on Spark 2.1.0 with 3 slaves. X-axis: Serializers. Y-axis: Time (sec), 0 to 1750. Annotated percentages: 14%, 18%, 17%, 16%.]

Data transfer

[Figure, built up over several slides:]

    Sender:    Object -> Serialization (Reflection.getField) -> Binary
    Network
    Receiver:  Binary -> Deserialization (Reflection.allocate, Reflection.setField) -> Object
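
The reflection steps named on the data-transfer slides can be illustrated with a small sketch. It assumes that the slide labels `Reflection.getField`, `Reflection.setField`, and `Reflection.allocate` correspond to the `java.lang.reflect` calls used here. `SamplePoint` and `ReflectiveCopySketch` are hypothetical names, and a map of field values stands in for the binary form.

```java
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical record type used only for this illustration.
class SamplePoint {
    int x;
    int y;
    SamplePoint() { }
    SamplePoint(int x, int y) { this.x = x; this.y = y; }
}

class ReflectiveCopySketch {

    // "Serialization": read every declared field reflectively (the getField step).
    static Map<String, Object> toFields(Object o) throws IllegalAccessException {
        Map<String, Object> fields = new LinkedHashMap<>();
        for (Field f : o.getClass().getDeclaredFields()) {
            f.setAccessible(true);
            fields.put(f.getName(), f.get(o));
        }
        return fields;
    }

    // "Deserialization": allocate a fresh object, then set each field reflectively.
    static <T> T fromFields(Class<T> type, Map<String, Object> fields)
            throws ReflectiveOperationException {
        T copy = type.getDeclaredConstructor().newInstance();   // the allocate step
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            Field f = type.getDeclaredField(e.getKey());
            f.setAccessible(true);
            f.set(copy, e.getValue());                          // the setField step
        }
        return copy;
    }

    public static void main(String[] args) throws Exception {
        SamplePoint received = fromFields(SamplePoint.class, toFields(new SamplePoint(3, 4)));
        System.out.println(received.x + "," + received.y);   // prints 3,4
    }
}
```

Each field is looked up by name and accessed through a reflective call on both sides, once per object, which is why these steps dominate when millions of small objects are shuffled.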

Our solution

[Figure, built up over several slides: Skyway transfers the object directly from the sender to the receiver, without the Reflection.getField, Reflection.setField, and Reflection.allocate steps.]

Skyway Overview

– Implemented in OpenJDK 8
  • Modified the class loader, the object/heap layout, and the Parallel Scavenge GC
– Efficiently handles data transfer:
  • Outperforms 90 serializers
  • Improves Spark by 36% (vs. the Java serializer) and 16% (vs. Kryo)
  • Improves Flink by 19%

Challenges

1. Type representation: automated global type numbering
2. Pointer representation: use relative offsets
3. Local JVM adaptation: visible for garbage collection
4. Work pipelining: buffering

Type registries

Master: cluster-wide Type Registry

    TypeString                            ID
    "java.lang.Object"                    1
    "org.apache.spark.rdd.RDD"            2
    "java.util.HashMap"                   4
    ...
    "java.util.HashSet"                   3
    "java.lang.String"                    5
    ...
    "org.apache.spark.scheduler.Task"     120

Worker A: Type Registry A

    TypeString                            ID     Metadata Object
    "java.lang.Object"                    1      klass for "java.lang.Object"
    "org.apache.spark.rdd.RDD"            2      klass for "org.apache.spark.rdd.RDD"
    "java.lang.String"                    5      klass for "java.lang.String"

Worker B: Type Registry B

    TypeString                            ID     Metadata Object
    "java.lang.Object"                    1      klass for "java.lang.Object"
    "java.lang.String"                    5      klass for "java.lang.String"
    "org.apache.spark.scheduler.Task"     120    klass for "org.apache.spark.scheduler.Task"
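
The registry layout above can be sketched in plain Java. This is an illustration under stated assumptions, not Skyway's implementation, which lives inside the JVM: `MasterTypeRegistry` and `WorkerTypeRegistry` are hypothetical names, a direct method call stands in for the worker-to-master message, and a `Class` object stands in for the klass metadata.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical cluster-wide registry kept on the master: the first request for
// a type string assigns the next free ID; later requests return the same ID.
class MasterTypeRegistry {
    private final Map<String, Integer> ids = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger(1);

    int idFor(String typeString) {
        return ids.computeIfAbsent(typeString, name -> nextId.getAndIncrement());
    }
}

// Hypothetical per-worker view: holds only the types this worker has loaded and
// maps each global ID to local metadata (a Class object stands in for the klass).
class WorkerTypeRegistry {
    private final MasterTypeRegistry master;
    private final Map<Integer, Class<?>> byId = new HashMap<>();
    private final Map<Class<?>, Integer> byClass = new HashMap<>();

    WorkerTypeRegistry(MasterTypeRegistry master) { this.master = master; }

    // Called when the worker loads a class: ask the master for the global ID.
    int register(Class<?> type) {
        Integer known = byClass.get(type);
        if (known != null) return known;
        int id = master.idFor(type.getName());
        byId.put(id, type);
        byClass.put(type, id);
        return id;
    }

    // Returns the local metadata for a global ID, or null if not loaded here.
    Class<?> resolve(int id) { return byId.get(id); }
}

class TypeRegistryDemo {
    public static void main(String[] args) {
        MasterTypeRegistry master = new MasterTypeRegistry();
        WorkerTypeRegistry a = new WorkerTypeRegistry(master);
        WorkerTypeRegistry b = new WorkerTypeRegistry(master);
        int idOnA = a.register(String.class);
        b.register(Object.class);
        int idOnB = b.register(String.class);
        System.out.println(idOnA == idOnB);   // prints true: same type, same ID
    }
}
```

Because every worker obtains IDs from the same master, a small integer written by one worker identifies the same type on any other worker, even though each worker's registry lists only the types it has loaded.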

Output & Input buffer

Output buffer
• Segregated by receivers
• One for each receiver
• In native, off-the-heap memory

Input buffer
• Segregated by senders
• Multiple for each sender
• In managed heap
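
The buffer organization above can be approximated with ordinary Java collections. This is a rough sketch under stated assumptions: `BufferSketch` and its methods are hypothetical, the 64 KiB capacity is arbitrary, a direct `ByteBuffer` stands in for native off-heap memory, and heap `byte[]` chunks stand in for input buffers that Skyway places in the managed heap.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BufferSketch {
    // Output side: one direct (off-heap) buffer per receiver.
    private final Map<Integer, ByteBuffer> outputByReceiver = new HashMap<>();
    // Input side: any number of on-heap chunks per sender.
    private final Map<Integer, List<byte[]>> inputBySender = new HashMap<>();

    // Returns the single output buffer for this receiver, creating it on first use.
    ByteBuffer outputFor(int receiverId) {
        return outputByReceiver.computeIfAbsent(
                receiverId, id -> ByteBuffer.allocateDirect(64 * 1024));
    }

    // Stores a chunk that arrived from the given sender in an ordinary heap array.
    void onChunkReceived(int senderId, byte[] chunk) {
        inputBySender.computeIfAbsent(senderId, id -> new ArrayList<>()).add(chunk);
    }

    // Number of input chunks currently held for the given sender.
    int inputChunkCount(int senderId) {
        List<byte[]> chunks = inputBySender.get(senderId);
        return chunks == null ? 0 : chunks.size();
    }
}
```

A possible use: call `outputFor(receiverId)` before writing each object so that data bound for different receivers never mixes, and call `onChunkReceived(senderId, chunk)` as each piece arrives so that several chunks from one sender can be held at once.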

Example: Serialization

Objects in the sender's heap:

    Integer[] (length 3): elements 0xaa, 0xbb, 0xcc
    0xaa: Integer 10
    0xbb: Integer 20
    0xcc: Integer 30

Type registry:

    TypeString               ID
    "java.lang.Integer"      6
    "[java.lang.Integer"     7

writeObject() copies the objects into the output buffer in native memory. Each object is tagged with its type ID, and each pointer is replaced by the relative offset of its target in the buffer:

    Offset    Contents
    0         type 7, length 3, elements 7, 11, 15 (were 0xaa, 0xbb, 0xcc)
    7         type 6, value 10
    11        type 6, value 20
    15        type 6, value 30
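
The same idea can be reproduced in a few lines of Java. This is a sketch under stated assumptions, not Skyway's writeObject(): `RelativeOffsetWriter` is a hypothetical name, every field is written as a 4-byte int, and object headers are reduced to a type ID, so the byte offsets (0, 20, 28, 36) differ from the slide's abstract units (0, 7, 11, 15). Array elements are assumed to be non-null, and the buffer is assumed to start at position 0.

```java
import java.nio.ByteBuffer;

class RelativeOffsetWriter {
    static final int INTEGER_ID = 6;         // type IDs taken from the slide's registry
    static final int INTEGER_ARRAY_ID = 7;

    // Writes an Integer[] and its elements into buf. Each record starts with a
    // type ID; each array slot holds the buffer offset of the element it refers to.
    static void write(Integer[] array, ByteBuffer buf) {
        buf.putInt(INTEGER_ARRAY_ID);
        buf.putInt(array.length);
        int slots = buf.position();
        buf.position(slots + 4 * array.length);         // reserve one slot per element
        for (int i = 0; i < array.length; i++) {
            int elementOffset = buf.position();         // offset from the buffer start
            buf.putInt(slots + 4 * i, elementOffset);   // absolute put: fill in the slot
            buf.putInt(INTEGER_ID);
            buf.putInt(array[i]);
        }
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(64);
        write(new Integer[] {10, 20, 30}, buf);
        // Prints the three element slots: 20 28 36
        System.out.println(buf.getInt(8) + " " + buf.getInt(12) + " " + buf.getInt(16));
    }
}
```

Since the slots hold offsets rather than addresses, the buffer's contents stay valid wherever the receiver places them.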

Example: Deserialization

Received data in the input buffer in heap:

    Offset    Contents
    0         type 7, length 3, elements 7, 11, 15
    7         type 6, value 10
    11        type 6, value 20
    15        type 6, value 30

Type registry:

    Metadata Object          ID
    java.lang.Integer        6
    java.lang.Integer[]      7

readObject() walks the buffer in place. Each type ID is replaced by the local metadata object from the registry, and each relative offset by an absolute address:

    Integer[] (length 3): elements 0xf7, 0xfb, 0xff
    0xf7: Integer 10
    0xfb: Integer 20
    0xff: Integer 30
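
The pointer fix-up step can be simulated in Java. This is a sketch under stated assumptions, not Skyway's readObject(): `RelativeOffsetReader` is a hypothetical name, fields are 4-byte ints, `base` is a simulated address for the start of the buffer (real addresses are not visible to Java code), and the type IDs are left in place rather than swapped for klass pointers.

```java
import java.nio.ByteBuffer;

class RelativeOffsetReader {
    static final int INTEGER_ID = 6;         // type IDs taken from the slide's registry
    static final int INTEGER_ARRAY_ID = 7;

    // Scans the records in buf[0, limit) once and rewrites every array slot from
    // a relative offset into base + offset, where base is the (simulated) address
    // at which the buffer was placed.
    static void absolutize(ByteBuffer buf, int limit, int base) {
        int pos = 0;
        while (pos < limit) {
            int typeId = buf.getInt(pos);
            if (typeId == INTEGER_ARRAY_ID) {
                int length = buf.getInt(pos + 4);
                for (int i = 0; i < length; i++) {
                    int slot = pos + 8 + 4 * i;
                    buf.putInt(slot, base + buf.getInt(slot));
                }
                pos += 8 + 4 * length;
            } else if (typeId == INTEGER_ID) {
                pos += 8;   // type ID plus int value, no pointers to fix
            } else {
                throw new IllegalStateException("unknown type ID " + typeId);
            }
        }
    }

    public static void main(String[] args) {
        // Hand-built buffer: an Integer[] of length 3 followed by its three elements.
        ByteBuffer buf = ByteBuffer.allocate(44);
        buf.putInt(7).putInt(3).putInt(20).putInt(28).putInt(36);
        buf.putInt(6).putInt(10).putInt(6).putInt(20).putInt(6).putInt(30);
        absolutize(buf, buf.position(), 0xf0);
        // Prints 0x104 0x10c 0x114
        System.out.printf("0x%x 0x%x 0x%x%n", buf.getInt(8), buf.getInt(12), buf.getInt(16));
    }
}
```

The fix-up is one linear pass with no per-object allocation or reflective field access. On the slide the same arithmetic gives 0xf0 + 7 = 0xf7, 0xf0 + 11 = 0xfb, and 0xf0 + 15 = 0xff.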

In the paper

• Cyclic references
• Shared objects
• Support for threads
• Interaction with GC
• Integrating Skyway in real systems

Evaluations – Microbenchmark

• Java Serializer Benchmark Set
  – Extensive performance evaluation with 90 existing serializers

[Figure: benchmark results chart. Labeled entries: SKYWAY; Google's Protobuf (1.8x); Kryo, recommended by Spark (2.2x).]

Evaluations – Real Systems

• Flink 1.3.2
  – 5 query answering applications
  – TPC-H datasets

On average, Skyway reduces end-to-end time by 19%.

Improvement Summary: Flink

[Figure: bar chart of Execution Time, Ser. Time, and Deser. Time. Y-axis: Normalized performance to built-in serializer, 0 to 1.2.]

Evaluations – Real Systems

• Spark 2.1.0
  – 4 applications: WordCount, PageRank, ConnectedComponents, and TriangleCounting
  – 4 datasets: LiveJournal, Orkut, UK-2005, and Twitter

On average, Skyway reduces end-to-end time by 16% relative to Kryo and by 36% relative to the Java serializer.
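The percentages above and the normalized bars in the charts are two views of the same ratio. The following is a minimal sketch of that arithmetic, not code from the Skyway authors; the helper names `normalized_time` and `reduction_pct` are hypothetical, and the sketch assumes the reported average is a plain ratio of mean times, which the slides do not specify.

```python
def normalized_time(reduction_pct: float) -> float:
    """Fraction of the baseline time that remains after a percent reduction."""
    return 1.0 - reduction_pct / 100.0


def reduction_pct(baseline_time: float, new_time: float) -> float:
    """Percent reduction in time relative to a baseline measurement."""
    return 100.0 * (baseline_time - new_time) / baseline_time


# Average end-to-end reductions reported on the slide for Spark.
for baseline, pct in [("Kryo", 16.0), ("Java serializer", 36.0)]:
    print(f"vs. {baseline}: normalized time = {normalized_time(pct):.2f}")
```

Read this way, a 36% reduction corresponds to a bar at roughly 0.64 on an axis normalized to the Java serializer.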

Improvement Summary: Spark

[Figure: bar chart comparing Kryo and Skyway on Execution Time, Ser. Time, and Deser. Time; y-axis: performance normalized to the Java serializer, from 0 to 1.4]

Conclusion

• Goal: Reduce data transfer costs in Big Data systems

• Solution: Skyway, the first JVM-based serializer
  – Efficiently transfer data
  – Easy to integrate

Thank You!
