Fabian Hueske – Juggling with Bits and Bytes


Juggling with Bits and Bytes: How Apache Flink operates on binary data

Fabian Hueske

fhueske@apache.org @fhueske

Big Data frameworks on JVMs

•  Many (open source) Big Data frameworks run on JVMs
   –  Hadoop, Drill, Spark, Hive, Pig, and ...
   –  Flink as well

•  Common challenge: How to organize data in-memory?
   –  In-memory processing (sorting, joining, aggregating)
   –  In-memory caching of intermediate results

•  Memory management of a system influences
   –  Reliability
   –  Resource efficiency, performance & performance predictability
   –  Ease of configuration

The straightforward approach

Store and process data as objects on the heap
•  Put objects in an array and sort it

A few notable drawbacks
•  Predicting memory consumption is hard
   –  If you fail, an OutOfMemoryError will kill you!
•  High garbage collection overhead
   –  Easily 50% of time spent on GC
•  Objects have considerable space overhead
   –  At least 8 bytes for each (nested) object! (Depends on arch)

FLINK’S APPROACH

Flink adopts DBMS technology

•  Allocates fixed number of memory segments upfront
•  Data objects are serialized into memory segments
•  DBMS-style algorithms work on binary representation
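The upfront allocation can be pictured with a minimal sketch. The class `SegmentPool` and its methods are hypothetical names chosen for illustration and are not Flink's actual memory manager; the sketch only assumes that segments are plain byte arrays handed out from a fixed pool.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical fixed pool of byte[] segments (illustration only, not Flink's API). */
final class SegmentPool {
    private final Deque<byte[]> free = new ArrayDeque<>();
    private final int segmentSize;

    /** Allocates all segments once, at start-up. */
    SegmentPool(int numSegments, int segmentSize) {
        this.segmentSize = segmentSize;
        for (int i = 0; i < numSegments; i++) {
            free.push(new byte[segmentSize]);
        }
    }

    /** Returns a free segment, or null if the budget is exhausted (the caller would then spill). */
    byte[] request() {
        return free.poll();
    }

    /** Hands a segment back to the pool; segments are reused, never garbage collected. */
    void release(byte[] segment) {
        if (segment.length != segmentSize) {
            throw new IllegalArgumentException("segment does not belong to this pool");
        }
        free.push(segment);
    }

    /** Counting available memory is a simple size lookup. */
    int available() {
        return free.size();
    }
}
```

Because the pool never grows, an operator that runs out of segments gets a clear signal (here: `null`) instead of an OutOfMemoryError.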

Why is that good?

•  Memory-safe execution
   –  Used and available memory segments are easy to count
   –  No parameter tuning for reliable operations!

•  Efficient out-of-core algorithms
   –  Memory segments can be efficiently written to disk

•  Reduced GC pressure
   –  Memory segments are off-heap or never deallocated
   –  Data objects are short-lived or reused

•  Space-efficient data representation

•  Efficient operations on binary data

What does it cost?

•  Significant implementation investment
   –  Using java.util.HashMap vs.
   –  Implementing a spillable hash table backed by byte arrays and a custom serialization stack

•  Other systems use similar techniques
   –  Apache Drill, Apache AsterixDB (incubating)

•  Apache Spark is evolving in a similar direction

MEMORY ALLOCATION

Memory segments

•  Unit of memory distribution in Flink
   –  Fixed number allocated when worker starts

•  Backed by a regular byte array (default 32 KB)

•  On-heap or off-heap allocation

•  R/W access through Java’s efficient unsafe methods

•  Multiple memory segments can be logically concatenated to a larger chunk of memory
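The logical concatenation of segments can be sketched as follows. `SegmentView` is a hypothetical class written for this illustration; it uses plain array indexing instead of the unsafe methods mentioned above, so it shows the addressing idea only, not Flink's access path.

```java
/** Hypothetical view that addresses several fixed-size segments as one chunk (illustration only). */
final class SegmentView {
    private final byte[][] segments;
    private final int segmentSize;

    SegmentView(int numSegments, int segmentSize) {
        this.segmentSize = segmentSize;
        this.segments = new byte[numSegments][segmentSize];
    }

    /** Maps a global position to (segment, offset) and writes one byte. */
    void putByte(long pos, byte value) {
        segments[(int) (pos / segmentSize)][(int) (pos % segmentSize)] = value;
    }

    byte getByte(long pos) {
        return segments[(int) (pos / segmentSize)][(int) (pos % segmentSize)];
    }

    /** Writes a big-endian int; the value may straddle a segment boundary. */
    void putInt(long pos, int value) {
        for (int i = 0; i < 4; i++) {
            putByte(pos + i, (byte) (value >>> (24 - 8 * i)));
        }
    }

    /** Reads a big-endian int written by putInt. */
    int getInt(long pos) {
        int value = 0;
        for (int i = 0; i < 4; i++) {
            value = (value << 8) | (getByte(pos + i) & 0xFF);
        }
        return value;
    }
}
```

A write at position 6 of a view over two 8-byte segments, for example, places two bytes in the first segment and two in the second, and the reader does not need to know.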

On-heap memory allocation

Off-heap memory allocation

On-heap vs. Off-heap

•  No significant performance difference in micro-benchmarks

•  Garbage Collection
   –  Smaller heap -> faster GC

•  Faster start-up time
   –  A multi-GB JVM heap takes time to allocate

DATA SERIALIZATION

Custom de/serialization stack

•  Many alternatives for Java object serialization
   –  Dynamic: Kryo
   –  Schema-dependent: Apache Avro, Apache Thrift, Protobufs

•  But Flink has its own serialization stack
   –  Operating on serialized data requires knowledge of layout
   –  Control over layout can improve efficiency of operations
   –  Data types are known before execution

Rich & extensible type system

•  Serialization framework requires knowledge of types

•  Flink analyzes return types of functions
   –  Java: Reflection based type analyzer
   –  Scala: Compiler information + CodeGen via Macros

•  Rich type system
   –  Atomics: Primitives, Writables, Generic types, …
   –  Composites: Tuples, Pojos, CaseClasses
   –  Extensible by custom types

Serializing a Tuple3<Integer, Double, Person>
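The figure for this slide is not part of the transcript, so the following is only a sketch of the idea under stated assumptions: `Person` is assumed to have an `int id` and a `String name`, and the byte layout (fields written back to back, string with a 4-byte length prefix) is chosen for illustration and is not Flink's actual format. It shows why a known layout matters: a field can be read at a fixed offset without deserializing the whole record.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical fixed layout for a Tuple3<Integer, Double, Person> (illustration only). */
final class Tuple3Layout {

    /** Assumed shape of Person; the slide does not list its fields. */
    static final class Person {
        final int id;
        final String name;

        Person(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    /** Layout: [int f0][double f1][int person.id][int name length][name bytes]. */
    static byte[] serialize(int f0, double f1, Person f2) {
        byte[] name = f2.name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + 8 + 4 + 4 + name.length);
        buf.putInt(f0).putDouble(f1).putInt(f2.id).putInt(name.length).put(name);
        return buf.array();
    }

    /** Reads only the Double field, at its known offset, without touching the rest. */
    static double readDoubleField(byte[] record) {
        return ByteBuffer.wrap(record).getDouble(4);
    }

    /** Reads only the person's id, which sits after the int and the double. */
    static int readPersonId(byte[] record) {
        return ByteBuffer.wrap(record).getInt(4 + 8);
    }
}
```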

OPERATING ON BINARY DATA

Data processing algorithms

•  Flink’s algorithms are based on RDBMS technology
   –  External Merge Sort, Hybrid Hash Join, Sort Merge Join, …

•  Algorithms receive a budget of memory segments
   –  Automatic decision about budget size
   –  No fine-tuning of operator memory!

•  Operate in-memory as long as data fits into budget
   –  And gracefully spill to disk if data exceeds memory

In-memory sort – Fill the sort buffer

In-memory sort – Sort the buffer

In-memory sort – Read sorted buffer
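The three steps (fill, sort, read) can be sketched as below. The figures are missing from the transcript, so this is an assumption-laden illustration: `BinarySortBuffer` is a hypothetical class, records live in one byte region, and a separate index holds fixed-length (key, pointer) entries, here packed into a `long`. Sorting moves only the index entries; the serialized records stay where they are.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sort buffer over serialized (int, String) records (illustration only). */
final class BinarySortBuffer {
    // Record region; fixed size here, a real system would spill when it is full.
    private final ByteBuffer records = ByteBuffer.allocate(1024);
    // Index region: one long per record, key in the high 32 bits, pointer in the low 32 bits.
    private long[] index = new long[16];
    private int count;

    /** Step 1: serialize the record and append a (key, pointer) entry. */
    void add(int key, String value) {
        int pointer = records.position();
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        records.putInt(key).putInt(bytes.length).put(bytes);
        if (count == index.length) {
            index = Arrays.copyOf(index, count * 2);
        }
        index[count++] = ((long) key << 32) | (pointer & 0xFFFFFFFFL);
    }

    /** Step 2: sort the fixed-length entries; signed long order equals signed key order. */
    void sort() {
        Arrays.sort(index, 0, count);
    }

    /** Step 3: follow the pointers in sorted order and deserialize the values. */
    List<String> readSorted() {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            int pointer = (int) index[i];
            int length = records.getInt(pointer + 4);
            result.add(new String(records.array(), pointer + 8, length, StandardCharsets.UTF_8));
        }
        return result;
    }
}
```

Since the full integer key is stored in the index entry, the comparison never has to look at the record region.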

SHOW ME NUMBERS!

Sort benchmark

•  Task: Sort 10 million Tuple2<Integer, String> records
   –  String length 12 chars
      •  Tuple has 16 bytes of raw data
      •  ~152 MB raw data
   –  Integers uniformly, Strings long-tail distributed
   –  Sort on Integer field and on String field

•  Generated input provided as mutable object iterator

•  Use JVM with 900 MB heap size
   –  Minimum size to reliably run the benchmark

Sorting methods

1.  Objects-on-heap:
   –  Put cloned data objects in ArrayList and use Java’s Collection sort.
   –  ArrayList is initialized with right size.

2.  Flink-serialized (on-heap):
   –  Using Flink’s custom serializers.
   –  Integer with full binary sorting key, String with 8 byte prefix key.

3.  Kryo-serialized (on-heap):
   –  Serialize fields with Kryo.
   –  No binary sorting keys, objects are deserialized for comparison.

•  All implementations use a single thread
•  Average execution time of 10 runs reported
•  GC triggered between runs (does not go into reported time)
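The "8 byte prefix key" of method 2 can be illustrated with a small sketch. `PrefixKey` is a hypothetical helper, not Flink code, and it assumes ASCII strings so that byte order and `String.compareTo` order agree. The point is that most comparisons are decided by one unsigned `long` comparison, and only ties on the prefix fall back to the full string.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical 8-byte prefix key for strings (illustration only; assumes ASCII input). */
final class PrefixKey {

    /** Packs the first 8 bytes of the string into a long, big-endian, zero-padded. */
    static long prefixKey(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        long key = 0;
        for (int i = 0; i < 8; i++) {
            key <<= 8;
            if (i < bytes.length) {
                key |= (bytes[i] & 0xFF);
            }
        }
        return key;
    }

    /**
     * Compares by prefix first; only equal prefixes require looking at the full value.
     * A real sort buffer would store the prefix next to the pointer instead of recomputing it.
     */
    static int compare(String a, String b) {
        int byPrefix = Long.compareUnsigned(prefixKey(a), prefixKey(b));
        return byPrefix != 0 ? byPrefix : a.compareTo(b);
    }
}
```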

Execution time

Garbage collection and heap usage

Objects-on-heap

Flink-serialized

Memory usage

•  Breakdown: Flink serialized - Sort Integer
   –  4 bytes Integer
   –  12 bytes String
   –  4 bytes String length
   –  4 bytes pointer
   –  4 bytes Integer sorting key
   –  28 bytes * 10M records = 267 MB

                Object-on-heap    Flink-serialized    Kryo-serialized
Sort Integer    Approx. 700 MB    277 MB              266 MB
Sort String     Approx. 700 MB    315 MB              266 MB
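The arithmetic behind the per-record breakdown can be checked with a few lines. `RecordFootprint` is a hypothetical helper; the only assumption is that "MB" on the slides means 2^20 bytes, which is the reading under which both the 267 MB figure and the ~152 MB raw data figure come out.

```java
/** Hypothetical helper that redoes the slide's size arithmetic (illustration only). */
final class RecordFootprint {
    static final long RECORDS = 10_000_000L;
    static final long BYTES_PER_MB = 1024L * 1024L; // assumption: MB = 2^20 bytes

    /** Integer + String + String length + pointer + Integer sorting key. */
    static long serializedBytesPerRecord() {
        return 4 + 12 + 4 + 4 + 4;
    }

    /** Raw data only: Integer + 12-char String. */
    static long rawBytesPerRecord() {
        return 4 + 12;
    }

    /** Total size in whole MB (rounded down) for the benchmark's record count. */
    static long totalMb(long bytesPerRecord) {
        return bytesPerRecord * RECORDS / BYTES_PER_MB;
    }
}
```

With these numbers, `totalMb(serializedBytesPerRecord())` gives 267 and `totalMb(rawBytesPerRecord())` gives 152.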

Going out-of-core

•  Single thread HashJoin with 4 GB memory budget
•  Build side varies, Probe side 64 GB

WHAT’S NEXT?

We’re not done yet!

•  Serialization layouts tailored towards operations
   –  More efficient operations on binary data

•  Table API provides full semantics for execution
   –  Use code generation to operate fully on binary data

•  …

Summary

•  Active memory management avoids OOMErrors

•  Highly efficient data serialization stack
   –  Facilitates operations on binary data
   –  Makes more data fit into memory

•  DBMS-style operators operate on binary data
   –  High performance in-memory processing
   –  Graceful destaging to disk if necessary

•  Read Flink’s blog:
   –  http://flink.apache.org/news/2015/05/11/Juggling-with-Bits-and-Bytes.html
   –  http://flink.apache.org/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html
   –  http://flink.apache.org/news/2015/09/16/off-heap-memory.html

http://flink.apache.org @ApacheFlink

Apache Flink
