Date post: 29-Mar-2018
Page 1: Caribou: Intelligent Distributed Storage - ETH Zpeople.inf.ethz.ch/zistvan/doc/vldb17-caribou-slides.pdf · Software clients, Key-value interface (Single-key lookup or Scanning) Cuckoo

Caribou: Intelligent Distributed Storage

Zsolt István, David Sidler, Gustavo Alonso
Systems Group, Department of Computer Science, ETH Zurich

Rack-scale thinking

[Diagram: a ToR switch connecting four Storage nodes and four Compute nodes]

In the Cloud

In an Appliance

+ Provisioning
+ Independent Scalability
- Data movement bottleneck

Storage Design Options

Oracle Exadata

IBM PureData

Deuteronomy

Samsung YourSQL

Wisconsin SmartSSD

Kinetic Drives

BlueCache

Features similar to software

Balanced design

+ Full-fledged

- SW+HW overhead

- Large footprint

- Outside management

+ No-overhead access

+ Small footprint

Compute > Bandwidth Compute < Bandwidth

Compute ~ Bandwidth

What is Caribou?

Intelligent Distributed Storage with FPGAs

Easy integration on commodity network
Random access to tuples & in-storage scans
Selection predicate pushdown
Data replicated consistently to nodes
Extensible (open-source) design

[Diagram: clients connected through a 10Gbps switch to four Caribou nodes]

FPGA 101

Field Programmable Gate Array

Reprogrammable hardware
Large number of configurable logic blocks
Tight integration, massive parallelism
Network/App Co-design
Innovation…

Inside a Caribou node

[Diagram: clients connected through a 10Gbps switch to four Caribou nodes]

Stages of a Caribou node (with attached DRAM):

Network: TCP/IP, 1000s of connections, SW clients
Replication: Primary/backup, Atomic Broadcast
Key-value management: Cuckoo hash table, slab memory allocation, bitmap indexes
Processing: Conditionals, Regex, Decompression

Software clients, Key-value interface (Single-key lookup or Scanning)

The pipeline runs at the same speed as the network (line-rate)
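
The slide names a cuckoo hash table as the key-value index but gives no details. As a rough software illustration of the general technique (not the FPGA design itself), the sketch below keeps each key in one of two candidate slots and evicts the current occupant on a collision. The class name, the two salted hash functions, the `max_kicks` limit, and the grow-and-rehash fallback are all assumptions made for this example.

```python
class CuckooHashTable:
    """Toy two-table cuckoo hash: every key has one candidate slot per table."""

    def __init__(self, capacity=8, max_kicks=32):
        self.capacity = capacity      # slots per table (hypothetical default)
        self.max_kicks = max_kicks    # evictions allowed before growing
        self.tables = [[None] * capacity, [None] * capacity]

    def _slot(self, which, key):
        # Two hash functions, obtained by salting Python's hash() with the table id.
        return hash((which, key)) % self.capacity

    def get(self, key):
        # A lookup inspects at most two slots, so its cost is constant.
        for which in (0, 1):
            entry = self.tables[which][self._slot(which, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return None

    def put(self, key, value):
        # Update in place if the key is already stored.
        for which in (0, 1):
            idx = self._slot(which, key)
            entry = self.tables[which][idx]
            if entry is not None and entry[0] == key:
                self.tables[which][idx] = (key, value)
                return
        # Otherwise insert, evicting occupants to their alternate table.
        item, which = (key, value), 0
        for _ in range(self.max_kicks):
            idx = self._slot(which, item[0])
            evicted = self.tables[which][idx]
            self.tables[which][idx] = item
            if evicted is None:
                return
            item, which = evicted, 1 - which
        self._grow(item)  # too many evictions: enlarge and rehash

    def _grow(self, pending):
        # Collect every stored entry plus the one still in hand, then reinsert.
        entries = [e for table in self.tables for e in table if e is not None]
        entries.append(pending)
        self.capacity *= 2
        self.tables = [[None] * self.capacity, [None] * self.capacity]
        for k, v in entries:
            self.put(k, v)


table = CuckooHashTable()
for i in range(200):
    table.put(i, f"value-{i}")
```

The bounded number of slots per lookup is what makes this kind of table attractive for a fixed-latency pipeline; how the actual hardware handles failed insertions is not stated in the slides.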

Throughput of random access to storage

Random access response times

[Plot: response time (us, axis 0 to 60) against value size (B, 0 to 256); series: Get, Put/Update, Put/Update (Replicated)]

• Response times comparable to SW on Infiniband, but Caribou uses commodity networking

Operator push-down

SELECT … FROM customer
WHERE age<35 AND purchases>2
AND address LIKE “%Luzern%CH%”

Multiple comparisons to constants (conjunction)
Substrings or regular expression matching [1]
Can filter compressed data (LZ77)
Extensible pipeline design

The filtering circuits are parameterized at runtime, with no overhead.

[1] Accelerating Pattern Matching Queries in Hybrid CPU-FPGA Architectures. D. Sidler, Zs. Istvan, M. Ewaida, G. Alonso. 2017 ACM SIGMOD/PODS Conference (SIGMOD'17)
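
To make the push-down idea concrete, here is a minimal software sketch of the example query's WHERE clause: a conjunction of comparisons to constants plus a LIKE pattern, supplied as runtime parameters and evaluated next to the data so that only matching keys are returned. The function names (`like_to_regex`, `make_filter`, `scan`), the record layout, and the sample rows are hypothetical and are not Caribou's client API.

```python
import operator
import re

# Comparison operators a filter may be parameterized with (assumed set).
OPS = {"<": operator.lt, ">": operator.gt, "==": operator.eq}


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern: % matches any run, _ matches one character."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def make_filter(comparisons, like=None):
    """Build a predicate from runtime parameters.

    comparisons: list of (field, op, constant), combined as a conjunction.
    like: optional (field, LIKE pattern) pair.
    """
    compiled = like_to_regex(like[1]) if like else None

    def keep(record):
        for field, op, constant in comparisons:
            if not OPS[op](record[field], constant):
                return False
        return compiled is None or compiled.fullmatch(record[like[0]]) is not None

    return keep


def scan(store, keep):
    """Storage-side scan: only keys of matching records leave the node."""
    return [key for key, record in store.items() if keep(record)]


# Hypothetical sample data standing in for the customer table.
store = {
    "c1": {"age": 30, "purchases": 5, "address": "Bahnhofstrasse 1, Luzern, CH"},
    "c2": {"age": 40, "purchases": 9, "address": "Seestrasse 2, Luzern, CH"},
    "c3": {"age": 25, "purchases": 3, "address": "Limmatquai 3, Zurich, CH"},
}
keep = make_filter([("age", "<", 35), ("purchases", ">", 2)],
                   like=("address", "%Luzern%CH%"))
matches = scan(store, keep)
```

Changing the constants or the pattern only changes the arguments to `make_filter`, which loosely mirrors the slide's point that the filters are parameterized at runtime rather than rebuilt.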

Exploiting Parallelism

[Diagram: values read from DRAM flow through a Transform stage (four parallel LZ77 units), a Comparison stage, and a Regular Expressions stage (four parallel Regex Cores); the predicate stages produce the Keep? decision. Axis labels: Throughput, Complexity]
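
One reading of the diagram is that slower units (decompression, regex) are replicated so that their combined rate keeps up with the rest of the pipeline. The arithmetic behind that kind of sizing can be sketched as follows. The 10 Gbps target comes from the switch shown in the slides; the per-unit rate of 2.5 Gbps is a made-up figure chosen only for illustration.

```python
import math


def units_needed(target_gbps, unit_gbps):
    """Smallest number of identical parallel units whose combined rate meets the target."""
    return math.ceil(target_gbps / unit_gbps)


def aggregate_rate(target_gbps, unit_gbps, units):
    """Rate the stage sustains: capped by the target once enough units exist."""
    return min(target_gbps, unit_gbps * units)


LINE_RATE_GBPS = 10.0   # network speed shown in the slides
UNIT_RATE_GBPS = 2.5    # hypothetical throughput of a single unit

n = units_needed(LINE_RATE_GBPS, UNIT_RATE_GBPS)
print(n, aggregate_rate(LINE_RATE_GBPS, UNIT_RATE_GBPS, n))
```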

Scan and filter

Choice of filter and value size does not impact scan rate.

[Plot annotations: bound by the filter performance; bound by the network/client]

Scan rate in GB/s is the same regardless of value size
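
The two "bound by" regions can be illustrated with a simple back-of-the-envelope model, which is an assumption of this note rather than something stated in the slides: if a fraction `selectivity` of the scanned bytes passes the filter and must cross the network, the scan proceeds at the filter's rate until the outgoing link saturates. The 1.25 GB/s link figure is 10 Gbps converted to bytes; the 4 GB/s filter rate is hypothetical.

```python
def scan_rate(filter_gbs, network_gbs, selectivity):
    """Effective scan rate in GB/s under the simple two-bottleneck model.

    selectivity: fraction of scanned bytes that match and are sent to the client.
    """
    if selectivity <= 0:
        return filter_gbs  # nothing is sent, so only the filter limits the scan
    return min(filter_gbs, network_gbs / selectivity)


FILTER_GBS = 4.0     # hypothetical in-storage filter throughput
NETWORK_GBS = 1.25   # 10 Gbps link expressed in GB/s

for selectivity in (0.01, 0.1, 0.5, 1.0):
    print(selectivity, scan_rate(FILTER_GBS, NETWORK_GBS, selectivity))
```

In this model, low selectivity leaves the scan bound by the filter, while high selectivity leaves it bound by the network or the client.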

Near Data Processing without Surprises

Filtering can be combined with random access reads as well

“The Times They Are A-Changin”

In-Storage Processing
Stand-alone boards, MPSoC (ARM+FPGA)
Add NVMe flash, N.V. Memory
Explore different KVS (memcached, redis, …)

In-Network Processing
Microsoft Catapult NICs
Work on streaming data
Distributed service in the cloud

Accelerator
Intel Xeon+FPGA
Offload computation without partitioning or copying data

Time to Explore…

Data movement bottleneck on many levels

Caribou – Intelligent Distributed Storage
Software-like service in a small footprint
Balanced design with “right amount” of compute

Caribou – Platform to Explore Near-data Processing
Open source, modular and portable
Data processing operators applicable on other HW platforms

https://github.com/fpgasystems/caribou

https://www.systems.ethz.ch/fpga/

