Date post: 21-Jul-2020
Respecting the block interface – computational storage using virtual objects

Ian F. Adams, John Keys, Michael P. Mesnier

Intel Labs


A brief history of computational storage

Simple concept with a long history
– Move the compute to the data
– Associative memory, database machines, active disks, key-value HDD…

Why didn’t it gain widespread adoption?
– Short version: wasn’t quite worth it… until now

[Timeline figure, 1970 to 2010: Associative Memory (1950s), Database Machines, Active Disks, Hadoop, Seagate Kinetic]


What’s changed?

• Very high density, high-performance storage is here
– 16-32 TB drives are here, 100+ TB SSDs are coming
  • 1 PB in a 1U server
– All this behind NICs, I/O controllers, devices, etc.

• Large-scale disaggregated block storage is here (NVMeoF)
– Enables “diskless” storage stacks
– Greater flexibility, but yet more I/O traffic

• Devices and targets are more powerful
– More flexibility and headroom to work with
  • (also, we’re Intel and like hardware)

[Figure: a fast server runs the storage application (DB, FS, object store, KV, ...) on top of block management SW (maps data objects to blocks); I/O to storage is READ block and WRITE block]


Moving compute into storage

(to avoid an I/O bottleneck)


Moving compute into storage

• Step 1. Teach the storage about data objects
– Files, objects, DB records, key-value pairs, …

• Step 2. Provide a way to program storage (API)

• Step 3. Implement compute methods in storage
– E.g., search, compress, checksum, resize, …

Object or file-based storage makes this process straightforward

BUT, storage is fundamentally *still* built on blocks!


Challenge 1: Moving compute into block storage


Object Awareness

Recall Step 1: Teach storage about objects
– Constraint: we need to talk block storage

Prior experience makes us leery of changing low-level storage interfaces
– E.g., uphill battle for KV drives

Can we make block storage object aware without…
– Changing the interface
– Adding a lot of state and complexity

We need to consider
– Host and target data consistency, input vs. output, non-sector-aligned data, transport considerations (bidirectional transfers), chained operations, permissions…

[Figure: on the server, the file system maps foo.txt (“Hello, world!”) to sectors 13 + 42; sector 13 holds “Hello,” and sector 42 holds “world!”]


Introducing virtual objects (step 1 of 3)

• Virtual object:
– An ephemeral mapping of blocks to make block storage object aware
  • Don’t have to turn block storage into object storage
  • Stateless: mapping is only valid for the duration of an operation
  • Can be used for both input and output
– Complementary to existing stacks built on block storage
  • Object, KV store, file, etc.

[Figure: FIEMAP + stat on “/home/user/foo.txt” produce the virtual object]

VIRTUAL_OBJ:
EXT 1: LBA 2008 LEN 4096
EXT 2: LBA 4104 LEN 123
TOTAL_LEN: 4219

This is step 1: teach the block storage about objects
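
A minimal Go sketch of the virtual object from this slide, to make the data structure concrete. The type and field names are hypothetical, not the prototype's actual definitions, and it assumes that LBA counts 512-byte sectors while LEN is in bytes (which would explain why the second extent is not sector aligned).

```go
package main

import "fmt"

// Extent is one contiguous run of an object on the block device.
// Assumption: LBA counts 512-byte sectors and Len is in bytes.
type Extent struct {
	LBA uint64
	Len uint64
}

// VirtualObject is an ephemeral list of extents describing one object.
// It is built per operation and discarded afterwards, so the storage
// target keeps no per-object state.
type VirtualObject struct {
	Extents []Extent
}

// TotalLen sums the extent lengths in bytes.
func (v VirtualObject) TotalLen() uint64 {
	var n uint64
	for _, e := range v.Extents {
		n += e.Len
	}
	return n
}

func main() {
	// Extent values taken from the slide's example for /home/user/foo.txt.
	vo := VirtualObject{Extents: []Extent{
		{LBA: 2008, Len: 4096},
		{LBA: 4104, Len: 123},
	}}
	fmt.Println("TOTAL_LEN:", vo.TotalLen()) // TOTAL_LEN: 4219
}
```

Because the mapping is just a short extent list, it can travel with each request instead of being stored on the target.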


Programmability (step 2 of 3)

• Virtual objects are embedded in compute descriptors
– Add arguments and operations for computing inside block storage
– Can have multiple input and output virtual objects

• Descriptors are block-protocol compatible!
– For SCSI and NVMe, works as a vendor-specific EXEC command
– Small results can be returned as a payload, larger results written to output objects

[Figure: a compute descriptor wrapping a virtual object]

Compute Descriptor
OPCODE: “search” ARG: “baz”
VIRTUAL_OBJ:
EXT 1: LBA 2008 LEN 4096
EXT 2: LBA 4104 LEN 123
TOTAL_LEN: 4219

This is step 2: provides a way to program storage
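
A hedged Go sketch of a compute descriptor like the one pictured: an opcode, its arguments, and the input/output virtual objects, serialized into a byte payload that a vendor-specific EXEC command could carry. The struct layout and the JSON encoding are assumptions made for illustration; the slides do not specify the wire format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Extent and VirtualObject mirror the slide's VIRTUAL_OBJ fields.
type Extent struct {
	LBA uint64 `json:"lba"`
	Len uint64 `json:"len"`
}

type VirtualObject struct {
	Extents []Extent `json:"extents"`
}

// ComputeDescriptor bundles an operation, its arguments and the virtual
// objects it reads and writes. Field names here are hypothetical.
type ComputeDescriptor struct {
	Opcode  string          `json:"opcode"`
	Args    []string        `json:"args"`
	Inputs  []VirtualObject `json:"inputs"`
	Outputs []VirtualObject `json:"outputs,omitempty"`
}

// Encode serializes the descriptor into a command payload.
// JSON is an assumption; a real transport would likely use a binary layout.
func (d ComputeDescriptor) Encode() ([]byte, error) {
	return json.Marshal(d)
}

// Decode is what the target side would run on the received payload.
func Decode(payload []byte) (ComputeDescriptor, error) {
	var d ComputeDescriptor
	err := json.Unmarshal(payload, &d)
	return d, err
}

func main() {
	d := ComputeDescriptor{
		Opcode: "search",
		Args:   []string{"baz"},
		Inputs: []VirtualObject{{Extents: []Extent{
			{LBA: 2008, Len: 4096},
			{LBA: 4104, Len: 123},
		}}},
	}
	payload, err := d.Encode()
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d-byte payload: %s\n", len(payload), payload)
}
```

The descriptor is only bytes in a command payload, so the block protocol itself does not have to change.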


Implementing offloads (step 3 of 3)

• Object Aware Storage (OAS) Library handles host/app interactions
– Cache consistency
– Creating and allocating virtual objects
– Building and transporting compute descriptors

• Offload Engine: interprets the EXEC command and descriptors
– Implements our methods like checksum, search, etc.

[Figure: the OAS Library on the host talks to the Offload Engine on the target over the storage transport (SCSI or NVMe)]

This is step 3: provides a way to implement operations
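
A minimal sketch of how an offload engine could interpret a descriptor: gather the object's bytes from its extents, then dispatch on the opcode. It is not the prototype's engine. The in-memory device, the 512-byte sector size, the `crc32` opcode name and the result encoding are all assumptions for illustration.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"hash/crc32"
)

const sectorSize = 512 // assumption: LBAs index 512-byte sectors

type Extent struct{ LBA, Len uint64 }

// Descriptor is a simplified, single-input compute descriptor.
type Descriptor struct {
	Opcode string
	Arg    string
	Input  []Extent
}

// handler is one offload method; it sees object bytes, not blocks.
type handler func(obj []byte, arg string) ([]byte, error)

// Engine dispatches descriptors to registered methods.
type Engine struct {
	dev     []byte // stands in for the target's block device
	methods map[string]handler
}

// gather reassembles the object from its extents.
func (e *Engine) gather(exts []Extent) ([]byte, error) {
	var obj []byte
	for _, x := range exts {
		off := x.LBA * sectorSize
		if off+x.Len > uint64(len(e.dev)) {
			return nil, errors.New("extent outside device")
		}
		obj = append(obj, e.dev[off:off+x.Len]...)
	}
	return obj, nil
}

// Exec runs the method named by the descriptor's opcode.
func (e *Engine) Exec(d Descriptor) ([]byte, error) {
	m, ok := e.methods[d.Opcode]
	if !ok {
		return nil, fmt.Errorf("unknown opcode %q", d.Opcode)
	}
	obj, err := e.gather(d.Input)
	if err != nil {
		return nil, err
	}
	return m(obj, d.Arg)
}

// NewEngine registers two example methods: search and crc32.
func NewEngine(dev []byte) *Engine {
	return &Engine{dev: dev, methods: map[string]handler{
		// search returns the byte offset of arg in the object, or -1.
		"search": func(obj []byte, arg string) ([]byte, error) {
			return []byte(fmt.Sprint(bytes.Index(obj, []byte(arg)))), nil
		},
		// crc32 returns the IEEE CRC-32 of the object as hex.
		"crc32": func(obj []byte, _ string) ([]byte, error) {
			return []byte(fmt.Sprintf("%08x", crc32.ChecksumIEEE(obj))), nil
		},
	}}
}

func main() {
	// Lay out "Hello, world!" across sectors 13 and 42.
	dev := make([]byte, 64*sectorSize)
	copy(dev[13*sectorSize:], "Hello, ")
	copy(dev[42*sectorSize:], "world!")

	eng := NewEngine(dev)
	in := []Extent{{LBA: 13, Len: 7}, {LBA: 42, Len: 6}}
	res, err := eng.Exec(Descriptor{Opcode: "search", Arg: "world", Input: in})
	fmt.Println(string(res), err) // 7 <nil>
}
```

Methods only ever see reassembled object bytes, so adding a new offload means registering one more handler.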


Prototype Architecture + Flow

• Built using iSCSI and NVMeoF initiators and targets

[Figure: prototype architecture and flow]
– Host: Application, File System, Page Cache, OAS Library, Initiator
  • Application issues file I/O and offload requests
  • OAS Library uses fiemap, stat and fsync: virtual object creation, request issuing, cache consistency
  • Unmodified initiator stack carries the EXEC command
– Transport (NVMeoF, iSCSI)
– Target: Offload Engine, Block Storage Stack
  • EXEC command & operation handling
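
The host-side flow in this figure can be sketched as follows. The interfaces are hypothetical stand-ins, not the OAS Library's API. The call order (fsync first, then stat and fiemap, then EXEC) is an assumption: the target should only see data that has left the page cache. Clipping the last extent to the stat size is likewise an assumed explanation for the non-aligned LEN 123 in the earlier example.

```go
package main

import (
	"fmt"
	"strings"
)

type Extent struct{ LBA, Len uint64 }

// FileOps abstracts the three file-system calls named in the figure.
type FileOps interface {
	Fsync(path string) error                   // flush dirty pages to the target
	Stat(path string) (size uint64, err error) // logical file size in bytes
	Fiemap(path string) ([]Extent, error)      // block-granular physical extents
}

// Transport carries one EXEC command through the initiator.
type Transport interface {
	Exec(opcode string, input []Extent) ([]byte, error)
}

// trim clips block-granular extents to the logical file size.
func trim(exts []Extent, size uint64) []Extent {
	var out []Extent
	for _, e := range exts {
		if size == 0 {
			break
		}
		if e.Len > size {
			e.Len = size
		}
		out = append(out, e)
		size -= e.Len
	}
	return out
}

// Offload runs one operation on a file: sync, map, then EXEC.
func Offload(fs FileOps, t Transport, path, opcode string) ([]byte, error) {
	if err := fs.Fsync(path); err != nil {
		return nil, fmt.Errorf("fsync: %w", err)
	}
	size, err := fs.Stat(path)
	if err != nil {
		return nil, fmt.Errorf("stat: %w", err)
	}
	exts, err := fs.Fiemap(path)
	if err != nil {
		return nil, fmt.Errorf("fiemap: %w", err)
	}
	return t.Exec(opcode, trim(exts, size))
}

// fake records call order and stands in for both the file system and the target.
type fake struct{ calls []string }

func (f *fake) Fsync(string) error { f.calls = append(f.calls, "fsync"); return nil }
func (f *fake) Stat(string) (uint64, error) {
	f.calls = append(f.calls, "stat")
	return 4219, nil
}
func (f *fake) Fiemap(string) ([]Extent, error) {
	f.calls = append(f.calls, "fiemap")
	return []Extent{{2008, 4096}, {4104, 4096}}, nil
}
func (f *fake) Exec(op string, in []Extent) ([]byte, error) {
	f.calls = append(f.calls, "exec")
	return []byte(fmt.Sprintf("%s over %d extents", op, len(in))), nil
}

func main() {
	f := &fake{}
	res, err := Offload(f, f, "/home/user/foo.txt", "md5")
	fmt.Println(string(res), err)
	fmt.Println(strings.Join(f.calls, " -> "))
}
```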


Evaluation


Experimental setup

• 2 servers connected via 40 GbE
– Target and Host: Dual Xeon Gold 6140s, Dual Xeon E5-2699 v3s
  • Runs NVMeoF stack, handles offloads
– 8 P4600 NVMe SSDs (~3 GB/s per drive)
– Benchmark:
  • OASBench (in-house benchmarking utility)
  • 100 16 MB files per SSD, 48 worker threads

• Focused on checksum offload
– “Bitrot” detection for object storage
– Modern hashes are I/O bound

[Figure: Host (OASBench) connected to the Target over 40 GbE NVMeoF]


Experiment 1: Conventional Access

• Read file/object data from target to host, and compute checksum
– Expect to be bottlenecked by the 40 GbE link

[Figure: the Host (OASBench) reads data from the Target over 40 GbE NVMeoF and computes the checksum on the host]


Conventional operations

• Conventional operations: data is pulled to the host before computation
– Quickly bottlenecked by 40 GbE network
– <2 SSDs worth of throughput

[Chart: checksum throughput (MB/sec, 0 to 30000) vs. number of SSDs read in parallel (1 to 8). Series: Conv. Hhash, Conv. CRC, Conv. ISAL-CRC, 40 GbE, Max Drive Throughput]
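
The “<2 SSDs worth of throughput” figure follows from the setup numbers. Here is a back-of-the-envelope Go sketch, using nominal line rates (protocol overhead ignored, so real limits are somewhat lower) and the ~3 GB/s per-drive figure from the experimental setup:

```go
package main

import "fmt"

// linkGBps converts a nominal Ethernet line rate in Gbit/s to GB/s,
// ignoring protocol overhead.
func linkGBps(gbit float64) float64 { return gbit / 8 }

// drivesToSaturate returns how many drives' worth of throughput a link carries.
func drivesToSaturate(linkGB, driveGB float64) float64 { return linkGB / driveGB }

func main() {
	const driveGB = 3.0 // ~3 GB/s per P4600, from the setup slide
	const drives = 8

	for _, g := range []float64{40, 100} {
		l := linkGBps(g)
		fmt.Printf("%3.0f GbE: %.1f GB/s, about %.2f drives' worth\n",
			g, l, drivesToSaturate(l, driveGB))
	}
	fmt.Printf("aggregate drive throughput: %.0f GB/s\n", driveGB*drives)
}
```

At these nominal rates, 40 GbE carries about 5 GB/s (under two drives) and 100 GbE about 12.5 GB/s. The eight drives together can supply roughly 24 GB/s.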


Experiment 2: Offloaded Access

• Issue EXEC command with virtual objects
– Target computes checksum in-situ and returns digest
– Network bottlenecks should go away

[Figure: the Host (OASBench) sends EXEC to the Target over 40 GbE NVMeoF; the Target computes the checksum and returns the checksum digest]


Offloaded operations

• Offloaded operations are run in the storage target
– Bypasses the 40 GbE bottleneck and scales with the number of SSDs being hit
– Throughput exceeds even what a 100 GbE link could provide!
  • No longer transport bound!
– >99% reduction in network traffic, along with up to 3x speedups (not shown)
  • Implemented in Ceph, Swift and MinIO

[Chart: checksum throughput (MB/sec, 0 to 30000) vs. number of SSDs read in parallel (1 to 8). Series: OffLd. Hhash, OffLd. CRC, OffLd. ISAL-CRC, 40 GbE, Max Drive Throughput, 100 GbE]
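
To see why a >99% traffic reduction is plausible for checksums, compare the bytes on the wire per file. The 16 MiB object size comes from the benchmark setup. The descriptor and digest sizes below are hypothetical placeholders, not measured values.

```go
package main

import "fmt"

// trafficReduction returns the fraction of network bytes saved when only a
// descriptor and a digest cross the wire instead of the whole object.
func trafficReduction(objectBytes, descriptorBytes, digestBytes float64) float64 {
	return 1 - (descriptorBytes+digestBytes)/objectBytes
}

func main() {
	const object = 16 << 20 // 16 MiB file, as in the benchmark
	const descriptor = 4096 // hypothetical descriptor size in bytes
	const digest = 16       // hypothetical digest size in bytes

	r := trafficReduction(object, descriptor, digest)
	fmt.Printf("network traffic reduced by %.2f%%\n", 100*r)
}
```

Even with a descriptor far larger than it probably needs to be, the request and response are a tiny fraction of the object itself.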


Challenge 2: Handling Distributed, Striped Data


Computational Storage and EC

Trends in Data Striping
– Erasure coded (EC) deployments have exploded beyond traditional RAID
  • RAID chunks in low bytes to KiB ranges
    – Very difficult to offload computations
  • EC chunks in hundreds of KiB to low MiB
    – Individual elements easily found
– Large volumes of data have well-defined structure and elements
  • E.g., CSVs, JSONs, dense matrices, etc.

[Figure: two striping layouts of the text “HERE IS DATA”]


Our Solution

• Our solution is to leverage data structure and large stripe pieces
– Most work still done inside target
– Ambiguous “border” elements returned as “residuals” handled host-side

[Figure: searching for “the” in “The quick brown fox jumped over the lazy dog”, split into four stripe pieces]
Per-piece results: Match: 0-2 | No Match | Partial: 32 “t” | Partial: 33-34 “he”
Host-side residual check: “the” == “the”, Match!
Results:
Match: 0-2
Match: 32-34
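
The residual idea can be sketched in Go as below. This is a simplification, not the authors' algorithm. Each simulated target always returns the first and last len(pattern)-1 bytes of its piece as residuals, rather than only the ambiguous partial matches shown in the figure. It also assumes ASCII data and a chunk size of at least the pattern length, so a match spans at most two pieces. All names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// chunkResult is what one simulated target returns for its stripe piece.
type chunkResult struct {
	start   int    // global offset of the piece
	matches []int  // global offsets of matches fully inside the piece
	head    string // first len(pattern)-1 bytes (residual)
	tail    string // last len(pattern)-1 bytes (residual)
}

// findAll returns the offset of every occurrence of pat in s.
func findAll(s, pat string) []int {
	var out []int
	for from := 0; ; {
		i := strings.Index(s[from:], pat)
		if i < 0 {
			return out
		}
		out = append(out, from+i)
		from += i + 1
	}
}

// searchChunk is the target-side work: local matches plus border residuals.
func searchChunk(chunk string, start int, pat string) chunkResult {
	res := chunkResult{start: start}
	for _, i := range findAll(chunk, pat) {
		res.matches = append(res.matches, start+i)
	}
	r := len(pat) - 1
	if r > len(chunk) {
		r = len(chunk)
	}
	res.head, res.tail = chunk[:r], chunk[len(chunk)-r:]
	return res
}

// stripedSearch splits data into pieces, searches each one independently,
// then stitches adjacent residuals together on the host.
func stripedSearch(data, pat string, chunkSize int) []int {
	data, pat = strings.ToLower(data), strings.ToLower(pat) // case-insensitive, ASCII only
	var results []chunkResult
	for off := 0; off < len(data); off += chunkSize {
		end := off + chunkSize
		if end > len(data) {
			end = len(data)
		}
		results = append(results, searchChunk(data[off:end], off, pat))
	}
	var matches []int
	for i, r := range results {
		matches = append(matches, r.matches...)
		if i+1 < len(results) {
			next := results[i+1]
			// Any match in tail+head must cross the border, because each
			// side alone is shorter than the pattern.
			for _, j := range findAll(r.tail+next.head, pat) {
				matches = append(matches, next.start-len(r.tail)+j)
			}
		}
	}
	sort.Ints(matches)
	return matches
}

func main() {
	text := "The quick brown fox jumped over the lazy dog"
	for _, m := range stripedSearch(text, "the", 11) {
		fmt.Printf("Match: %d-%d\n", m, m+len("the")-1)
	}
}
```

With 11-byte pieces this reproduces the figure's split. The second “the” straddles the border between the third and fourth pieces and is recovered from the residuals alone.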


Ongoing and Future Work

• Lots of other offloads (not enough time to cover)
– Image preprocessing for ML pipelines
  • >90% data movement reduction
– Merge, Sort, Search, LSM Compaction, CSV queries, microclassifiers…

• We’re not just for fabrics targets
– Methodology is compatible with devices as well

• Industry involvement and engagement


Wrapping it Up!

• Introduced virtual objects for computational block storage
– Prototypes in iSCSI and NVMeoF with a variety of offloads

• Showed that handling distributed, striped data can be straightforward with large EC shards and (semi) structured data

• We want collaborators!
– Working on open sourcing

• Stay tuned for more updates from Intel


• Thanks for your attention!

• Questions? Comments?
  • [email protected]
  • [email protected]
  • [email protected]


•Extras/Backups


Applications are easy to adapt and enable

Application integration isn’t difficult
– Example with our Golang bindings using iSCSI

Client library is small
– (< 500 LOC)

New offloads are straightforward
– Currently a combination of C libraries and kernel modules
– Currently porting to full userspace implementations

// Path used to talk to the SCSI device
sgpath := "/dev/bsg/20:0:0:0"

// Target file to operate on
fpath := "/mnt/oas_dev/test.txt"

// Create the OAS context
ctx := oas_client.OasCtx{sgpath}

// Call the MD5 method
oas_md5_resp := ctx.MD5(fpath)

