Guy Hadash with Metadata Search Boosting the Power of Swift … · 2019. 2. 26. · E.g.,...

Date post: 11-Oct-2020
Boosting the Power of Swift with Metadata Search
Presenters: Dean Hildebrand, Eran Rom, Nilesh Bhosale
Joint work with: Paula Ta-Shma, Guy Hadash

Agenda

▪ What is Object Metadata?

▪ What is Metadata Search?

▪ Use Cases

▪ Demo

▪ Implementation Details

▪ Future Work


What is Metadata?

▪ User-defined metadata
  ▪ Unique feature of object storage compared to other storage systems

▪ Swift and S3 metadata are compatible through Swift3 middleware

▪ Metadata is the structured data about the unstructured object
  ▪ Who, what, when, where, and why of account, container, object

▪ Perfect for indexing and searching


Metadata Examples

Example domains:

▪ Biomedical
  ▪ Age, Biomarkers, Developmental Stage, Cell Surface Markers, Cell Type/Cell Line, Disease State, Extract Molecule, Genetic Characteristics, Immunoprecipitation Antibody, Organism, Platform, Sex, Strain, Time Point, Tissue Type, Treatment Compound

▪ Astronomy & Astrophysics

▪ Geospatial

▪ Image

▪ Music


What Swift Metadata Exists and How do I use it?

▪ User metadata can be added to or removed from accounts, containers, and objects
  ▪ E.g., X-Container-Meta-{name}, X-Remove-Container-Meta-{name}

▪ System metadata also exists; some of it can even be set by the user
  ▪ E.g., Content-Type, Last-Modified

▪ Semantics
  ▪ PUT and POST metadata semantics
    Account/Container – New user metadata is added to the existing list of metadata
    Object – New user metadata overwrites all existing user metadata
  ▪ COPY retains existing metadata unless new metadata is specified
  ▪ HEAD returns metadata only
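
The difference between container and object POST semantics is easy to get wrong, so here is a minimal in-memory sketch of it. It does not call Swift; the function names `apply_container_post` and `apply_object_post` are hypothetical, and metadata is modeled as a plain dictionary keyed by the lowercased `{name}` part of the header.

```python
CONTAINER_SET = "x-container-meta-"
CONTAINER_REMOVE = "x-remove-container-meta-"
OBJECT_SET = "x-object-meta-"


def apply_container_post(existing, headers):
    """Container POST (as described on the slide): merge into existing metadata."""
    updated = dict(existing)
    for name, value in headers.items():
        lower = name.lower()
        if lower.startswith(CONTAINER_REMOVE):
            # X-Remove-Container-Meta-{name} deletes a single item.
            updated.pop(lower[len(CONTAINER_REMOVE):], None)
        elif lower.startswith(CONTAINER_SET):
            # X-Container-Meta-{name} adds or updates a single item.
            updated[lower[len(CONTAINER_SET):]] = value
    return updated


def apply_object_post(existing, headers):
    """Object POST: the new user metadata replaces all existing user metadata."""
    return {
        name.lower()[len(OBJECT_SET):]: value
        for name, value in headers.items()
        if name.lower().startswith(OBJECT_SET)
    }


if __name__ == "__main__":
    container = apply_container_post({"city": "Rome"}, {"X-Container-Meta-Time": "Day"})
    print(container)  # existing "city" is kept, "time" is added
    obj = apply_object_post({"city": "Rome"}, {"X-Object-Meta-Time": "Night"})
    print(obj)  # existing "city" is gone, only "time" remains
```

A practical consequence: to change one item of object metadata with POST, a client has to resend every user metadata item it wants to keep.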


What is Metadata Search?

▪ Automatically index and catalog Swift user and system metadata

▪ Provide a REST API for searching for objects based on their metadata

▪ Currently available in IBM SoftLayer Swift object storage service


Why is Metadata Search Valuable?

▪ Imagine the Internet without Google

▪ Swiftly find needles in the OpenStack

▪ Help users and administrators perform Data Analytics

▪ Metadata can be on highest tier (SSD) while data resides on lower tier (Disk/Tape)

General Use Cases

▪ Data Mining

▪ Data Warehousing

▪ Selective data retrieval, data backup, data archival, data migration

▪ Management/Reporting


Sample Use Cases: Advanced Photo Album

photo1.jpg  City: Rome, Time: Day
photo2.jpg  City: Rome, Time: Night
photo3.jpg  City: Haifa, Time: Day
photo4.jpg  City: Tokyo, Time: Night

GET /MyPhotoSpace?query=city='Rome' AND Time='Day'

GET /MyPhotoSpace?query=time='Night'

* Schematic, not complete syntax


Media use case - Complex Searches

Search Query

GET /MyPhotoSpace?query=tags ~ 'John' OR tags ~ 'Bob' OR tags ~ 'Alice' AND date > 2/12/2012 AND date < 3/12/2013 AND num_views > 10000

What did we search for?

▪ Date range search

▪ Free Text matching

▪ Integer comparison
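
To make the three predicate types concrete, the sketch below evaluates an equivalent filter in memory. It is only an illustration, not the search service's implementation. The `Photo` record, the sample values, and the grouping "(any tag matches) AND date range AND view count" are assumptions; the slide's query does not state operator precedence, and its dates are ambiguous between day/month orders, so explicit `date` objects are used instead.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Photo:
    """Hypothetical record holding the metadata fields used in the query."""
    name: str
    tags: str
    taken: date
    num_views: int


def matches(photo, names, after, before, min_views):
    # Free text matching: any of the names appears in the tags (case-insensitive).
    text_hit = any(n.lower() in photo.tags.lower() for n in names)
    # Date range search: strictly between the two bounds, as with > and <.
    in_range = after < photo.taken < before
    # Integer comparison.
    popular = photo.num_views > min_views
    return text_hit and in_range and popular


if __name__ == "__main__":
    photos = [
        Photo("a.jpg", "John at the beach", date(2012, 6, 1), 20000),
        Photo("b.jpg", "Alice and Bob", date(2011, 1, 1), 50000),
        Photo("c.jpg", "Bob hiking", date(2012, 9, 9), 12),
    ]
    hits = [p.name for p in photos
            if matches(p, ["John", "Bob", "Alice"],
                       date(2012, 2, 12), date(2013, 3, 12), 10000)]
    print(hits)
```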


Metadata Enrichment

[Figure: myvideo.mxf is uploaded to the Swift object store, where a Storlet produces enriched metadata. Labels: Upload, Storlet, Object Store (Swift), Data, Metadata, Enriched Metadata]

Metadata Search with Enriched Metadata – Developed with RAI Italy


Finding objects by their metadata values

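[Figure: a request to "Get objects whose loudness is faulty" is answered by the Metadata Search Facility of the Swift object store, which finds the faulty objects (e.g., myvideo.mxf). Labels: Swift, Object Store, Metadata Search Facility, Find faulty objects]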


Analytics Use Case

Analyze IoT data efficiently and cost effectively

– Treat Swift as a long term store for semi-structured IoT data
– Store in Parquet format
– Queryable via Apache Spark SQL
– Optimized predicate pushdown
  - Implemented a custom Spark SQL external data source driver
  - Uses metadata indexes
  - Searches for Swift objects whose min/max values overlap requested ranges

Get all data for morning traffic:
SELECT codigo, intensidad, velocidad FROM madridtraffic WHERE tf >= '08:00:00' AND tf <= '12:00:00'

Brute force method: 13245 Swift requests

Optimized predicate pushdown: 616 Swift requests

21.5 times improvement
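
The min/max overlap test behind the predicate pushdown can be sketched in a few lines. This is an illustration of the idea only, not the Spark SQL driver itself. The `ObjectStats` record, the field names `tf_min` and `tf_max`, and the sample objects are hypothetical. Times are zero-padded `HH:MM:SS` strings, which compare correctly as plain strings.

```python
from dataclasses import dataclass


@dataclass
class ObjectStats:
    """Hypothetical per-object index entry: min and max of column tf."""
    name: str
    tf_min: str
    tf_max: str


def overlaps(obj, lo, hi):
    # Two closed ranges [tf_min, tf_max] and [lo, hi] overlap
    # exactly when each one starts before the other ends.
    return obj.tf_min <= hi and obj.tf_max >= lo


def prune(objects, lo, hi):
    """Return only the objects that can contain rows with lo <= tf <= hi."""
    return [o.name for o in objects if overlaps(o, lo, hi)]


if __name__ == "__main__":
    index = [
        ObjectStats("part-0", "00:00:00", "07:59:59"),
        ObjectStats("part-1", "07:30:00", "09:00:00"),
        ObjectStats("part-2", "11:00:00", "13:00:00"),
        ObjectStats("part-3", "12:00:01", "23:59:59"),
    ]
    print(prune(index, "08:00:00", "12:00:00"))
    # Request counts reported on the slide: 13245 / 616 is about 21.5.
    print(round(13245 / 616, 1))
```

Only the surviving objects need to be fetched from Swift, which is where the reduction in requests comes from.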


IoT Analytics Use Case Example Metadata


IoT Use Case - EMT Madrid Bus Service

▪ Search capability helps understand traffic at a given time slot and plan better for future events

▪ Historical data about bus trips, generated by IoT devices mounted on the EMT buses

▪ Data is ingested into the object store, along with relevant metadata


Data Collected from EMT Buses


Bus Data continuously uploaded to Object Store

Kafka + Secor: groups into objects, uploads at regular intervals

Storlets generate metadata

1. Storlet converts GPS coordinates from UTM to lat,long

2. Storlet calculates GPS bounding box and stores as metadata
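
The bounding-box step can be sketched as follows. This is not the Storlet code. It assumes the points are already converted to (lat, long) pairs, and the `lat,long` value format is an assumption. The header names `X-Object-Meta-Top-Left-G` and `X-Object-Meta-Bottom-Right-G` are the ones used in the Query API example; the function names are hypothetical.

```python
def bounding_box(points):
    """Smallest lat/long box containing all (lat, long) points."""
    if not points:
        raise ValueError("at least one point is required")
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return {
        # Top-left: northernmost latitude, westernmost longitude.
        "top_left": (max(lats), min(lons)),
        # Bottom-right: southernmost latitude, easternmost longitude.
        "bottom_right": (min(lats), max(lons)),
    }


def as_metadata_headers(box):
    """Render the box as object metadata headers (value format is assumed)."""
    return {
        "X-Object-Meta-Top-Left-G": "{},{}".format(*box["top_left"]),
        "X-Object-Meta-Bottom-Right-G": "{},{}".format(*box["bottom_right"]),
    }


if __name__ == "__main__":
    trip = [(40.42, -3.70), (40.45, -3.68), (40.40, -3.72)]
    print(as_metadata_headers(bounding_box(trip)))
```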


Demo


Behind the Scenes of Metadata Search

▪ Metadata search involves two flows:
  ▪ Indexing objects’ metadata
  ▪ Serving search queries


Indexing Objects’ Metadata

[Figure, generic design, built up over several slides. Labels: Storage System input data path, Indexer, Queue, Index / Search]

[Figure, Swift implementation. Labels: Swift Proxy pipeline, Indexer Middleware, Rabbit, Elastic Search, Swift Storage Tier]
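
The indexing flow can be sketched with in-memory stand-ins: a `queue.Queue` in place of RabbitMQ and a dictionary in place of Elasticsearch. This is a toy model under those assumptions, not the actual middleware; the class and function names are hypothetical.

```python
import queue

META_PREFIX = "x-object-meta-"


class IndexerMiddleware:
    """Toy stand-in for middleware sitting on the proxy's input data path."""

    def __init__(self, events):
        self.events = events

    def on_request(self, method, path, headers):
        # Only requests that write metadata are interesting to the indexer.
        if method in ("PUT", "POST"):
            meta = {k.lower()[len(META_PREFIX):]: v
                    for k, v in headers.items()
                    if k.lower().startswith(META_PREFIX)}
            # Publish to the queue so indexing stays off the request path.
            self.events.put({"path": path, "metadata": meta})


def drain_into_index(events, index):
    """Consumer side: move queued events into the search index."""
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            break
        index[event["path"]] = event["metadata"]


def search(index, key, value):
    """Serve a single equality query from the index."""
    return sorted(p for p, m in index.items() if m.get(key) == value)


if __name__ == "__main__":
    events, index = queue.Queue(), {}
    mw = IndexerMiddleware(events)
    mw.on_request("PUT", "/v1/acct/photos/photo1.jpg", {"X-Object-Meta-City": "Rome"})
    mw.on_request("PUT", "/v1/acct/photos/photo3.jpg", {"X-Object-Meta-City": "Haifa"})
    mw.on_request("GET", "/v1/acct/photos/photo1.jpg", {})
    drain_into_index(events, index)
    print(search(index, "city", "Rome"))
```

Putting a queue between the middleware and the index means a slow or unavailable index delays indexing rather than client uploads.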


Serving Search Requests

[Figure. Labels: Swift Proxy pipeline, MD Search Middleware, Elastic Search]


Overall Architecture

[Figure. Labels: HTTP Swift Requests, Load Balancer, Swift Object Store, Swift Proxy Nodes (Proxy Service, Indexer, Search, Rabbit), Swift Storage Nodes, ElasticSearch Cluster]


Query API

Example:

GET http://iotserver.example.com/v1/AUTH_...2357c/busData?
    query=X-Object-Meta-Top-Left-G in [40.7,22.5],[39.9,22.1] AND
    X-Object-Meta-Bottom-Right-G in [40.7,22.5],[39.9,22.1]

X-Context: search


Query API

Example:

GET http://iotserver.example.com/v1/AUTH_...2357c/busData?
    query=X-Object-Meta-Top-Left-G in [40.7,22.5],[39.9,22.1] AND
    X-Object-Meta-Bottom-Right-G in [40.7,22.5],[39.9,22.1]

▪ Query Features:

1. Multiple criteria possible

2. Supports various operators
   • =, !=, <, <=, in, ~, ...

3. Supports metadata data types
   • strings, integers, floats, dates, geo-points, free text
   • Allows comparisons and range searches
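
A client could assemble such a request along the following lines. This sketch only builds the request object and does not send it. The helper name `build_search_request` and the account name `AUTH_demo` are hypothetical placeholders (the real account ID is elided on the slide), and percent-encoding the query value is an assumption about what the service accepts. The `X-Context: search` header is taken from the Query API example.

```python
from urllib.parse import quote
from urllib.request import Request


def build_search_request(endpoint, account, container, query):
    """Build (but do not send) a metadata search GET request."""
    url = "{}/v1/{}/{}?query={}".format(
        endpoint.rstrip("/"),
        quote(account),
        quote(container),
        # Encode spaces, brackets and commas so the URL is well-formed.
        quote(query, safe=""),
    )
    # The X-Context header marks the GET as a search rather than a listing.
    return Request(url, headers={"X-Context": "search"}, method="GET")


if __name__ == "__main__":
    req = build_search_request(
        "http://iotserver.example.com",
        "AUTH_demo",  # hypothetical account name
        "busData",
        "X-Object-Meta-Top-Left-G in [40.7,22.5],[39.9,22.1] AND "
        "X-Object-Meta-Bottom-Right-G in [40.7,22.5],[39.9,22.1]",
    )
    print(req.get_method(), req.full_url)
    print(req.header_items())
```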


Where Do We Go From Here?

▪ Extend to support File-based (NFS/SMB) attributes

▪ Standardize Search API

▪ Standardize back-end APIs to allow support for any queuing and/or database systems

▪ Work on visualizing information through Kibana, etc.

▪ Collaborate with OpenStack Community Efforts
  ▪ Swift Event Notification Mechanism
  ▪ OpenStack Searchlight
    ■ Also built on Elastic Search and RabbitMQ
    ■ Work to standardize search API


Will be Available with IBM Spectrum Scale - 4Q15

[Figure: Spectrum Scale Object Store. Labels: Search and Swift Requests, Load Balancer, Proxy Service, Middleware, Object Service, Spectrum Scale, Keystone Authentication Service, Swift Services, Additional Services in Cluster, Metadata Index DB, ES, RMQ]

1. Pre-installed and configured Virtual Appliance

2. Roll-your-own solution
   ○ White Paper to be released describing how to set up and configure
   ○ Will include a source tarball
   ○ Fine tune as per your requirements
