
An Elastic Metadata Store for eBay’s Media Platform

Description:
In order to build robust, multi-tenant, highly available storage services that meet the business's SLA, your databases have to be sharded. But if your service has to scale continuously through incremental additions of storage, without service interruption or human intervention, basic static sharding is not enough. At eBay, we are building MStore to solve this problem, with MongoDB as the storage engine. In this presentation, we will dive into the key design concepts of this solution.
Transcript
Page 1: An Elastic Metadata Store for eBay’s Media Platform

Elastic metadata store for eBay media platform

Yuri Finkelstein, architect, eBay Global Platform Services

MongoDB SF 2014

Page 2

Introduction

• eBay items have media:

• pictures, 360 views, overlays, video, etc

• binary content + metadata

• metadata is rich and is best modeled as a document

• 99% reads, 1% writes, ~1/100th deleted daily

• MongoDB is a reasonable fit

• But we need a service on top of it

Page 3

What is a data service?

• Data service vs database instances

• SLA

• data lifecycle management automation instead of DBA excellence

• no downtime during hardware repair, maintenance

• no downtime as data grows and hardware is added

• multiple tenants

• tenants come and go, grow (and shrink) at different rates

• different tenant requirements for cross DC replication and data access latencies

Page 4

What is wrong with this picture?

• Vertical scalability only

• Prone to coarse-grain outages

• No service model, no SLA

• Limited number of connections

• etc

[Diagram: application instances of tenant A and tenant B, each with its own driver, connecting directly to the DB and its replicas]

Page 5

Is MongoDB sharding the answer?

• Can scale out in theory, but at the time of expansion we either need downtime or will likely breach the SLA

• Mongo chunks are logical

• migration causes a flurry of I/O, or is too slow to engage new hardware

• Still no service boundary

• Other problems mentioned earlier

[Diagram: application instances whose drivers connect through MongoS routers to sharded replica sets; a new shard is shown with a question mark]

Page 6

On the effect of chunk migration: great slide by Kenny Gorman/Object Rocket

Page 7

Buckets

• Need smaller data granularity: the bucket

• ~100 buckets per tenant to begin with

• bucket algebra

• create / delete

• split / merge

• compact

• move (to another RS)

[Diagram: one storage host running MongoD processes for RS1-RS4, each holding a range of buckets (b1-b4, b25-b28, b49-b52, b73-b76)]

Page 8

_id => BucketName?

• Can be done in a number of different ways, based on the use case

• if range queries on _id are needed, use an "order-preserving partitioner", as in HBase

• if access is by _id only, consistent hashing works well, as in Memcached or Cassandra
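The deck does not show MStore's actual partitioner, so here is a minimal consistent-hashing sketch of the _id => BucketName mapping for the access-by-_id-only case. All names (`ConsistentHashRing`, the bucket naming, the vnode count) are illustrative assumptions.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps document _ids to bucket names via consistent hashing.
    Illustrative sketch only, not the actual MStore partitioner."""

    def __init__(self, buckets, vnodes=64):
        # Place several virtual nodes per bucket on the ring so load
        # spreads evenly and moving one bucket disturbs few keys.
        self._ring = sorted(
            (self._hash(f"{b}#{i}"), b)
            for b in buckets
            for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def bucket_for(self, _id):
        # Walk clockwise to the first virtual node at or after the hash.
        idx = bisect.bisect(self._keys, self._hash(_id)) % len(self._keys)
        return self._ring[idx][1]

# ~100 buckets per tenant, as on the Buckets slide
ring = ConsistentHashRing([f"b{i}" for i in range(1, 101)])
bucket = ring.bucket_for("item:123456")  # a stable bucket name for this _id
```

Because the mapping is deterministic, every service node computes the same bucket for a given _id without any coordination.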

Page 9

MStore DB Proxy

• proxy/manage DB connections

• runs on each host

• lightweight and efficient

• connects to mongo over unix socket

• BSON in, BSON out

• performs “logical address translation” in BSON messages

• bucketName => mongoDB dbName

• dbName changes after each compaction

[Diagram: the same storage host, now with a Proxy in front of the MongoD processes (RS1-RS4) and their buckets]
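The "logical address translation" above can be sketched as a guarded map from the stable bucket name to whatever physical dbName currently backs it; the class and method names below are illustrative assumptions, not the proxy's real API.

```python
import threading

class AddressTranslator:
    """bucketName => mongoDB dbName translation, as performed by the
    proxy. The dbName changes after each compaction, so lookups must
    always see the latest mapping. Sketch only; names are assumptions."""

    def __init__(self):
        self._lock = threading.Lock()
        self._map = {}  # bucketName -> current physical dbName

    def set_db(self, bucket, db_name):
        # Invoked when a workflow (e.g. compaction) installs a new
        # database image for the bucket.
        with self._lock:
            self._map[bucket] = db_name

    def translate(self, bucket):
        with self._lock:
            return self._map[bucket]

t = AddressTranslator()
t.set_db("b42", "b42_gen1")
db = t.translate("b42")        # "b42_gen1"
t.set_db("b42", "b42_gen2")    # after compaction, flip to the new image
db = t.translate("b42")        # "b42_gen2"
```

The point of the indirection is that clients only ever see bucket names; the physical database can be swapped underneath them without any client-side change.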

Page 10

MStore Service Tier

• stateless REST service

• domain API

• route calculation:

• _id=>BucketName

• BucketName=>ReplicaSetId

• isWrite or !staleReadOk?

• primary MongoD or some secondary MongoD

• MongoD=>host

• request goes to proxy@host

[Diagram: application instances calling the mstore service over HTTP/JSON; the service reaches the storage servers over HTTP/BSON]
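The route calculation above can be written as one linear function. The `cluster_map` layout and all helper names here are illustrative assumptions; only the sequence of steps comes from the slide.

```python
def route(cluster_map, _id, is_write, stale_read_ok):
    """Route calculation sketch for the stateless service tier."""
    bucket = cluster_map["id_to_bucket"](_id)        # _id => BucketName
    rs = cluster_map["bucket_to_rs"][bucket]         # BucketName => ReplicaSetId
    members = cluster_map["rs_members"][rs]
    if is_write or not stale_read_ok:
        mongod = members["primary"]                  # must see latest data
    else:
        mongod = members["secondaries"][0]           # a stale read is acceptable
    host = cluster_map["mongod_to_host"][mongod]     # MongoD => host
    return bucket, mongod, host                      # request goes to proxy@host

# Tiny illustrative cluster map
cmap = {
    "id_to_bucket": lambda _id: "b1",
    "bucket_to_rs": {"b1": "RS1"},
    "rs_members": {"RS1": {"primary": "mongod-a", "secondaries": ["mongod-b"]}},
    "mongod_to_host": {"mongod-a": "host1", "mongod-b": "host2"},
}

route(cmap, "item:1", is_write=True, stale_read_ok=False)   # ('b1', 'mongod-a', 'host1')
route(cmap, "item:1", is_write=False, stale_read_ok=True)   # ('b1', 'mongod-b', 'host2')
```

Since every step is a pure lookup against the cached cluster map, the service tier stays stateless and any instance can route any request.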

Page 11

Connections, Protocols, Payload formats

• Too many connections problem with MongoDB

• The service forms BSON and sends it to proxy over HTTP

• The proxy needs only a few connections to mongod

• Fair request scheduling

[Diagram: mstore service -> Proxy over BSON/HTTP 1.1 with keep-alive; Proxy -> MongoD over native transport on a Unix socket]

Page 12

MStore Coordination Tier

• Manages cluster map

• Serves queries and pushes changes to map cache in the service nodes and in proxies

• Functionally similar to MongoDB Config server or ZooKeeper

• Backed by a transactional, highly available repository

[Diagram: MStore Coordination tier (crd instances backed by the Coordinator DB); service instances and proxies GET the cluster map on init and receive pushed updates on change; legend: cluster map cache, coordinator service instance, MStore service instance, proxy]

Page 13

The big picture

[Diagram: the big picture: application instances -> mstore service -> storage servers, coordinated by the MStore Coordination tier (crd instances with a Coordinator DB) and driven by Workflow Management & Automation tools]

Page 14

Bucket Compaction

• Document deletes are expensive

• We prefer marking documents with tombstones, hence need to compact

• Compaction is done on an AUX storage node to not disturb ongoing operation

• When new bucket image is ready, in proxy:

• hold new writes

• flush pending writes

• “flip the switch” : BucketName->DB

• resume writes

• This is not easy and is implemented as a multi-step workflow
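The four "flip the switch" steps above can be sketched as a small state machine inside the proxy. This is a deliberately simplified single-writer sketch (the real proxy must handle concurrent writers and failures); all names are illustrative.

```python
import queue
import threading

class ProxyBucket:
    """'Flip the switch' for one bucket: hold new writes, flush pending
    writes, swap BucketName->DB, resume. Simplified sketch; a production
    proxy needs stronger concurrency control than shown here."""

    def __init__(self, db_name):
        self._db = db_name
        self._gate = threading.Event()
        self._gate.set()                 # writes allowed initially
        self._pending = queue.Queue()    # writes accepted but not yet applied

    def write(self, doc, apply_fn):
        self._gate.wait()                # blocks while a flip is in progress
        self._pending.put(doc)
        self._drain(apply_fn)

    def _drain(self, apply_fn):
        while not self._pending.empty():
            apply_fn(self._db, self._pending.get())

    def flip(self, new_db, apply_fn):
        self._gate.clear()               # 1. hold new writes
        self._drain(apply_fn)            # 2. flush pending writes
        self._db = new_db                # 3. "flip the switch": BucketName->DB
        self._gate.set()                 # 4. resume writes

applied = []
pb = ProxyBucket("b7_gen1")
pb.write({"x": 1}, lambda db, d: applied.append((db, d)))   # goes to b7_gen1
pb.flip("b7_gen2", lambda db, d: applied.append((db, d)))   # compaction done
pb.write({"x": 2}, lambda db, d: applied.append((db, d)))   # goes to b7_gen2
```

The write pause is bounded by how long the pending queue takes to drain, which is what keeps the switch-over invisible to the SLA.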

Page 15

Other workflows

• Compaction workflow is quite hard to master

• But the good news is that other workflows are very similar:

• bucket move is ~the same except the target RS is different

• bucket split creates 2 new buckets

• bucket merge is like 2 moves

• etc

Page 16

What are we achieving?

• Elastic expansion of the storage tier

• Full control over what/when/how fast to rebalance

• Efficiency of rebalancing

• Smooth and predictable operation

• Intelligent DB connection management

• SLA measurement

Page 17

Final words

• Open source?

• Looking for feedback

• Contact us if interested

• Thank you!

Page 18

Appendix

Page 19

Buckets

• Need smaller data granularity: the bucket

• sizeof(bucket) << sizeof(data set)

• bucket is a single MongoDB DB

• one Replica Set has many buckets

• Bucket operations:

• create / delete

• split / merge

• compact

• move (to another RS)

• Multiple MongoD processes from different replica sets on the same physical host for best storage utilization (on big bare metal)

• These could be LXC containers

[Diagram: one storage host running MongoD processes for RS1-RS4, each holding a range of buckets (b1-b4, b25-b28, b49-b52, b73-b76), as on Page 7]

Page 20

Bucket Compaction

• Document deletes are expensive

• We prefer marking documents with tombstones

• Compaction is the process of generating a new image of the bucket after purging deleted and expired documents

• Compaction is done on an AUX storage node to not disturb ongoing operation

• When new bucket image is ready - just “flip the switch” in the proxy

• This is not easy and is implemented as a multi-step workflow

Compaction Workflow

1. mark oplog time; take a snapshot
2. copy snapshot to Aux node
3. start 2 stand-alone mongod: source and destination
4. bulk-scan source: skip deleted or expired docs; insert docs into destination db
5. transfer compacted bucket image to all nodes in the original replica set
6. replay oplog from old db to new db

"Flip the switch" phase:

7. pause writes to old db in proxy
8. keep oplog replay until all queues are drained
9. tell proxy to enable writes to new DB
10. update Coordinator map
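Step 4 of the workflow, skipping dead documents while bulk-scanning the source, can be sketched as a simple filter. The field names (`tombstone`, `expires_at`) are assumptions; the deck does not show the actual document schema.

```python
def compact_docs(docs, now):
    """Keep only live documents during the bulk scan: drop anything
    tombstoned (logically deleted) or past its expiry time. Sketch
    only; field names are illustrative assumptions."""
    return [
        d for d in docs
        if not d.get("tombstone")
        and d.get("expires_at", float("inf")) > now
    ]

docs = [
    {"_id": 1},                        # live: kept
    {"_id": 2, "tombstone": True},     # logically deleted: skipped
    {"_id": 3, "expires_at": 100},     # expired by now=200: skipped
]
live = compact_docs(docs, now=200)     # only {"_id": 1} survives
```

Everything filtered out here is exactly what the tombstone-based delete strategy deferred, which is why compaction is where the space is actually reclaimed.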

