Date post: | 08-Sep-2014 |
Category: |
Technology |
Upload: | mongodb |
Building a Social Platform
Part 1: Design Overview; Storing Infinite Content
Solutions Engineering
• Identify Popular Use Cases
  – Directly from MongoDB Users
  – Addressing "limitations"
• Go beyond documentation and blogs
• Create open source project
• Run it!
Social Status Feed
Agenda
• What is a status feed and why build it with MongoDB
• Application overview (goals, non-goals)
• Architecture overview (arch diagram)
• Operational overview (benchmarks, automation)
• Describe components
  – Describe options
• For each component
  – Options tried
  – Results
  – Option chosen
Socialite
• News/Social Status Feed: popular and common
• Deceptively simple: getting good performance means solving many tricky problems
• We created a reference implementation
  – Configurable models and options
  – Built-in benchmarking
• Used this implementation to test out different options
• This talk will summarize
Status Feed
Socialite
• Open Source
• Reference Implementation
  – Various Fanout Feed Models
  – User Graph Implementation
  – Content storage
• Configurable models and options
• REST API in Dropwizard (Yammer)
  – https://dropwizard.github.io/dropwizard/
• Built-in benchmarking
https://github.com/10gen-labs/socialite
Architecture
[Architecture diagram; surviving labels: Graph Service Proxy, Content Proxy]
Pluggable Services
• Major components each have an interface
  – see com.mongodb.socialite.services
• Configuration selects implementation to use
• ServiceManager organizes:
  – Default implementations
  – Lifecycle
  – Binding configuration
  – Wiring dependencies
  – see com.mongodb.socialite.ServiceManager
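The pluggable-service idea above can be sketched in a few lines. This is a Python illustration only, not Socialite's Java code: the `FeedService` interface, the two fanout class names, the `"feed"` slot, and the `"model"` config key are all hypothetical names chosen for the example.

```python
from typing import Callable, Dict, Protocol


class FeedService(Protocol):
    """Hypothetical service interface (Socialite's real interfaces are Java)."""

    def name(self) -> str: ...


class FanoutOnRead:
    def name(self) -> str:
        return "fanout_on_read"


class FanoutOnWrite:
    def name(self) -> str:
        return "fanout_on_write"


class ServiceManager:
    """Picks an implementation per service slot, falling back to a default."""

    def __init__(self, defaults: Dict[str, Callable[[], FeedService]]) -> None:
        self._defaults = defaults
        self._registry: Dict[str, Dict[str, Callable[[], FeedService]]] = {}

    def register(self, slot: str, model: str, factory: Callable[[], FeedService]) -> None:
        # Record one selectable implementation for a slot.
        self._registry.setdefault(slot, {})[model] = factory

    def create(self, slot: str, config: dict) -> FeedService:
        # Configuration selects the implementation; no entry means "use the default".
        model = config.get(slot, {}).get("model")
        if model is None:
            return self._defaults[slot]()
        return self._registry[slot][model]()


manager = ServiceManager(defaults={"feed": FanoutOnRead})
manager.register("feed", "fanout_on_read", FanoutOnRead)
manager.register("feed", "fanout_on_write", FanoutOnWrite)

default_feed = manager.create("feed", {})
configured_feed = manager.create("feed", {"feed": {"model": "fanout_on_write"}})
```

Keeping selection in one place means a benchmark run can swap a model by changing configuration only, which is presumably why the deck pairs "configurable models" with "built-in benchmarking".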
Simple Interface
GET    /users/{user_id}                     Get a user by their ID
DELETE /users/{user_id}                     Remove a user by their ID
POST   /users/{user_id}/posts               Send a message from this user
GET    /users/{user_id}/followers           Get a list of followers of a user
GET    /users/{user_id}/followers_count     Get the number of followers of a user
GET    /users/{user_id}/following           Get the list of users this user is following
GET    /users/{user_id}/following_count     Get the number of users this user follows
GET    /users/{user_id}/posts               Get the messages sent by a user
GET    /users/{user_id}/timeline            Get the timeline for this user
PUT    /users/{user_id}                     Create a new user
PUT    /users/{user_id}/following/{target}  Follow a user
DELETE /users/{user_id}/following/{target}  Unfollow a user
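As a rough illustration of how a client might address these routes, the sketch below only builds the requests and does not send them. The base URL is an assumption (the slides do not say where the service listens), and the helper names are made up for the example. Posting a message is left out because the slides do not show the request body format.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "http://localhost:8080"  # assumption: not given in the slides


def _request(method: str, *segments: str) -> urllib.request.Request:
    # Percent-encode each path segment so odd user IDs cannot break the route.
    path = "/".join(quote(segment, safe="") for segment in segments)
    return urllib.request.Request(f"{BASE_URL}/{path}", method=method)


def create_user(user_id: str) -> urllib.request.Request:
    return _request("PUT", "users", user_id)


def follow(user_id: str, target: str) -> urllib.request.Request:
    return _request("PUT", "users", user_id, "following", target)


def unfollow(user_id: str, target: str) -> urllib.request.Request:
    return _request("DELETE", "users", user_id, "following", target)


def get_timeline(user_id: str) -> urllib.request.Request:
    return _request("GET", "users", user_id, "timeline")
```

Passing one of these objects to `urllib.request.urlopen` would issue the call against a running instance.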
Technical Decisions
• User timeline cache
• Schema
• Indexing
• Horizontal Scaling
Operational Setup
Real-life validation of our choices.
Most important criteria?
• User-facing latency
• Linear scaling of resources
Operational Testing
Scaling Goals
• Realistic real-life-scale workload
  – compared to Twitter, etc.
• Understanding of HW required
  – containing costs
• Confirm architecture scales linearly
  – without loss of responsiveness
Architecture
[Architecture diagram; surviving labels: Graph Service Proxy, Content Proxy]
DB Architecture
The storage layer is separate from the Socialite services, and each service has its own URI: its own MongoDB server or cluster, which can be configured differently from the others.
This lets us physically optimize each service's DB for the workload it will run.
It also lets us scale out whichever DB is currently the limiting factor (the bottleneck) in our setup.
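A per-service URI setup might look like the sketch below. The service names and host names are hypothetical placeholders, not Socialite's actual configuration keys, and each URI is assumed to name a single host.

```python
from typing import Dict
from urllib.parse import urlsplit

# Hypothetical mapping: one MongoDB URI per service (names and hosts are illustrative).
SERVICE_URIS: Dict[str, str] = {
    "user_graph": "mongodb://graph-mongos.example.net:27017/",
    "content": "mongodb://content-mongos.example.net:27017/",
    "feed": "mongodb://feed-mongos.example.net:27017/",
}


def hosts_by_service(uris: Dict[str, str]) -> Dict[str, str]:
    # Extract the host each service talks to (assumes one host per URI).
    return {service: urlsplit(uri).hostname for service, uri in uris.items()}


def independently_scalable(uris: Dict[str, str]) -> bool:
    # True when no two services share a host, so each DB can be tuned or scaled alone.
    hosts = list(hosts_by_service(uris).values())
    return len(hosts) == len(set(hosts))
```

With this layout, relieving a bottleneck in one service means changing only that service's cluster and, at most, its one URI.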
Operational Testing
Built-in benchmark capability
Operational Testing
• All hosts in AWS
• Each service used its own DB, cluster or shards
• All benchmarks through `mongos` (sharded config)
• Used MMS monitoring for measuring throughput
• Used internal benchmarks for measuring latency
• Based volume tested on real life social metrics
Scaling for Infinite Content
Architecture
[Architecture diagram; surviving labels: Graph Service Proxy, Content Proxy]
Socialite Content Service
• System of record for all user content
• Initially very simple (no search)
• Mainly designed to support feed
  – Lookup/indexed by _id and userid
  – Time based anchors/pagination
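Anchor-based pagination over `_id` and `userid` can be sketched as follows. This is an in-memory stand-in, not Socialite's implementation. It assumes `_id` values increase with creation time (roughly true of MongoDB ObjectIds, which begin with a timestamp) and uses plain integers for readability. The `message` field and function names are illustrative.

```python
from typing import Dict, List, Optional


def page_filter(user_id: str, anchor: Optional[int] = None) -> Dict:
    # MongoDB-style filter: this user's posts, strictly older than the anchor.
    query: Dict = {"userid": user_id}
    if anchor is not None:
        query["_id"] = {"$lt": anchor}
    return query


def fetch_page(posts: List[Dict], user_id: str,
               anchor: Optional[int] = None, limit: int = 2) -> List[Dict]:
    # In-memory equivalent of: find(filter), sort by _id descending, limit.
    matches = [
        p for p in posts
        if p["userid"] == user_id and (anchor is None or p["_id"] < anchor)
    ]
    matches.sort(key=lambda p: p["_id"], reverse=True)
    return matches[:limit]


# Sample data: odd ids belong to alice, even ids to bob.
POSTS = [
    {"_id": i, "userid": "alice" if i % 2 else "bob", "message": f"post {i}"}
    for i in range(1, 9)
]

first_page = fetch_page(POSTS, "alice")
second_page = fetch_page(POSTS, "alice", anchor=first_page[-1]["_id"])
```

The client passes the last `_id` it saw as the next anchor, so each page is a bounded range scan rather than a growing skip.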
• Half life of most content is 1 day!
• Popular content usually < 1 month
• Access to old data is rare
Social Data Ages Fast
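To get a feel for what a one-day half life implies, the sketch below reads "half life" literally as exponential decay of interest in a post. That reading is an assumption; the slides give no access-rate model, and the function name is illustrative.

```python
def remaining_fraction(age_days: float, half_life_days: float = 1.0) -> float:
    # Assumed model: interest halves every `half_life_days`.
    return 0.5 ** (age_days / half_life_days)


week_old = remaining_fraction(7)    # 0.5 ** 7 == 0.0078125
month_old = remaining_fraction(30)  # below one in a billion under this model
```

Under this assumed model a week-old post draws under 1% of its initial interest, consistent with the point that access to old data is rare.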