Date post: | 01-Nov-2014 |
Upload: | jetlore |
Montse Medina
COO,
MongoDB Schema Design:
Insights and Tradeoffs
Saturday, May 5, 12
Social content is useful in context
Social context is useful in context
Algorithms + Infrastructure
Technology Stack
Apache Kafka
Outline
I. Schema design
‣ Relational vs. Document-oriented
‣ Schema-less design
‣ Case study: Publishers & Subscribers
II. Lessons learned for schema design
III. Things to remember about MongoDB
Relational vs. Document-oriented
Document-oriented:
Users
{ id: 1, name: "Robert", from: [2], to: [5, 20] }
{ id: 2, name: "Monica", from: [23], to: [1, 5] }
...
vs
Relational:
Users
id   name
1    Robert
2    Monica
3    Lucas
...  ...
Graph
from  to
1     5
1     20
2     1
2     5
...   ...
Find all the “to” edges for user 5
Document-oriented:
Users
{ id: 5, name: "Robert", from: [1, 2, 4], to: [1, 20, 3, 7, 2] }
1 disk seek guaranteed!
vs
Relational:
Graph (rows spread across disk blocks)
from  to
1     5
1     20
2     1
2     5
3     4
3     23
3     12
4     5
...   ...
Potentially as many disk seeks as “to” edges!
Advantages of doc-oriented schema
• Avoid joins
• Disk locality when fetching relations (everything is stored within a doc record)
Considerations for schema design
• N-to-many relations == lists
• Denormalization is more common
Schema-less design
{ id: 1, network: "Twitter", name: "Robert", from: [2], to: [5, 20], screenName: "robertE" }
{ id: 2, network: "Facebook", name: "Maria", from: [23], to: [1, 5], likes: ["biking", "hiking"] }
...
Leverage the schemaless nature of Mongo, but put protection with types in your code!
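One way to "put protection with types in your code" is to check each document against the fields the application expects before storing it. The sketch below is plain JavaScript with no driver calls. The `validateUser` helper and its per-network rules are hypothetical, chosen only to mirror the two example documents on this slide.

```javascript
// Hypothetical rules: which optional fields each network may add, and their type.
const OPTIONAL_FIELDS = new Map([
  ["Twitter", { screenName: (v) => typeof v === "string" }],
  ["Facebook", { likes: (v) => Array.isArray(v) && v.every((x) => typeof x === "string") }],
]);

// True when v is an array whose elements are all integers.
const isIdList = (v) => Array.isArray(v) && v.every((x) => Number.isInteger(x));

// Returns a list of problems; an empty list means the document passed every check.
function validateUser(doc) {
  const errors = [];
  if (!Number.isInteger(doc.id)) errors.push("id must be an integer");
  if (typeof doc.name !== "string") errors.push("name must be a string");
  if (!isIdList(doc.from)) errors.push("from must be a list of ids");
  if (!isIdList(doc.to)) errors.push("to must be a list of ids");
  const optional = OPTIONAL_FIELDS.get(doc.network);
  if (!optional) {
    errors.push("unknown network: " + doc.network);
    return errors;
  }
  for (const [field, check] of Object.entries(optional)) {
    if (field in doc && !check(doc[field])) errors.push(field + " has the wrong type");
  }
  return errors;
}

const robert = { id: 1, network: "Twitter", name: "Robert", from: [2], to: [5, 20], screenName: "robertE" };
const broken = { id: 2, network: "Facebook", name: "Maria", from: [23], to: "1,5", likes: ["biking"] };
console.log(validateUser(robert)); // no problems reported
console.log(validateUser(broken)); // reports that "to" is not a list of ids
```

Running the check at the single place where documents are written keeps the flexibility of per-network fields while catching type mistakes before they reach the collection.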
Read-Friendly
Case Study: Publishers & Subscribers
Read-Friendly Approach
Post: { _id: postId, owner: ownerId, recipient: recipientId, text: "message", ... }
[Figure: the “Hi!” post shown three times]
db.posts.find({recipient: uid})
Sharding Key: recipient
Fast retrieval, easy sharding
Slow writes, enormous amount of storage
Write-Friendly
Case Study: Publishers & Subscribers
Write-Friendly Approach
Post: { _id: postId, owner: oId, text: "message", ... }
[Figure: the “Hi!” post shown once]
db.posts.find({owner: {$in: user.from}})
Sharding Key: ?
Fast writes, slim storage
Slow reads, harder queries
Hybrid Approach
Case Study: Publishers & Subscribers
Hybrid Approach
Post: { _id: postId, owner: ownerId, recipients: [u1, u2, u3, u5], text: "message", ... }
[Figure: the “Hi!” post shown once]
db.posts.find({recipients: uId})
Sharding Key: random :)
Fast writes, slim storage, reasonable read speed
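The tradeoff between the three approaches can be seen by counting how many documents one post produces under each schema. The sketch below models the `posts` collection as plain arrays, so no MongoDB calls are involved. The `followers` map and all function names are hypothetical, and the filters only mimic the `find()` queries from the slides.

```javascript
// Hypothetical subscription data: owner id -> ids subscribed to that owner.
const followers = { u9: ["u1", "u2", "u3"] };

// Read-friendly: one copy of the post per recipient.
function writeReadFriendly(posts, owner, text) {
  for (const recipient of followers[owner]) posts.push({ owner, recipient, text });
}
// Write-friendly: a single post; readers must supply the owners they follow.
function writeWriteFriendly(posts, owner, text) {
  posts.push({ owner, text });
}
// Hybrid: a single post that carries the list of recipients.
function writeHybrid(posts, owner, text) {
  posts.push({ owner, recipients: [...followers[owner]], text });
}

// In-memory equivalents of the three find() calls.
const readReadFriendly = (posts, uid) => posts.filter((p) => p.recipient === uid);
const readWriteFriendly = (posts, from) => posts.filter((p) => from.includes(p.owner)); // like $in
const readHybrid = (posts, uid) => posts.filter((p) => p.recipients.includes(uid));

const readFriendlyPosts = [], writeFriendlyPosts = [], hybridPosts = [];
writeReadFriendly(readFriendlyPosts, "u9", "Hi!");
writeWriteFriendly(writeFriendlyPosts, "u9", "Hi!");
writeHybrid(hybridPosts, "u9", "Hi!");

// Documents stored per post: 3 for read-friendly, 1 for the other two.
console.log(readFriendlyPosts.length, writeFriendlyPosts.length, hybridPosts.length); // 3 1 1
// Every approach returns the same single post to reader u2.
console.log(
  readReadFriendly(readFriendlyPosts, "u2").length,
  readWriteFriendly(writeFriendlyPosts, ["u9"]).length,
  readHybrid(hybridPosts, "u2").length
); // 1 1 1
```

The read-friendly write cost grows with the number of subscribers, while the hybrid schema keeps one document per post and still lets the reader query by their own id.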
Random sharding is not random!
Minimize the number of disk seeks per shard!
[Figure panels: “Best -- Impossible for our data”, “Worse”, “Optimal solution”]
Outline
I. Schema design
II. Lessons learned for schema design
‣ Indexes
‣ Concurrency
‣ Reducing collection size
III. Things to remember about MongoDB
Indexes: Primary Key
link: { _id: ObjectId(...), url: "www.jetlore.com", title: "Jetlore is a search platform for social content", description: "..." }
link: { _id: "www.jetlore.com", title: "Jetlore is a search platform for social content", description: "..." }
If your data has a natural PK, use it instead of the default ObjectId
Indexes: Augment your schema to enable the most selective index
Want all posts that a user can view sorted by the number of likes
Add a new “likesCount” field!
post: { _id: ObjectId(...), recipients: [...], likes: [...], likesCount: ..., ... }
db.posts.ensureIndex({recipients: 1, likesCount: -1})
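A denormalized counter such as `likesCount` has to stay in step with the `likes` list. One possible way to do that, sketched below, is to change both in a single update and to match only posts the user has not liked yet. The `buildLikeUpdate` helper is hypothetical and only builds the two arguments that would be passed to `db.posts.update(filter, update)`; it does not talk to a database.

```javascript
// Hypothetical helper: returns the filter and update documents for "user likes post".
function buildLikeUpdate(postId, userId) {
  return {
    // Match only if userId is not already in the likes array, so the
    // counter is never incremented twice for the same user.
    filter: { _id: postId, likes: { $ne: userId } },
    // Both modifications target one document, so they are applied together.
    update: { $addToSet: { likes: userId }, $inc: { likesCount: 1 } },
  };
}

const like = buildLikeUpdate("p1", "u7");
console.log(JSON.stringify(like.filter)); // {"_id":"p1","likes":{"$ne":"u7"}}
console.log(JSON.stringify(like.update)); // {"$addToSet":{"likes":"u7"},"$inc":{"likesCount":1}}
```

With the counter kept this way, the compound index on `recipients` and `likesCount` can serve the "sorted by number of likes" query without computing array lengths at read time.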
Indexes: Make sure to use the proper index
db.posts.find({recipients: uId}).sort({date: -1})
db.posts.ensureIndex({recipients: 1})
db.posts.ensureIndex({date: 1})
vs
db.posts.ensureIndex({recipients: 1, date: 1})
[Slide annotation on the compound index: date: -1]
Always test with explain()
Concurrency: Try to avoid “save()” in drivers
thread1: { _id: u1, name: "Robert", from: [u2, u3] }
thread2: { _id: u1, name: "Bob", from: [] }
db.users.update({_id: u1}, {$set: {_id: u1, name: ..., }}, true, false)
…but!
db.users.update({_id: thread1._id}, {$set: {from: thread1.from}})
db.users.update({_id: thread2._id}, {$set: {name: thread2.name}})
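The difference between writing back a whole document and setting only the changed fields can be reproduced with a small in-memory model. This is a sketch, not driver code: `save` and `setFields` are hypothetical stand-ins for a whole-document write and a `$set` update, and the two "threads" simply run one after the other on copies of the same starting document.

```javascript
// One stored user document, keyed by _id (stand-in for db.users).
function makeStore() {
  return { u1: { _id: "u1", name: "Robert", from: [] } };
}
const clone = (x) => JSON.parse(JSON.stringify(x));

// Whole-document write: replaces everything with the caller's copy.
function save(store, doc) {
  store[doc._id] = clone(doc);
}
// $set-style write: touches only the named fields.
function setFields(store, id, fields) {
  Object.assign(store[id], clone(fields));
}

// Both threads read the same document, then each edits a different field.
function run(write) {
  const store = makeStore();
  const thread1 = clone(store.u1);
  const thread2 = clone(store.u1);
  thread1.from = ["u2", "u3"];
  thread2.name = "Bob";
  write(store, thread1, { from: thread1.from });
  write(store, thread2, { name: thread2.name });
  return store.u1;
}

const withSave = run((store, doc) => save(store, doc));
const withSet = run((store, doc, changed) => setFields(store, doc._id, changed));
console.log(withSave); // name is "Bob" but from is empty: thread1's edit was overwritten
console.log(withSet);  // name is "Bob" and from is ["u2", "u3"]: both edits survive
```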
Concurrency: Atomic Commutative Operators
db.users.update({_id: u1}, {$pull: {to: u2}})
db.posts.update({_id: pId}, {$inc: {likesCount: 1}})
When updating lists and counters, instead of using $set, rely on $inc, $addToSet, $pull
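"Commutative" here means the final document is the same whatever order concurrent updates arrive in. The sketch below checks that property on an in-memory post. The three functions in `ops` are simplified, hypothetical imitations of `$inc`, `$addToSet`, and `$set`, written only for this illustration.

```javascript
// Simplified in-memory imitations of three update operators.
const ops = {
  $inc: (doc, field, n) => { doc[field] = (doc[field] || 0) + n; },
  $addToSet: (doc, field, v) => { if (!doc[field].includes(v)) doc[field].push(v); },
  $set: (doc, field, v) => { doc[field] = v; },
};

// Applies [operator, field, value] updates in order to a fresh post and
// returns a canonical string so results can be compared.
function apply(updates) {
  const post = { likes: ["u1"], likesCount: 1 };
  for (const [op, field, value] of updates) ops[op](post, field, value);
  post.likes = [...post.likes].sort(); // a set has no meaningful order
  return JSON.stringify(post);
}

// Two clients each add a like; each had read the post when likesCount was 1.
const addU2 = ["$addToSet", "likes", "u2"], addU3 = ["$addToSet", "likes", "u3"];
const setU2 = ["$set", "likes", ["u1", "u2"]], setU3 = ["$set", "likes", ["u1", "u3"]];
const inc = ["$inc", "likesCount", 1], setTo2 = ["$set", "likesCount", 2];

console.log(apply([addU2, addU3]) === apply([addU3, addU2])); // true: order does not matter
console.log(apply([setU2, setU3]) === apply([setU3, setU2])); // false: last writer wins
console.log(apply([inc, inc]));       // likesCount ends at 3
console.log(apply([setTo2, setTo2])); // likesCount ends at 2, one like is lost
```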
Concurrency: No Transactions
user1: { _id: u1, to: [u2, u3], from: [...], ... }
user2: { _id: u2, to: [...], from: [u1, ...], ... }
User1 wants to unsubscribe from user2.
Ideally we would update both users in one transaction
Implement it in your code
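The slides do not say how the two-document update was implemented. One possible approach, sketched here under that caveat, relies on `$pull` being idempotent: if the process fails between the two steps, running the whole operation again leaves both documents correct. The `users` object, `pull`, and `unsubscribe` are hypothetical in-memory stand-ins, not driver calls.

```javascript
// In-memory stand-in for db.users.
const users = {
  u1: { _id: "u1", to: ["u2", "u3"], from: [] },
  u2: { _id: "u2", to: [], from: ["u1", "u4"] },
};

// Mimics update({_id: id}, {$pull: {[field]: value}}): removes value if present.
function pull(id, field, value) {
  users[id][field] = users[id][field].filter((x) => x !== value);
}

// Removes the edge follower -> followee from both documents.
// Each step is safe to repeat, so a retry after a partial failure is harmless.
function unsubscribe(follower, followee) {
  pull(follower, "to", followee);
  pull(followee, "from", follower);
}

unsubscribe("u1", "u2");
unsubscribe("u1", "u2"); // simulated retry: nothing changes the second time
console.log(users.u1.to, users.u2.from); // [ 'u3' ] [ 'u4' ]
```

Between the two steps the documents are briefly inconsistent, so readers in this design have to tolerate an edge that exists on one side only.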
Reducing collection size: Name your fields with short names!
post: { owner: ObjectId, messageText: "loving Jetlore", mediaUrl: "www.jetlore.com", mediaTitle: "Jetlore is a user analytics & search platform for social content" }
vs
post: { o: ObjectId, t: "loving Jetlore", mu: "www.jetlore.com", mt: "Jetlore is a user analytics & search platform for social content" }
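Field names are stored inside every document, so the saving from shorter names is paid back once per document. The sketch below estimates it for the two posts above by comparing the lengths of their JSON text, which is only a rough proxy for the stored size. The `ownerId` placeholder string stands in for the ObjectId and is an assumption of this example.

```javascript
// Placeholder for the ObjectId value; the same value is used in both documents.
const ownerId = "000000000000000000000000";
const title = "Jetlore is a user analytics & search platform for social content";

const longNames = { owner: ownerId, messageText: "loving Jetlore", mediaUrl: "www.jetlore.com", mediaTitle: title };
const shortNames = { o: ownerId, t: "loving Jetlore", mu: "www.jetlore.com", mt: title };

// Values are identical, so the difference comes from the field names alone.
const savedPerDoc = JSON.stringify(longNames).length - JSON.stringify(shortNames).length;
console.log(savedPerDoc); // 28 characters per document

// Rough projection for a hypothetical collection of 100 million posts.
console.log((savedPerDoc * 100e6) / 1e9 + " GB"); // 2.8 GB
```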
Outline
I. Schema design
II. Lessons learned for schema design
III. Things to remember about MongoDB
‣ Single lock
‣ ($or + sort) query doesn’t use indexes properly
‣ Indexes with 2 list fields
‣ Record iterators + update
$or & sort query doesn’t use the proper index
db.posts.find({$or: [{recipients: uId}, {privacy: Public}]}).sort({date: -1})
db.posts.ensureIndex({recipients: 1, date: -1})
db.posts.ensureIndex({privacy: 1, date: -1})
Indexes with 2 list fields
post: { _id: ObjectId(...), recipients: [...], links: [...], ... }
db.posts.ensureIndex({recipients: 1, links: 1})
A compound index cannot cover two array fields of the same document (MongoDB rejects indexing parallel arrays).
Record iterators + updating
var posts = db.posts.find().skip(n).limit(t)
while (posts.hasNext()) {
  var post = posts.next()
  db.posts.update({_id: post._id}, {$set: {text: NewText}})
}
Sort by a field that will not change or rename the old collection
var posts = db.posts.find().sort({date: 1}).skip(n).limit(t)
db.posts.renameCollection("oldPosts")
var posts = db.oldPosts.find().skip(n).limit(t)
while (posts.hasNext()) {
  var post = posts.next()
  db.posts.update({_id: post._id}, {$set: {text: NewText}})
}
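Why the sort field must not change can be shown with a simplified in-memory model of paging through a collection while updating it. This is not MongoDB's storage behavior, only an illustration: `page` imitates `find().sort(...).skip(n).limit(t)` on an array, and the hypothetical `rank` field is modified by every update while `date` stays fixed.

```javascript
// Four posts; "rank" changes on update, "date" never does.
function makePosts() {
  return [1, 2, 3, 4].map((i) => ({ _id: i, date: i, rank: i, done: false }));
}

// Imitates find().sort({[field]: 1}).skip(n).limit(t).
function page(posts, field, n, t) {
  return [...posts].sort((x, y) => x[field] - y[field]).slice(n, n + t);
}

// Walks the collection in pages of two, updating every post it sees.
// Returns the ids of the posts that were actually processed.
function processAll(sortField) {
  const posts = makePosts();
  for (let n = 0; n < posts.length; n += 2) {
    for (const post of page(posts, sortField, n, 2)) {
      post.done = true;
      post.rank += 10; // the update moves the post to the end of "rank" order
    }
  }
  return posts.filter((p) => p.done).map((p) => p._id);
}

console.log(processAll("rank")); // [ 1, 2 ]: posts 1 and 2 are seen twice, 3 and 4 never
console.log(processAll("date")); // [ 1, 2, 3, 4 ]: every post is seen exactly once
```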
The takeaways
I. What is more important?
• Writes: Optimize for easy inserts/updates
• Reads: Optimize for easy querying
II. Denormalize to enable the most selective index
III. Concurrency: design to leverage commutative operators
Thank you!
Try our tech
powered by