MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)

Montse Medina

COO,

MongoDB Schema Design:

Insights and Tradeoffs

Saturday, May 5, 12

Social content is useful in context


Algorithms + Infrastructure

Technology Stack

Apache Kafka


Outline

I. Schema design
‣ Relational vs. Document-oriented
‣ Schema-less design
‣ Case study: Publishers & Subscribers

II. Lessons learned for schema design

III. Things to remember about MongoDB

Relational vs. Document-oriented

Relational

Users
id   name
1    Robert
2    Monica
3    Lucas
...  ...

Graph
from  to
1     5
1     20
2     1
2     5
...   ...

vs

Document-oriented

Users
{ id: 1, name: "Robert", from: [2], to: [5, 20] }
{ id: 2, name: "Monica", from: [23], to: [1, 5] }
...

Find all the “to” edges for user 5

Relational

Graph
from  to
1     5
1     20
2     1
2     5
3     4
3     23
3     12
4     5
...   ...

Rows are spread across disk blocks: potentially as many disk seeks as “to” edges!

vs

Document-oriented

Users
{ id: 5, name: "Robert", from: [1, 2, 4], to: [1, 20, 3, 7, 2] }

1 disk seek guaranteed!

Advantages of doc-oriented schema
• Avoid joins
• Disk locality when fetching relations (everything is stored within a doc record)

Considerations for schema design
• N to Many relations == Lists
• Denormalization is more common
Schema-less design

{ id: 1, network: "Twitter", name: "Robert", from: [2], to: [5, 20], screenName: "robertE" }
{ id: 2, network: "Facebook", name: "Maria", from: [23], to: [1, 5], likes: ["biking", "hiking"] }
...

Leverage the schemaless nature of Mongo, but put protection with types in your code!
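One way to read "protection with types in your code" is to check every document against the fields the application expects before it is written. The sketch below is plain JavaScript with no MongoDB dependency. The helper name validateUser and the split into required and optional fields are assumptions made for illustration; only the field names come from the two example documents.

```javascript
// Hypothetical type checks for the user documents shown on the slide.
const isString = (v) => typeof v === "string";
const isNumberList = (v) => Array.isArray(v) && v.every((x) => typeof x === "number");
const isStringList = (v) => Array.isArray(v) && v.every(isString);

// Assumption: these fields are shared by every network.
const requiredFields = {
  id: (v) => typeof v === "number",
  network: isString,
  name: isString,
  from: isNumberList,
  to: isNumberList,
};
// Assumption: these fields are network-specific, so they may be absent.
const optionalFields = { screenName: isString, likes: isStringList };

// Returns a list of problems; an empty list means the document passed.
function validateUser(doc) {
  const errors = [];
  for (const [field, check] of Object.entries(requiredFields)) {
    if (!(field in doc)) errors.push("missing field: " + field);
    else if (!check(doc[field])) errors.push("wrong type for field: " + field);
  }
  for (const [field, check] of Object.entries(optionalFields)) {
    if (field in doc && !check(doc[field])) errors.push("wrong type for field: " + field);
  }
  return errors;
}

const twitterUser = {
  id: 1, network: "Twitter", name: "Robert", from: [2], to: [5, 20], screenName: "robertE",
};
console.log(validateUser(twitterUser));
```

Running the check just before each insert or update keeps the flexibility of optional fields while catching documents whose shared fields have drifted.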

Case Study: Publishers & Subscribers

Read-Friendly Approach

Post: { _id: postId, owner: ownerId, recipient: recipientId, text: "message", ... }

(Diagram: the same “Hi!” post stored three times, once per recipient.)

db.posts.find({recipient: uid})

Sharding Key: recipient

Fast retrieval, easy sharding

Slow writes, enormous amount of storage

Write-Friendly Approach

Post: { _id: postId, owner: oId, text: "message", ... }

(Diagram: the “Hi!” post stored once.)

db.posts.find({owner: {$in: user.from}})

Sharding Key: ?

Fast writes, slim storage

Slow reads, harder queries

Hybrid Approach

Post: { _id: postId, owner: ownerId, recipients: [u1, u2, u3, u5], text: "message", ... }

(Diagram: the “Hi!” post stored once, together with its list of recipients.)

db.posts.find({recipients: uId})

Sharding Key: random :)

Fast writes, slim storage, reasonable read speed
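The storage and query tradeoff between the three approaches can be seen in a small in-memory sketch. Plain arrays stand in for the posts collection, and the followers map, the user ids and the publish function names are all hypothetical. Each filter mirrors the find() shown for the corresponding approach.

```javascript
// Hypothetical data: the users who receive alice's posts.
const followers = { alice: ["u1", "u2", "u3"] };

// Read-friendly: one copy of the post per recipient.
function publishReadFriendly(posts, owner, text) {
  for (const recipient of followers[owner]) posts.push({ owner, recipient, text });
}

// Write-friendly: a single copy; readers must query by the owners they follow.
function publishWriteFriendly(posts, owner, text) {
  posts.push({ owner, text });
}

// Hybrid: a single copy that carries the list of recipients.
function publishHybrid(posts, owner, text) {
  posts.push({ owner, recipients: [...followers[owner]], text });
}

const readFriendly = [];
const writeFriendly = [];
const hybrid = [];
publishReadFriendly(readFriendly, "alice", "Hi!");
publishWriteFriendly(writeFriendly, "alice", "Hi!");
publishHybrid(hybrid, "alice", "Hi!");

// Feed of user u2, who follows alice.
const feedRead = readFriendly.filter((p) => p.recipient === "u2");
const feedWrite = writeFriendly.filter((p) => ["alice"].includes(p.owner)); // like $in: user.from
const feedHybrid = hybrid.filter((p) => p.recipients.includes("u2"));

console.log(readFriendly.length, writeFriendly.length, hybrid.length); // 3 1 1
```

All three feeds contain the same single post for u2; only the number of stored documents and the shape of the read query differ.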

Random sharding is not random!

Minimize the number of disk seeks per shard!

(Diagram labels: “Best -- Impossible for our data”, “Worse”, “Optimal solution”)

Outline

I. Schema design

II. Lessons learned for schema design
‣ Indexes
‣ Concurrency
‣ Reducing collection size

III. Things to remember about MongoDB

Indexes: Primary Key

link: { _id: ObjectId(...), url: "www.jetlore.com", title: "Jetlore is a search platform for social content", description: "..." }

link: { _id: "www.jetlore.com", title: "Jetlore is a search platform for social content", description: "..." }

If your data has a natural PK, use it instead of the default ObjectId

http://www.jetlore.com


Indexes: Augment your schema to enable the most selective index

Want all posts that a user can view sorted by the number of likes

Add a new “likesCount” field!

post: { _id: ObjectId(...), recipients: [...], likes: [...], likesCount: ..., ...}

db.posts.ensureIndex({recipients: 1, likesCount: -1})
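A denormalized likesCount field is only useful if it stays in step with the likes list. One possible way to do that, sketched below, is to change both fields in a single update and to guard the update with a filter on the list. The helper names buildLikeUpdate and buildUnlikeUpdate are hypothetical; the operators ($ne, $addToSet, $pull, $inc) are standard MongoDB. The functions only build the arguments, which would then be passed to db.posts.update(filter, update).

```javascript
// Builds the arguments for a "like". The filter matches only when userId
// is not yet in likes, so the counter goes up at most once per user.
function buildLikeUpdate(postId, userId) {
  return {
    filter: { _id: postId, likes: { $ne: userId } },
    update: { $addToSet: { likes: userId }, $inc: { likesCount: 1 } },
  };
}

// Builds the arguments for an "unlike". The filter matches only when
// userId is currently in likes, so the counter never drops twice.
function buildUnlikeUpdate(postId, userId) {
  return {
    filter: { _id: postId, likes: userId },
    update: { $pull: { likes: userId }, $inc: { likesCount: -1 } },
  };
}

const like = buildLikeUpdate("p1", "u7");
console.log(JSON.stringify(like.filter));
console.log(JSON.stringify(like.update));
```

Because the list and the counter change in the same single-document update, a reader never sees one without the other.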

Indexes: Make sure to use the proper index

db.posts.find({recipients: uId}).sort({date: -1})

db.posts.ensureIndex({recipients: 1})
db.posts.ensureIndex({date: 1})

vs

db.posts.ensureIndex({recipients: 1, date: -1})

Always test with explain()

Concurrency: Try to avoid “save()” in drivers

thread1: { _id: u1, name: "Robert", from: [u2, u3] }

thread2: { _id: u1, name: "Bob", from: [] }

save() rewrites the whole document, like an upsert that sets every field:

db.users.update({_id: u1}, {$set: {_id: u1, name: ..., }}, true, false)

…but! Update only the fields each thread changed:

db.users.update({_id: thread1._id}, {$set: {from: thread1.from}})

db.users.update({_id: thread2._id}, {$set: {name: thread2.name}})
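The lost update behind this advice can be reproduced without a database. In the sketch below, a plain object stands in for the users collection, and the functions save and setFields are simplified stand-ins (assumptions, not driver APIs) for a whole-document save and a $set update. The starting document is assumed to be Robert with an empty from list.

```javascript
// In-memory stand-in for the users collection.
function makeStore() {
  return { u1: { _id: "u1", name: "Robert", from: [] } };
}

// Whole-document write: replaces everything stored under the _id.
function save(store, doc) {
  store[doc._id] = { ...doc };
}

// Field-level write: changes only the named fields, like $set.
function setFields(store, id, fields) {
  Object.assign(store[id], fields);
}

// Case A: both threads read the same document, then each saves its own copy.
const storeA = makeStore();
const thread1 = { ...storeA.u1, from: ["u2", "u3"] }; // thread 1 changed "from"
const thread2 = { ...storeA.u1, name: "Bob" };        // thread 2 changed "name"
save(storeA, thread1);
save(storeA, thread2); // writes thread 2's stale "from" over thread 1's change

// Case B: each thread writes only the field it changed.
const storeB = makeStore();
setFields(storeB, "u1", { from: ["u2", "u3"] });
setFields(storeB, "u1", { name: "Bob" });

console.log(storeA.u1); // "from" is empty again: thread 1's write was lost
console.log(storeB.u1); // both changes survive
```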

Concurrency: Atomic Commutative Operators

When updating lists and counters, instead of using $set, rely on $inc, $addToSet, $pull

db.users.update({_id: u1}, {$pull: {to: u2}})

db.posts.update({_id: pId}, {$inc: {likesCount: 1}})
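"Commutative" here means the final document does not depend on the order in which concurrent updates arrive. The sketch below illustrates this with tiny in-memory imitations of $set, $inc and $addToSet. They are simplified stand-ins written for this example, not MongoDB's implementation.

```javascript
// Simplified in-memory versions of three update operators.
const ops = {
  $set: (doc, field, value) => { doc[field] = value; },
  $inc: (doc, field, amount) => { doc[field] = (doc[field] || 0) + amount; },
  $addToSet: (doc, field, value) => {
    if (!doc[field].includes(value)) doc[field].push(value);
  },
};

// Applies a list of [operator, field, value] updates to a copy of start.
function applyAll(start, updates) {
  const doc = JSON.parse(JSON.stringify(start)); // deep copy
  for (const [op, field, value] of updates) ops[op](doc, field, value);
  return doc;
}

const post = { likes: ["u1"], likesCount: 1 };
const likeByU2 = [["$addToSet", "likes", "u2"], ["$inc", "likesCount", 1]];
const likeByU3 = [["$addToSet", "likes", "u3"], ["$inc", "likesCount", 1]];

// Operator-based updates: both arrival orders give the same members and count.
const u2First = applyAll(post, [...likeByU2, ...likeByU3]);
const u3First = applyAll(post, [...likeByU3, ...likeByU2]);

// $set-based updates: both clients read likesCount 1 and write 2, so one like is lost.
const withSet = applyAll(post, [["$set", "likesCount", 2], ["$set", "likesCount", 2]]);

console.log(u2First.likesCount, u3First.likesCount, withSet.likesCount); // 3 3 2
```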

Concurrency: No Transactions

user1: { _id: u1, to: [u2, u3], from: [...], ...}

user2: { _id: u2, to: [...], from: [u1, ...], ...}

User1 wants to unsubscribe from user2.

Ideally we would update both users in one transaction

Implement it in your code
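The slides do not say how the two-document update was implemented in code. One possible approach, sketched here as an assumption, is to express the unsubscribe as two $pull updates and retry until both succeed. This works because $pull is idempotent: pulling a value that is already gone leaves the document unchanged. The helper names buildUnsubscribeOps and runWithRetry are hypothetical, and applyOp would be something like (op) => db.users.update(op.filter, op.update).

```javascript
// The two single-document updates for "follower unsubscribes from publisher".
function buildUnsubscribeOps(followerId, publisherId) {
  return [
    { filter: { _id: followerId }, update: { $pull: { to: publisherId } } },
    { filter: { _id: publisherId }, update: { $pull: { from: followerId } } },
  ];
}

// Runs each op in order, retrying an op that throws up to maxAttempts times.
// Repeating an op is harmless only because every op is idempotent.
function runWithRetry(ops, applyOp, maxAttempts) {
  for (const op of ops) {
    let done = false;
    for (let attempt = 1; attempt <= maxAttempts && !done; attempt++) {
      try {
        applyOp(op);
        done = true;
      } catch (err) {
        // Assumed transient failure: fall through and try again.
      }
    }
    if (!done) throw new Error("unsubscribe incomplete; safe to re-run later");
  }
}

const unsubscribeOps = buildUnsubscribeOps("u1", "u2");
console.log(JSON.stringify(unsubscribeOps));
```

If the process dies between the two updates, the pair of documents is briefly inconsistent, so a sketch like this still needs some way to re-run unfinished unsubscribes.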

Reducing collection size: Name your fields with short names!

post: { owner: ObjectId, messageText: "loving Jetlore", mediaUrl: "www.jetlore.com", mediaTitle: "Jetlore is a user analytics & search platform for social content" }

vs

post: { o: ObjectId, t: "loving Jetlore", mu: "www.jetlore.com", mt: "Jetlore is a user analytics & search platform for social content" }
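Short names help because BSON stores every field name inside every document, so each character removed from a name is saved once per document. The sketch below adds up the saving for the four renames in the example. The collection size of 100 million documents is an assumed figure for illustration only.

```javascript
// Long field names from the example mapped to their short replacements.
const renames = { owner: "o", messageText: "t", mediaUrl: "mu", mediaTitle: "mt" };

// Bytes saved per document: the difference in encoded name lengths.
function bytesSavedPerDoc(mapping) {
  return Object.entries(mapping).reduce(
    (total, [longName, shortName]) =>
      total + Buffer.byteLength(longName, "utf8") - Buffer.byteLength(shortName, "utf8"),
    0
  );
}

const perDoc = bytesSavedPerDoc(renames);
const docCount = 100e6; // assumption: collection size, for illustration only
console.log(perDoc + " bytes saved per document");
console.log((perDoc * docCount / 1e9).toFixed(1) + " GB saved across the collection");
```

A mapping like renames can also live in the data-access layer, so the rest of the code keeps using the readable names.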

Outline

I. Schema design

II. Lessons learned for schema design

III. Things to remember about MongoDB
‣ Single lock
‣ ($or + sort) query doesn’t use indexes properly
‣ Indexes with 2 list fields
‣ Record iterators + update

Indexes with 2 list fields

post: { _id: ObjectId(...), recipients: [...], links: [...], ... }

db.posts.ensureIndex({recipients: 1, links: 1})

(MongoDB cannot build a compound index over two array fields.)

$or & sort query doesn’t use the proper index

db.posts.find({$or: [{recipients: uId}, {privacy: Public}]}).sort({date: -1})

db.posts.ensureIndex({recipients: 1, date: -1})

db.posts.ensureIndex({privacy: 1, date: -1})
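The slides name the $or + sort problem without giving a fix. One workaround, offered here as an assumption rather than the speaker's method, is to run one query per $or branch, so that each can use its own compound index, and then merge the two sorted result lists in application code. The function name mergeByDateDesc and the sample posts are hypothetical.

```javascript
// Merges two lists that are each already sorted by date, newest first,
// skipping posts that appear in both and returning at most `limit` posts.
function mergeByDateDesc(listA, listB, limit) {
  const merged = [];
  const seen = new Set();
  let i = 0;
  let j = 0;
  while (merged.length < limit && (i < listA.length || j < listB.length)) {
    const takeA =
      j >= listB.length || (i < listA.length && listA[i].date >= listB[j].date);
    const post = takeA ? listA[i++] : listB[j++];
    if (!seen.has(post._id)) {
      seen.add(post._id);
      merged.push(post);
    }
  }
  return merged;
}

// Stand-ins for the results of the two separate sorted queries.
const forUser = [{ _id: "p4", date: 40 }, { _id: "p2", date: 20 }];
const publicPosts = [{ _id: "p4", date: 40 }, { _id: "p3", date: 30 }, { _id: "p1", date: 10 }];

console.log(mergeByDateDesc(forUser, publicPosts, 3).map((p) => p._id)); // [ 'p4', 'p3', 'p2' ]
```

For the merged page to be correct, each of the two queries has to return at least `limit` posts (or all of its matches), since in the worst case every post on the page comes from one branch.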

Record iterators + updating

var posts = db.posts.find().skip(n).limit(t)
while (posts.hasNext()) {
  var post = posts.next()
  db.posts.update({_id: post._id}, {$set: {text: NewText}})
}

Sort by a field that will not change or rename the old collection

var posts = db.posts.find().sort({date: 1}).skip(n).limit(t)

db.posts.renameCollection("oldPosts")
var posts = db.oldPosts.find().skip(n).limit(t)
while (posts.hasNext()) {
  var post = posts.next()
  // posts is empty after the rename, so write the modified document into it
  post.text = NewText
  db.posts.insert(post)
}

The takeaways

I. What is more important?

• Writes: Optimize for easy inserts/updates

• Reads: Optimize for easy querying

II. Denormalize to enable the most selective index

III. Concurrency: design to leverage commutative operators

Thank you!

Try our tech

powered by


Date post:	01-Nov-2014
Category:	Technology
Upload:	jetlore