
MongoDB San Francisco 2013: Data Modeling Examples From the Real World presented by Alvin Richards,...

Date post: 25-May-2015
Upload: mongodb
Description:
In this session, we'll examine schema design insights and trade-offs using real world examples. We'll look at three example applications: building an email inbox, selecting a shard key for a large scale web application, and using MongoDB to store user profiles. From these examples you should leave the session with an idea of the advantages and disadvantages of various approaches to modeling your data in MongoDB. Attendees should be well versed in basic schema design and familiar with concepts in the morning's basic schema design talk. No beginner topics will be covered in this session.
Transcript
Page 1: MongoDB San Francisco 2013: Data Modeling Examples From the Real World presented by Alvin Richards, 10Gen Technical Director for EMEA, 10gen

Technical Director, 10gen

@jonnyeight [email protected] alvinonmongodb.com

Alvin Richards

#MongoDBdays

Schema Design: 3 Real World Use Cases

Page 2

I'm planning a Trip to LA…

Page 3

Single Table En

Agenda

• Why is schema design important

• 3 Real World Schemas
– Inbox
– Indexed Attributes
– Multiple Identities

• Conclusions

Page 4

Why is Schema Design important?

• Largest factor for a performant system

• Schema design with MongoDB is different

• RDBMS – "What answers do I have?"

• MongoDB – "What question will I have?"

• Must consider use case with schema

Page 5

#1 - Message Inbox

Page 6

Let's get Social

Page 7

Sending Messages

?

Page 8

Reading my Inbox

?

Page 9

Design Goals

• Efficiently send new messages to recipients

• Efficiently read inbox

Page 10

3 Approaches (there are more)

• Fan out on Read

• Fan out on Write

• Fan out on Write with Bucketing

Page 11

Fan out on read – Send Message

Shard 1 Shard 2 Shard 3

Send Message

db.inbox.save( { to: [ "Bob", "Jane" ], … } )

Page 12

Fan out on read – Inbox Read

Shard 1 Shard 2 Shard 3

Read Inbox

db.inbox.find( { to: "Bob" } )

Page 13

Fan out on read

// Shard on "from"
sh.shardCollection( "mongodbdays.inbox", { from: 1 } )

// Make sure we have an index to handle inbox reads
db.inbox.ensureIndex( { to: 1, sent: 1 } )

msg = { from: "Joe", to: [ "Bob", "Jane" ], sent: new Date(), message: "Hi!" }

// Send a message
db.inbox.save( msg )

// Read my inbox
db.inbox.find( { to: "Bob" } ).sort( { sent: -1 } )

Page 14

Considerations

• 1 document per message sent

• Multiple recipients in an array key

• Reading inbox finds all messages with my own name in the recipient field

✖ Requires scatter-gather on sharded cluster

✖ Then a lot of random IO on a shard to find everything

Page 15

Fan out on write – Send Message

Shard 1 Shard 2 Shard 3

Send Message

db.inbox.save( { to: "Bob", …} )

Page 16

Fan out on write – Read Inbox

Shard 1 Shard 2 Shard 3

Read Inbox

db.inbox.find( { to: "Bob" } )

Page 17

Fan out on write

// Shard on "to" (the recipient) and "sent"; a shard key cannot be an array field
sh.shardCollection( "mongodbdays.inbox", { to: 1, sent: 1 } )

msg = { from: "Joe", recipient: [ "Bob", "Jane" ], sent: new Date(), message: "Hi!" }

// Send a message: save one separate copy per recipient
for ( i in msg.recipient ) {
    db.inbox.save( { from: msg.from, to: msg.recipient[i], sent: msg.sent, message: msg.message } );
}

// Read my inbox
db.inbox.find( { to: "Joe" } ).sort( { sent: -1 } )

Page 18

Considerations

✖ 1 document per recipient per message

• Reading inbox is finding all of the messages with me as the recipient

• Can shard on recipient, so inbox reads hit one shard

✖ But still lots of random IO on the shard
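The routing difference between fan out on read and fan out on write can be sketched without a database. What follows is a minimal in-memory simulation in plain JavaScript, not MongoDB code: the three arrays stand in for Shard 1 to 3, and `shardFor` is a hypothetical routing rule made up for illustration (a real cluster routes by shard key ranges or hashes).

```javascript
// In-memory stand-in for a 3-shard cluster; this is not MongoDB code.
const SHARD_COUNT = 3;

// Hypothetical routing rule: sum of character codes modulo the shard count.
function shardFor(key) {
  let sum = 0;
  for (const ch of key) sum += ch.charCodeAt(0);
  return sum % SHARD_COUNT;
}

function makeCluster() {
  return Array.from({ length: SHARD_COUNT }, () => []);
}

// Fan out on read: one document per message, routed by sender.
function sendFanOutOnRead(cluster, msg) {
  cluster[shardFor(msg.from)].push({ ...msg });
}

// Fan out on write: one document per recipient, routed by recipient.
function sendFanOutOnWrite(cluster, msg) {
  for (const to of msg.to) {
    cluster[shardFor(to)].push({ from: msg.from, to, sent: msg.sent, message: msg.message });
  }
}

// After fan out on read, an inbox read has to ask every shard (scatter-gather).
function readFanOutOnRead(cluster, user) {
  const docs = cluster.flatMap(shard => shard.filter(d => d.to.includes(user)));
  return { docs, shardsQueried: cluster.length };
}

// After fan out on write, an inbox read is routed to the recipient's shard only.
function readFanOutOnWrite(cluster, user) {
  const docs = cluster[shardFor(user)].filter(d => d.to === user);
  return { docs, shardsQueried: 1 };
}

const msg = { from: "Joe", to: ["Bob", "Jane"], sent: new Date(), message: "Hi!" };

const readCluster = makeCluster();
sendFanOutOnRead(readCluster, msg);
const onRead = readFanOutOnRead(readCluster, "Bob");

const writeCluster = makeCluster();
sendFanOutOnWrite(writeCluster, msg);
const onWrite = readFanOutOnWrite(writeCluster, "Bob");

console.log(onRead.docs.length, onRead.shardsQueried);   // 1 message found, 3 shards asked
console.log(onWrite.docs.length, onWrite.shardsQueried); // 1 message found, 1 shard asked
```

The write side pays for the cheaper read: the simulated cluster stores two documents for the one message, one per recipient.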

Page 19

Fan out on write with buckets

• Each "inbox" document is an array of messages

• Append a message onto "inbox" of recipient

• Bucket inbox documents so there are not too many messages per document

• Can shard on recipient, so inbox reads hit one shard

• A few documents to read the whole inbox

Page 20

Bucketed fan out on write - Send

Shard 1 Shard 2 Shard 3

Send Message

db.inbox.update( { to: "Bob"}, { $push: { msg: … } })

Page 21

Bucketed fan out on write - Read

Shard 1 Shard 2 Shard 3

Read Inbox

db.inbox.find( { to: "Bob" } )

Page 22

Fan out on write – with buckets

// Shard on "owner / sequence"
sh.shardCollection( "mongodbdays.inbox", { owner: 1, sequence: 1 } )
sh.shardCollection( "mongodbdays.users", { user_name: 1 } )

msg = { from: "Joe", to: [ "Bob", "Jane" ], sent: new Date(), message: "Hi!" }

// Send a message
for ( recipient in msg.to ) {
    // Atomically increment the recipient's message counter and read the new value
    count = db.users.findAndModify( {
        query: { user_name: msg.to[recipient] },
        update: { "$inc": { "msg_count": 1 } },
        upsert: true,
        new: true } ).msg_count;

    // 50 messages per bucket
    sequence = Math.floor( count / 50 );

    // Query on the shard key fields (owner, sequence) so the upsert is routed
    db.inbox.update( { owner: msg.to[recipient], sequence: sequence },
                     { $push: { "messages": msg } },
                     { upsert: true } );
}

// Read my inbox
db.inbox.find( { owner: "Joe" } ).sort( { sequence: -1 } ).limit( 2 )

Page 23

Considerations

• Fewer documents per recipient

• Reading inbox is just finding a few buckets

• Can shard on recipient, so inbox reads hit one shard

✖ But still some random IO on the shard
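To see how the counter maps messages onto buckets, the send loop can be imitated in plain JavaScript. This is only a sketch: the two `Map` objects are stand-ins for the users and inbox collections, `send` is a hypothetical helper, and the bucket size of 50 is the value from the slide.

```javascript
const BUCKET_SIZE = 50; // messages per bucket, as on the slide

// Maps a recipient's running message count to a bucket number.
function bucketSequence(count) {
  return Math.floor(count / BUCKET_SIZE);
}

const counters = new Map(); // stand-in for the users collection (msg_count per user)
const inbox = new Map();    // stand-in for the inbox collection, keyed by "owner/sequence"

// Hypothetical helper that imitates findAndModify + update with upsert.
function send(owner, msg) {
  const count = (counters.get(owner) || 0) + 1; // like $inc with new: true
  counters.set(owner, count);

  const sequence = bucketSequence(count);
  const key = owner + "/" + sequence;
  if (!inbox.has(key)) {
    inbox.set(key, { owner, sequence, messages: [] }); // like upsert: true
  }
  inbox.get(key).messages.push(msg); // like $push
}

for (let i = 1; i <= 120; i++) {
  send("Bob", { from: "Joe", message: "Hi #" + i });
}

const summary = [...inbox.values()].map(b => [b.sequence, b.messages.length]);
console.log(JSON.stringify(summary)); // [[0,49],[1,50],[2,21]]
```

Because the counter is incremented before it is divided, the first bucket holds 49 messages rather than 50; 120 messages still fit in three documents, where one document per message would need 120.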

Page 24

But…

• What if I do not / cannot retain all history?

– Space limited: Hours, Days, Weeks, $$$
– Legislative limited: HIPAA, SOX, DPA

Page 25

3 Approaches (there are more)

• Bucket by Number of messages – just seen that

• Fixed size Array

• Bucket by Date + TTL Collections

Page 26

Inbox – Bucket by # messages

// Query with a date range
db.inbox.find( { owner: "Joe", messages: { $elemMatch: { sent: { $gte: ISODate("2013-04-04…") } } } } )

// Remove elements based on a date
db.inbox.update( { owner: "Joe" }, { $pull: { messages: { sent: { $gte: ISODate("2013-04-04…") } } } } )

Page 27

Considerations

• Limited to a known range of messages

✖ Shrinking documents
• space can be reclaimed with db.runCommand( { compact: '<collection>' } )

✖ Removing the document after the last element in the array has been removed
– { "_id" : …, "messages" : [ ], "owner" : "friend1", "sequence" : 0 }

Page 28

Maintain the latest – Fixed Size Array

msg = { from: "Your Boss", to: [ "Bob" ], sent: new Date(), message: "CALL ME NOW!" }

// 2.4 introduces $each, $sort and $slice for $push
db.messages.update(
    { _id: 1 },
    { $push: { messages: { $each: [ msg ],       // Push this object onto the array
                           $sort: { sent: 1 },   // Sort the resulting array by "sent"
                           $slice: -50 } } } )   // Limit the array to 50 elements
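The combined effect of the three modifiers can be imitated on an ordinary array. The function below is a plain-JavaScript sketch of the semantics, not the server's implementation; `pushSortSlice` is a hypothetical name, and numeric timestamps are used for "sent" to keep the example short.

```javascript
// Imitates { $push: { messages: { $each, $sort: { field: 1 }, $slice } } } on a plain array.
function pushSortSlice(messages, each, sortField, slice) {
  const result = messages.concat(each);               // $each: append the new elements
  result.sort((x, y) => x[sortField] - y[sortField]); // $sort ascending on one field
  // $slice: a negative value keeps the last N elements
  return slice < 0 ? result.slice(slice) : result.slice(0, slice);
}

// An inbox array that is already full: 50 messages sent at times 1..50.
const full = [];
for (let t = 1; t <= 50; t++) {
  full.push({ from: "Joe", sent: t, message: "Hi #" + t });
}

const urgent = { from: "Your Boss", sent: 51, message: "CALL ME NOW!" };
const latest = pushSortSlice(full, [urgent], "sent", -50);

console.log(latest.length);                  // 50
console.log(latest[0].sent, latest[49].sent); // 2 51
```

The oldest message (sent at time 1) falls off the front, so the array never grows past 50 elements.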

Page 29

Considerations

Limited to a known # of messages

✖Need to compute the size of the array based on retention period

Page 30

TTL Collections

// messages: one doc per user per day
db.inbox.findOne()
{ _id: 1,
  to: "Joe",
  sequence: ISODate("2013-02-04T00:00:00.392Z"),
  messages: [ ] }

// Auto expires data after 31536000 seconds = 1 year
db.inbox.ensureIndex( { sequence: 1 }, { expireAfterSeconds: 31536000 } )

Page 31

Considerations

• Limited to a known range of messages

• Automatic purge of expired data

• No need to have a CRON task, etc. to do this

✖ Per Collection basis
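The expiry rule itself is simple arithmetic on the indexed date field. The sketch below shows it in plain JavaScript; `isExpired` is a hypothetical helper, and on a real server the purge is done by a periodic background task, so removal is not instantaneous.

```javascript
const EXPIRE_AFTER_SECONDS = 31536000; // 365 days, the value used on the slide

// A document is eligible for removal once the indexed date is older than the TTL.
function isExpired(doc, now) {
  return now.getTime() - doc.sequence.getTime() > EXPIRE_AFTER_SECONDS * 1000;
}

const bucket = {
  _id: 1,
  to: "Joe",
  sequence: new Date("2013-02-04T00:00:00.392Z"),
  messages: []
};

console.log(isExpired(bucket, new Date("2014-02-03T00:00:00Z"))); // false (364 days old)
console.log(isExpired(bucket, new Date("2014-02-05T00:00:00Z"))); // true (366 days old)
```

With one document per user per day, a whole day of messages expires at once, which matches the "known range of messages" point above.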

Page 32

#2 – Indexed Attributes

Page 33

Design Goal

• Application needs to store a variable number of attributes, e.g.
– User defined Form
– Meta Data tags

• Queries needed
– Equality
– Range based

• Need to be efficient, regardless of the number of attributes

Page 34

2 Approaches (there are more)

• Attributes

• Attributes as Objects in an Array

Page 35

Attributes

// Flexible set of attributes
db.files.insert( { _id: "mongod", attr: { type: "binary", size: 256, created: ISODate("2013-04-01T18:13:42.689Z") } } )

// Need to create an index for each item in the sub-document
db.files.ensureIndex( { "attr.type": 1 } )
db.files.find( { "attr.type": "text" } )

// Can perform range queries
db.files.ensureIndex( { "attr.size": 1 } )
db.files.find( { "attr.size": { $gt: 64, $lte: 16384 } } )

Page 36

Considerations

• Attributes can be queried via an Index

• Equality & Range queries supported

✖ Each attribute needs an Index

✖ Each time you extend, you add an index

✖ Single index is used (unless you have $or)

Page 37

// Flexible set of attributes, each attribute is an object

db.files.insert( { _id: "mongod", attr: [ { type: "binary" }, { size: 256 }, { created: ISODate("2013-04-01T18:13:42.689Z") } ] } )

db.files.ensureIndex( { attr: 1 } )

Attributes as Objects in Array
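Moving from a sub-document of attributes to an array of single-key objects is a mechanical transformation. A minimal sketch in plain JavaScript, assuming the attributes start out as one flat object; `toAttrArray` is a hypothetical helper name, not a MongoDB function.

```javascript
// Turns { type: "binary", size: 256 } into [ { type: "binary" }, { size: 256 } ].
function toAttrArray(attrs) {
  return Object.entries(attrs).map(([name, value]) => ({ [name]: value }));
}

const file = {
  _id: "mongod",
  attr: toAttrArray({
    type: "binary",
    size: 256,
    created: new Date("2013-04-01T18:13:42.689Z")
  })
};

console.log(file.attr.length);             // 3
console.log(JSON.stringify(file.attr[1])); // {"size":256}
```

Each array element carries exactly one attribute, so a single index on the array field covers every attribute, including ones added later.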

Page 38

Queries

// Range queries
db.files.find( { attr: { $gt: { size: 64 }, $lte: { size: 16384 } } } )

db.files.find( { attr: { $gte: { created: ISODate("2013-02-01T00:00:01.689Z") } } } )

// Multiple conditions – only the first predicate in the query can use the index,
// so ensure that it is the most selective.
// Index Intersection will allow multiple indexes, see SERVER-3071
db.files.find( { $and: [ { attr: { $gte: { created: ISODate("2013-02-01T…") } } },
                         { attr: { $gt: { size: 128 }, $lte: { size: 16384 } } } ] } )

// Each $or clause can use an index
db.files.find( { $or: [ { attr: { $gte: { created: ISODate("2013-02-01T…") } } },
                        { attr: { $gt: { size: 128 }, $lte: { size: 16384 } } } ] } )

Page 39

Considerations

• Attributes can be queried via a Single index

• New attributes do not need extra Indexes

• Equality & Range queries supported

✖ $and can only use a Single Index

Page 40

#3 – Multiple Identities

Page 41

Design Goal

• Ability to look up by a number of different identities, e.g.
– Username
– Email address
– FB Handle
– LinkedIn URL

Page 42

2 Approaches (there are more)

• Multiple Identifiers in a single document

• Separate Identifiers from Content

Page 43

Single Document by User

db.users.findOne()
{ _id: "joe",
  email: "[email protected]",
  fb: "joe.smith",    // facebook
  li: "joe.e.smith",  // linkedin
  other: {…} }

// Shard collection by _id
sh.shardCollection( "mongodbdays.users", { _id: 1 } )

// Create indexes on each key
db.users.ensureIndex( { email: 1 } )
db.users.ensureIndex( { fb: 1 } )
db.users.ensureIndex( { li: 1 } )

Page 44

Read by _id (shard key)

Shard 1 Shard 2 Shard 3

find( { _id: "joe"} )

Page 45

Read by email (non-shard key)

Shard 1 Shard 2 Shard 3

find( { email: "[email protected]" } )

Page 46

Considerations

Lookup by shard key is routed to 1 shard

✖ Lookup by other identifier is scatter gathered across all shards

✖ Secondary keys cannot have a unique index

Page 47

Document per Identity

// Create a document that holds all the other user attributes
db.users.save( { _id: "1200-42", ... } )

// Shard collection by _id
sh.shardCollection( "mongodbdays.users", { _id: 1 } )

// Create a document for each of the user's identities
db.identities.save( { identifier: { hndl: "joe" }, user: "1200-42" } )
db.identities.save( { identifier: { email: "[email protected]" }, user: "1200-42" } )
db.identities.save( { identifier: { li: "joe.e.smith" }, user: "1200-42" } )

// Shard collection by identifier
sh.shardCollection( "mongodbdays.identities", { identifier: 1 } )

// Create unique index
db.identities.ensureIndex( { identifier: 1 }, { unique: true } )
db.users.ensureIndex( { _id: 1 }, { unique: true } )

Page 48

Read requires 2 queries

Shard 1 Shard 2 Shard 3

db.identities.find({"identifier" : { "hndl" : "joe" }})

db.users.find( { _id: "1200-42"} )

Page 49

Considerations

• Multiple queries, but always routed
– Lookup to Identities is a routed query
– Lookup to Users is a routed query

• Unique indexes available
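The two-step read can be sketched with two lookup tables. This is a plain-JavaScript illustration under stated assumptions, not MongoDB code: the `Map` objects stand in for the users and identities collections, and `addIdentity` and `findUser` are hypothetical helpers.

```javascript
const users = new Map();      // stand-in for the users collection, keyed by _id
const identities = new Map(); // stand-in for the identities collection, keyed by identifier

function identityKey(identifier) {
  return JSON.stringify(identifier);
}

// Rejects duplicates, which is what the unique index on "identifier" would enforce.
function addIdentity(identifier, userId) {
  const key = identityKey(identifier);
  if (identities.has(key)) {
    throw new Error("duplicate identifier: " + key);
  }
  identities.set(key, { identifier, user: userId });
}

// Query 1 resolves the identifier to a user id; query 2 fetches the user by _id.
function findUser(identifier) {
  const identity = identities.get(identityKey(identifier));
  if (!identity) return null;
  return users.get(identity.user) || null;
}

users.set("1200-42", { _id: "1200-42" });
addIdentity({ hndl: "joe" }, "1200-42");
addIdentity({ li: "joe.e.smith" }, "1200-42");

console.log(findUser({ hndl: "joe" })._id);       // 1200-42
console.log(findUser({ li: "joe.e.smith" })._id); // 1200-42
console.log(findUser({ hndl: "nobody" }));        // null
```

Both lookups go through a key, which mirrors the point that each of the two queries is routed to a single shard rather than scattered.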

Page 50

Conclusion

Page 51

Summary

• Multiple ways to model a domain problem

• Understand the key use cases of your app

• Balance between ease of query vs. ease of write

• Avoid Random IO

• Avoid Scatter / Gather query pattern

Page 52

Technical Director, 10gen

@jonnyeight [email protected] alvinonmongodb.com

Alvin Richards

#MongoDBdays

Thank You

