Date posted: 15-Jan-2015
Category: Technology
Uploaded by: rick-copeland
Scaling with MongoDB
Rick Copeland, @rick446, Arborian Consulting, LLC
Who am I?
Now a consultant, but formerly…
Software engineer at SourceForge, early adopter of MongoDB (version 0.8)
Wrote the SQLAlchemy book (I love SQL when it’s used well)
Mainly write Python now, but have done C++, C#, Java, JavaScript, VHDL, Verilog, …
Scaling without MongoDB
You can do it with an RDBMS, as long as you…
- Don’t use joins
- Don’t use transactions
- Use read-only slaves
- Use memcached
- Denormalize your data
- Use custom sharding/partitioning
- Do a lot of vertical scaling (we’re going to need a bigger box)
[Chart: vertical scaling vs. developer productivity]
Scaling with MongoDB
Use documents to improve locality
Optimize your indexes
Be aware of your working set
Scaling your disks
Replication for fault-tolerance and read scaling
Sharding for read and write scaling
Terminology
Relational (SQL)   MongoDB
Database           Database
Table              Collection
Index              Index (B-tree, range-based)
Row                Document (think JSON: primitive types + arrays + nested documents)
Column             Field (dynamically typed)
Documents improve locality
{
  title: "Slides for Scaling with MongoDB",
  author: "Rick Copeland",
  date: ISODate("2012-02-29T19:30:00Z"),
  text: "My slides are available on speakerdeck.com",
  comments: [
    { author: "anonymous",
      date: ISODate("2012-02-29T19:30:01Z"),
      text: "Frist psot!" },
    { author: "mark",
      date: ISODate("2012-02-29T19:45:23Z"),
      text: "Nice slides" }
  ]
}
Embed comment data in blog post document
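To see why embedding pays off, here is a plain-Python sketch (no MongoDB required; the collection layout and sample data are illustrative) contrasting how many reads a normalized schema needs with an embedded document:

```python
# Normalized: posts and comments live in separate "collections".
posts = {1: {"title": "Scaling with MongoDB", "author": "Rick Copeland"}}
comments = {10: {"post_id": 1, "text": "Frist psot!"},
            11: {"post_id": 1, "text": "Nice slides"}}

# Rendering post 1 normalized: one post lookup plus a lookup per comment.
post = posts[1]
post_comments = [c for c in comments.values() if c["post_id"] == 1]
normalized_reads = 1 + len(post_comments)   # 3 separate reads (seeks)

# Embedded: comments ride along inside the post document.
embedded_posts = {1: {"title": "Scaling with MongoDB",
                      "author": "Rick Copeland",
                      "comments": [{"text": "Frist psot!"},
                                   {"text": "Nice slides"}]}}
doc = embedded_posts[1]                     # one read fetches everything
embedded_reads = 1

print(normalized_reads, embedded_reads)     # 3 1
```

At 5+ ms per seek, turning three reads into one is the whole point of document locality.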
Disk seeks and data locality
Seek = 5+ ms. Sequential read = really, really fast.
Disk seeks and data locality
[Diagram: normalized schema — Post, Author, and Comment documents scattered across the disk, costing a seek for each]
[Diagram: embedded schema — Post stored contiguously with its Author and Comments, readable in one seek]
Avoid full collection scans
[Diagram: find where x equals 7 by scanning documents 1 2 3 4 5 6 7 in order — looked at 7 objects]
Indexing: use a tree lookup
[Diagram: find where x equals 7 via a B-tree over values 1–7 — looked at only 3 objects]
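A quick Python sketch of the difference in objects examined (illustrative: a sorted list with binary search stands in for the B-tree index):

```python
from bisect import bisect_left
import math

values = [1, 2, 3, 4, 5, 6, 7]

# Full collection scan: examine every document until the match.
scan_examined = 0
for v in values:
    scan_examined += 1
    if v == 7:
        break                    # examined all 7 objects

# Indexed (tree) lookup: each step halves the search space,
# so only O(log n) entries are touched.
index_examined = math.ceil(math.log2(len(values) + 1))  # 3 for 7 entries

pos = bisect_left(values, 7)     # binary search finds the entry
print(scan_examined, index_examined, values[pos])  # 7 3 7
```

The gap widens with scale: at a million documents the scan examines a million objects, the tree about twenty.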
Randomly-distributed index: the entire index must fit in RAM
Right-aligned index: only a small portion must be in RAM
Be aware of your working set
Working set = sizeof(frequently used data) + sizeof(frequently used indexes)
Right-aligned indexes reduce working set size
Working set should fit in available RAM for best performance
Page faults are the biggest cause of performance loss in MongoDB
Quantify your working set
> db.foo.stats()
{
  "ns" : "test.foo",
  "count" : 1338330,
  "size" : 46915928,                  // data size
  "avgObjSize" : 35.05557523181876,   // average doc size
  "storageSize" : 86092032,           // size on disk (or RAM!)
  "numExtents" : 12,
  "nindexes" : 2,
  "lastExtentSize" : 20872960,
  "paddingFactor" : 1,
  "flags" : 0,
  "totalIndexSize" : 99860480,        // size of all indexes
  "indexSizes" : {                    // size of each index
    "_id_" : 55877632,
    "x_1" : 43982848
  },
  "ok" : 1
}
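Plugging the stats() numbers above into the working-set formula gives an upper bound (a plain-Python sketch; the true working set is only the *frequently used* subset of data and indexes):

```python
# Values copied from the db.foo.stats() output above.
stats = {
    "size": 46915928,            # data size, bytes
    "totalIndexSize": 99860480,  # all indexes, bytes
}

# Worst case: all data + all indexes are hot.
working_set_upper_bound = stats["size"] + stats["totalIndexSize"]

print(working_set_upper_bound)                    # 146776408 bytes
print(round(working_set_upper_bound / 2**20, 1))  # 140.0 MB
```

If that number exceeds available RAM, expect page faults on reads.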
Scaling your disks: single disk
~200 seeks / second
Scaling your disks: RAID-0
3 striped disks × ~200 seeks / second each
Faster, but less reliable
Scaling your disks: RAID-10
3 mirrored pairs × ~400 seeks / second each
Faster and more reliable ($$$ though)
Replication
Primary
Secondary
Secondary
Read / Write
Read
Read
Old and busted: master/slave replication
The new hotness: replica sets with automatic failover
Replication
Primary handles all writes
Application optionally sends reads to secondaries
Heartbeat manages automatic failover
Replication: the oplog
Special collection (the oplog) records operations idempotently
Secondaries read from primary oplog and replay operations locally
Space is preallocated and fixed for the oplog
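A sketch of why idempotent recording matters: MongoDB rewrites operations like $inc into the resulting $set in the oplog, so replaying an entry twice is harmless (plain Python; field names are illustrative):

```python
doc = {"_id": 1, "counter": 5}

# The client issues {$inc: {counter: 1}} — NOT idempotent:
inc_once = doc["counter"] + 1      # 6
inc_twice = inc_once + 1           # 7: replaying the raw op corrupts state

# The oplog records the *result* as a set — idempotent:
oplog_entry = {"op": "u", "o": {"$set": {"counter": 6}}}

def apply(d, entry):
    # Replay an oplog entry against a copy of the document.
    d = dict(d)
    d.update(entry["o"]["$set"])
    return d

once = apply(doc, oplog_entry)
twice = apply(once, oplog_entry)   # replaying is a no-op
print(once == twice, once["counter"])  # True 6
```

This is what lets a secondary safely resume replay after a restart without tracking exactly which entries it already applied.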
Replication: the oplog
{
  "ts" : Timestamp(1317653790000, 2),
  "h" : -6022751846629753359,
  "op" : "i",                 // insert
  "ns" : "confoo.People",     // collection name
  "o" : {                     // object to insert
    "_id" : ObjectId("4e89cd1e0364241932324269"),
    "first" : "Rick",
    "last" : "Copeland"
  }
}
Replication: failover
Use heartbeat signal to detect failure
When primary can’t be reached, elect a new one
Replica that’s the most up-to-date is chosen
If there is skew, changes not on new primary are saved to a .bson file for manual reconciliation
Application can require data to be replicated to a majority to ensure this doesn’t happen
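The "replicate to a majority" requirement is quorum arithmetic: a write acknowledged by a majority of members is guaranteed to be on whichever node wins the next election. A minimal sketch (in a real application this is the w="majority" write concern):

```python
def majority(n_members):
    # Smallest group guaranteed to overlap with any other majority.
    return n_members // 2 + 1

def write_is_durable(acks, n_members):
    # The write survives any failover if a majority acknowledged it.
    return acks >= majority(n_members)

print(majority(3), majority(5))                        # 2 3
print(write_is_durable(2, 3), write_is_durable(1, 3))  # True False
```

With only one ack in a 3-node set, the acknowledging node could fail and the write would land in the manual-reconciliation .bson file described above.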
Replica sets: advanced topics
Priority: give slower nodes lower priority; backup or read-only nodes can be set to never become primary
slaveDelay: fat-finger protection
Data center awareness and tagging: application can ensure complex replication guarantees
When replication is not enough
Reads scale nicely…
- as long as the working set fits in RAM
- …and you don’t mind eventual consistency
Sharding to the rescue!
- Automatically partitioned data sets
- Scale writes and reads
- Automatic load balancing between the shards
Sharding architecture
[Diagram: MongoS routers in front of four shards — Shard 1 (keys 0..10), Shard 2 (10..20), Shard 3 (20..30), Shard 4 (30..40); each shard is a replica set (primary + two secondaries); three config servers (Config 1–3) hold the cluster configuration]
Sharding setup
Sharding is per-collection and range-based
The highest-impact (and hardest-to-change) decision you make is the shard key:
- Random keys: good for writes, bad for reads
- Right-aligned index: bad for writes
- Small number of discrete keys: very bad
- Ideal: balance writes, make reads routable by mongos
Optimal shard key selection is hard
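A minimal sketch of how mongos routes by range over the shard key (chunk ranges mirror the 0..40 example in the architecture diagram above; this is illustrative, not the real routing code):

```python
# Chunk table: lower bound inclusive, upper bound exclusive, like MongoDB chunks.
chunks = [
    {"min": 0,  "max": 10, "shard": "shard1"},
    {"min": 10, "max": 20, "shard": "shard2"},
    {"min": 20, "max": 30, "shard": "shard3"},
    {"min": 30, "max": 40, "shard": "shard4"},
]

def route(shard_key_value):
    # Find the chunk whose range covers the key and return its shard.
    for c in chunks:
        if c["min"] <= shard_key_value < c["max"]:
            return c["shard"]
    raise KeyError("no chunk covers this key")

print(route(15), route(0), route(39))  # shard2 shard1 shard4
```

This is why a query that includes the shard key hits one shard, while a query without it must be scattered to all of them.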
Example sharding setup
[Diagram: three replica sets (RS1–RS3), one per shard, each spanning two data centers — two priority-1 members per shard in the primary data center, one priority-0 member per shard in the secondary data center; three config servers (Config 1–3) distributed across the data centers]
Sharding benefits
Writes and reads both scale (with good choice of shard key)
Reads scale while remaining strongly consistent
Partitioning ensures you get more usable RAM
Pitfall: don’t wait too long to add capacity
Questions?
Rick Copeland, @rick446, Arborian Consulting, LLC