Eliot Horowitz, @eliothorowitz
MongoSV, December 3, 2010
Sharding Internals
MongoDB Sharding
• Scale horizontally for data size, index size, writes and consistent reads
• Distribute databases, collections or the objects in a collection
• Auto-balancing, migrations and management happen with no downtime
• Choose how you partition data
• Can convert from single master to sharded system with no downtime
• Same features as a non-sharded single master
• Fully consistent
Range Based
• collection is broken into chunks by range
• chunks default to 64 MB or 100,000 objects
MIN  MAX  LOCATION
A    F    shard1
F    M    shard1
M    R    shard2
R    Z    shard3
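A minimal sketch of how a router could look up the owning shard in a chunk table like the one above. The function name `find_shard` is hypothetical, and each range is treated as including MIN and excluding MAX.

```python
from bisect import bisect_right

# Chunk table from the slide: (min, max, shard).
# Assumption for this sketch: each range includes min and excludes max.
CHUNKS = [
    ("A", "F", "shard1"),
    ("F", "M", "shard1"),
    ("M", "R", "shard2"),
    ("R", "Z", "shard3"),
]


def find_shard(key, chunks=CHUNKS):
    """Return the shard owning `key`, or raise KeyError if no chunk covers it."""
    mins = [c[0] for c in chunks]
    # Index of the last chunk whose min is <= key.
    i = bisect_right(mins, key) - 1
    if i < 0 or not (chunks[i][0] <= key < chunks[i][1]):
        raise KeyError(key)
    return chunks[i][2]
```

Because the table is sorted by MIN, the lookup is a binary search over chunk boundaries and never touches the data itself.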
Architecture
[Diagram: a client talks to one or more mongos routers; the mongos processes sit in front of the Shards (each a group of mongod processes) and the Config Servers (three mongod processes).]
Shards
• Can be master, master/slave or replica sets
• Replica sets give sharding + full auto-failover
• Regular mongod processes
Config Servers
• 3 of them
• changes are made with 2 phase commit
• if any are down, metadata goes read-only
• system is online as long as 1 of the 3 is up
mongos
• Sharding Router
• Acts just like a mongod to clients
• Can have 1 or as many as you want
• Can run on appserver so no extra network traffic
• Caches metadata from config servers
Writes
• Inserts: require shard key, routed
• Removes: routed and/or scattered
• Updates: routed or scattered
Queries
• By shard key: routed
• Sorted by shard key: routed in order
• By non-shard key: scatter-gather
• Sorted by non-shard key: distributed merge sort
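The three query paths above can be sketched with plain Python lists standing in for shards. Everything here (the `SHARDS` data, the `name` shard key, the function names) is made up for illustration and is not the mongos implementation.

```python
import heapq

# Hypothetical in-memory "shards"; the shard key is "name".
SHARDS = {
    "shard1": [{"name": "alice", "age": 31}, {"name": "bob", "age": 25}],
    "shard2": [{"name": "mallory", "age": 40}, {"name": "nina", "age": 22}],
    "shard3": [{"name": "trent", "age": 35}],
}
# Chunk ranges on "name": (min, max, shard), min inclusive, max exclusive.
CHUNKS = [("a", "m", "shard1"), ("m", "r", "shard2"), ("r", "z", "shard3")]


def routed_find(name):
    """Query by shard key: only the owning shard is contacted."""
    for lo, hi, shard in CHUNKS:
        if lo <= name < hi:
            return [d for d in SHARDS[shard] if d["name"] == name], [shard]
    return [], []


def scatter_gather(pred):
    """Query by non-shard key: every shard is asked and results are combined."""
    hits = []
    for docs in SHARDS.values():
        hits.extend(d for d in docs if pred(d))
    return hits, list(SHARDS)


def merge_sorted(field):
    """Sort by non-shard key: each shard sorts locally, the router merges."""
    per_shard = [sorted(docs, key=lambda d: d[field]) for docs in SHARDS.values()]
    return list(heapq.merge(*per_shard, key=lambda d: d[field]))
```

For example, `routed_find("bob")` contacts only `shard1`, while `merge_sorted("age")` reads one pre-sorted stream per shard and merges them without re-sorting the full result.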
Splitting
• Take a chunk and split it in 2
• Splits on the median value
• Splits only change metadata; no data is moved
Splitting
T1
MIN  MAX  LOCATION
A    Z    shard1
T2
MIN  MAX  LOCATION
A    G    shard1
G    Z    shard1
T3
MIN  MAX  LOCATION
A    D    shard1
D    G    shard1
G    S    shard1
S    Z    shard1
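A rough sketch of the T1 to T3 sequence as a metadata-only operation. The helper `split_chunk` and the sample keys are hypothetical; for an even number of keys the sketch simply takes the upper middle one as the median.

```python
def split_chunk(chunks, index, keys_in_chunk):
    """Split chunks[index] at the median of the keys it currently holds.

    Only the chunk table changes; both halves stay on the same shard.
    """
    lo, hi, shard = chunks[index]
    ordered = sorted(keys_in_chunk)
    median = ordered[len(ordered) // 2]
    if not (lo < median < hi):
        return chunks  # no usable split point inside the range
    return (
        chunks[:index]
        + [(lo, median, shard), (median, hi, shard)]
        + chunks[index + 1:]
    )


# T1 -> T2 -> T3, using made-up key samples for each chunk.
t1 = [("A", "Z", "shard1")]
t2 = split_chunk(t1, 0, ["B", "D", "G", "M", "S"])
t3 = split_chunk(t2, 0, ["B", "C", "D", "E", "F"])
t3 = split_chunk(t3, 2, ["H", "M", "S", "T", "W"])
```

Note that `split_chunk` never reads or writes documents: the returned list is just a new chunk table.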
Balancing
• Moves chunks from one shard to another
• Done online while system is running
• Balancing runs in the background
• Any mongos can initiate
• All data transfer happens directly between shards
• Once initiated, the mongos can go away without issue
Migrating
T3
MIN  MAX  LOCATION
A    D    shard1
D    G    shard1
G    S    shard1
S    Z    shard1
T4
MIN  MAX  LOCATION
A    D    shard1
D    G    shard1
G    S    shard1
S    Z    shard2
T5
MIN  MAX  LOCATION
A    D    shard1
D    G    shard1
G    S    shard2
S    Z    shard2
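One way to picture the T3 to T5 sequence is a loop that moves one chunk at a time from the fullest shard to the emptiest. This is a simplified sketch, not the actual balancer policy: the `threshold` parameter and the choice of which chunk to move are assumptions made for illustration.

```python
from collections import Counter


def pick_migration(chunks, shards, threshold=2):
    """Pick one chunk to move from the fullest shard to the emptiest one.

    Returns (chunk_index, from_shard, to_shard), or None when balanced enough.
    `threshold` is an illustrative imbalance cutoff, not a MongoDB setting.
    """
    counts = Counter({s: 0 for s in shards})
    counts.update(shard for _, _, shard in chunks)
    donor = max(shards, key=lambda s: counts[s])
    receiver = min(shards, key=lambda s: counts[s])
    if counts[donor] - counts[receiver] < threshold:
        return None
    # Move the donor's highest-range chunk (an arbitrary choice for this sketch).
    index = max(i for i, c in enumerate(chunks) if c[2] == donor)
    return index, donor, receiver


def balance(chunks, shards, threshold=2):
    """Apply migrations one at a time until no further move is suggested."""
    chunks = list(chunks)
    while (move := pick_migration(chunks, shards, threshold)) is not None:
        i, _, to_shard = move
        lo, hi, _ = chunks[i]
        chunks[i] = (lo, hi, to_shard)
    return chunks


t3 = [("A", "D", "shard1"), ("D", "G", "shard1"),
      ("G", "S", "shard1"), ("S", "Z", "shard1")]
t5 = balance(t3, ["shard1", "shard2"])
```

Starting from the T3 table, the loop moves S..Z first and G..S second, then stops because both shards hold two chunks.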
Live Migrations
• Migrate initiated
• Initial data copied
• Delta copied until steady state + secondary ack
• Start commit
• Copy final delta, wait for secondary ack
• Finish commit
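The steps above can be simulated with dictionaries standing in for the donor and receiver copies of a chunk. This is only a toy model: secondary acknowledgements are not modeled, and "steady state" is approximated as a delta of at most `steady_size` writes, which is an assumption of this sketch rather than anything stated in the talk.

```python
def live_migrate(source, write_batches, steady_size=1):
    """Simulate moving one chunk while writes keep arriving on the donor.

    `source` maps key -> doc on the donor and is updated in place.
    `write_batches` is a list of batches of (key, doc) writes; each batch
    lands on the donor while the previous copy step is running.
    Returns (receiver_copy, log_of_steps).
    """
    log = ["migrate initiated"]
    batches = iter(write_batches)

    # Initial copy: snapshot the chunk; meanwhile new writes hit the donor.
    dest = dict(source)
    log.append("initial data copied")
    delta = dict(next(batches, []))
    source.update(delta)

    # Catch-up rounds: keep copying deltas until they become small.
    while len(delta) > steady_size:
        dest.update(delta)
        log.append(f"delta copied ({len(delta)} writes)")
        delta = dict(next(batches, []))
        source.update(delta)

    # Commit: the donor stops taking writes for the chunk, the last delta
    # is copied, and ownership would switch in the metadata.
    log.append("start commit")
    dest.update(delta)
    log.append(f"final delta copied ({len(delta)} writes)")
    log.append("finish commit")
    return dest, log


donor = {"S1": 1, "T1": 2}
receiver, steps = live_migrate(
    donor,
    [[("S2", 3), ("T2", 4), ("U1", 5)], [("S1", 10), ("V1", 6)], [("W1", 7)]],
)
```

After the call, `receiver` holds the same documents as `donor`, including the overwrite of `S1` that arrived mid-migration.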
DEMO
Download MongoDB: http://www.mongodb.org
and let us know what you think: @eliothorowitz @mongodb
10gen is hiring! http://www.10gen.com/jobs
User profiles
• Partition by user_id
• Secondary indexes on location, dates, etc...
• Reads/writes know which shard to hit
User Activity Stream
• Shard by user_id
• Loading a user’s stream hits a single shard
• Writes are distributed across all shards
• Can index on activity for deleting
Photos
• Can shard by photo_id for best read/write distribution
• Secondary index on tags, date
Logging
Possible Shard Keys
• date
• machine, date
• logger name
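To compare two of the logging shard key candidates, the sketch below cuts an existing set of log entries into equal range-based chunks and counts where new inserts would land. The machine names, tick counts and the `chunk_hits` helper are all hypothetical, and the comparison only illustrates range partitioning in general.

```python
from bisect import bisect_right
from collections import Counter


def chunk_hits(existing_keys, new_keys, n_chunks=4):
    """Cut existing keys into n_chunks ranges, then count new inserts per chunk."""
    ordered = sorted(existing_keys)
    step = len(ordered) // n_chunks
    # Lower bounds of chunks 1..n-1; chunk 0 covers everything below the first.
    bounds = [ordered[i * step] for i in range(1, n_chunks)]
    return Counter(bisect_right(bounds, k) for k in new_keys)


machines = ["db1", "web1", "web2", "web3"]
# Hypothetical log stream: one entry per machine per tick; "date" is the tick.
old = [(m, t) for t in range(100) for m in machines]
new = [(m, t) for t in range(100, 120) for m in machines]

by_date = chunk_hits([t for _, t in old], [t for _, t in new])
by_machine_date = chunk_hits(old, new)
```

With `date` alone, every new entry sorts after all existing ones and lands in the last chunk; with `machine, date`, the same inserts spread evenly over the four chunks in this toy data set.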