MongoDB Basic Concepts
Senior Solutions Architect, 10gen
Norberto Leite
Agenda
• Overview
• Replication
• Scalability
• Consistency & Durability
• Flexibility / Developer Experience
But first ...
Happy Hanukkah!!!
Who’s this guy?
Norberto Leite
Barcelona
Love MongoDB
and others ...
Senior Solutions Architect
@nleite / [email protected]
Your Data
Fundamentals
• High Performance
• Horizontal Scalability
• Fully Consistent
• Document Oriented
{ name: 'Norberto Leite', position: 'SA', nick: 'WingMan', based: ['Barcelona', 'London'] }
[Diagram: an Application talking to several mongoDB nodes]
Replication
Why do we need Replication?
• Failover
• Backups
• Secondary Batch Jobs
• High Availability
Outages
• Planned
– Hardware upgrade
– OS or file-system tuning
– Software upgrade
– Relocation of data to new file-system / storage
• Unplanned
– Human error
– Hardware failure
– Data center / region outage
– Application corruption
Replica Sets
• Data Protection
– Multiple copies of data
– Data spread across data centers, availability zones (AZs), etc.
• High Availability
– Automated Failover
– Automated Recovery
[Diagram: Asynchronous Replication. The App writes to the Primary and reads from it by default; reads from either Secondary are optional. The Primary replicates to both Secondaries asynchronously.]
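Asynchronous replication can be modeled in a few lines of plain JavaScript. This is a simplified sketch, not MongoDB's implementation: it assumes the primary records every write in an ordered operation log (MongoDB's is called the oplog) and that each secondary pulls and applies the entries it has not seen yet, on its own schedule. The classes `Primary` and `Secondary` and their methods are hypothetical.

```javascript
// Simplified model of asynchronous primary -> secondary replication.
// Hypothetical classes; in MongoDB the oplog is a capped collection (local.oplog.rs).
class Primary {
  constructor() {
    this.data = new Map(); // current state: _id -> document
    this.oplog = [];       // ordered log of applied writes
  }
  write(id, doc) {
    this.data.set(id, doc);       // apply the write locally...
    this.oplog.push({ id, doc }); // ...and record it for the secondaries
  }
}

class Secondary {
  constructor() {
    this.data = new Map();
    this.applied = 0; // number of oplog entries applied so far
  }
  // Pull and apply entries not yet seen. This runs independently of the
  // writes, which is why a secondary can briefly lag behind the primary.
  pull(primary) {
    while (this.applied < primary.oplog.length) {
      const { id, doc } = primary.oplog[this.applied];
      this.data.set(id, doc);
      this.applied += 1;
    }
  }
}

const primary = new Primary();
const secondary = new Secondary();
primary.write("norberto", { based: "Barcelona" });
console.log(secondary.data.has("norberto")); // false: not replicated yet
secondary.pull(primary);
console.log(secondary.data.get("norberto").based); // Barcelona
```

Because the secondary only replays the log, a node that was offline can catch up later by pulling from where it stopped.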
[Diagram: Failover. The same App, Primary and two Secondaries; the Primary becomes unavailable.]
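[Diagram: Automatic Failover. The remaining members hold a Primary Election and one Secondary becomes the new Primary, which now takes the App's writes and default reads; the other member stays a Secondary for optional reads.]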
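A member can only become primary if it can reach a strict majority of the voting members in the set. The function below is a hypothetical sketch of that single rule, not the real election protocol (which also weighs member priorities and how up to date each member is). It shows why a three-member set survives the loss of one node but not of two.

```javascript
// Strict-majority rule for electing a primary (simplified).
// votingMembers: voting members configured in the replica set.
// reachable: members (candidate included) that can still see each other.
function canElectPrimary(votingMembers, reachable) {
  return reachable > Math.floor(votingMembers / 2);
}

// Three members, primary lost: 2 of 3 remain, so an election succeeds.
console.log(canElectPrimary(3, 2)); // true
// Two members lost: 1 of 3 remains, no primary, so no writes are accepted.
console.log(canElectPrimary(3, 1)); // false
// Four members split 2/2: neither side has a majority.
console.log(canElectPrimary(4, 2)); // false
```

The last case is why replica sets are usually deployed with an odd number of voting members.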
[Diagram: Automatic Recovery. The failed node goes through Recovery and rejoins the set as a Secondary; the App keeps writing to and reading from the new Primary, with optional reads from the Secondaries.]
Sharding
Sharding
• Data Location Transparent to Code
• Data Distribution is Automatic
– as well as re-distribution
• Aggregates System Resources Horizontally
• No Code Changes!
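Range Distribution
sh.enableSharding("test")  // enable sharding on the "test" database first
sh.shardCollection("test.tweets", { _id: 1 }, false)  // range on _id; third argument: unique = false
[Diagram: shard01 holds a-i, shard02 holds j-m, shard03 holds n-z]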
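With range distribution, each chunk owns a contiguous range of shard-key values, so finding where a document lives means finding the range that contains its key. The sketch below models the three chunks from the slide by their lower bounds; the `chunks` table and `shardFor` function are hypothetical illustrations, not the query router's actual code.

```javascript
// Chunks sorted by lower bound; each covers [min, next chunk's min).
// "" stands in for MongoDB's MinKey so every string falls in some chunk.
const chunks = [
  { min: "", shard: "shard01" },  // a-i
  { min: "j", shard: "shard02" }, // j-m
  { min: "n", shard: "shard03" }, // n-z
];

// Return the shard owning `key`: the last chunk whose lower bound is <= key.
function shardFor(key, table = chunks) {
  let owner = table[0].shard;
  for (const chunk of table) {
    if (chunk.min <= key) owner = chunk.shard;
    else break; // table is sorted, so no later chunk can match
  }
  return owner;
}

console.log(shardFor("norberto")); // shard03
console.log(shardFor("ivan"));     // shard01
console.log(shardFor("mario"));    // shard02
```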
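Chunk Split
[Diagram: shard01 holds a-i, shard02 holds j-m, shard03 holds n-z. The j-m chunk grows and is split into ja-jz and k-m; k-m is split again into ka-kj and ki-m.]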
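A chunk is split once it grows beyond a configured maximum, with the split point chosen so that each half holds roughly half of the data. The sketch below counts documents instead of bytes and uses a hypothetical `maxDocs` threshold; MongoDB itself measures chunk size in megabytes and computes split points from the stored data.

```javascript
// Split a chunk in two at its median key once it holds more than maxDocs keys.
// A chunk is { min, keys } where keys is a sorted array of shard-key values.
function maybeSplit(chunk, maxDocs) {
  if (chunk.keys.length <= maxDocs) return [chunk]; // small enough, keep as is
  const mid = Math.floor(chunk.keys.length / 2);
  const splitPoint = chunk.keys[mid]; // becomes the lower bound of the new chunk
  return [
    { min: chunk.min, keys: chunk.keys.slice(0, mid) },
    { min: splitPoint, keys: chunk.keys.slice(mid) },
  ];
}

const jm = { min: "j", keys: ["jack", "jane", "jose", "kate", "kim", "luis", "maria"] };
const parts = maybeSplit(jm, 4);
console.log(parts.map((c) => `${c.min}: ${c.keys.length} docs`));
// [ 'j: 3 docs', 'kate: 4 docs' ]
```

Splitting only rewrites range boundaries; no data moves between shards until the balancer migrates a chunk.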
Auto Balancing
[Diagram: the new chunks ka-kj and ki-m are migrated from shard02 to the other shards so that chunks are spread evenly across shard01, shard02 and shard03.]
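Auto balancing moves whole chunks from the most loaded shard to the least loaded one until the counts are close. The loop below is a hypothetical simplification that balances on chunk count with an assumed `threshold` of 1; the real balancer runs in the background and uses its own thresholds, which have changed between MongoDB versions.

```javascript
// Move chunks from the shard with the most chunks to the one with the fewest
// until the difference is at most `threshold`. Returns the migrations made.
function balance(shards, threshold = 1) {
  // With threshold 0 and an uneven total, chunks could bounce forever.
  if (threshold < 1) throw new Error("threshold must be >= 1");
  const moves = [];
  for (;;) {
    const names = Object.keys(shards);
    names.sort((x, y) => shards[x].length - shards[y].length);
    const least = names[0];
    const most = names[names.length - 1];
    if (shards[most].length - shards[least].length <= threshold) break;
    const chunk = shards[most].pop(); // migrate one chunk at a time
    shards[least].push(chunk);
    moves.push(`${chunk}: ${most} -> ${least}`);
  }
  return moves;
}

const cluster = {
  shard01: ["a-i"],
  shard02: ["ja-jz", "ka-kj", "ki-m"],
  shard03: ["n-z"],
};
console.log(balance(cluster)); // [ 'ki-m: shard02 -> shard01' ]
```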
Routed Queries
db.tweets.find({ _id: 'norberto' })
[Diagram: the query filters on the shard key (_id), so it is routed only to the shard holding the n-z range.]
Scatter Gather
db.tweets.find({ email: 'norberto@10gen' })
[Diagram: email is not the shard key, so the query is sent to shard01, shard02 and shard03 and their results are merged.]
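Whether a query is routed or scattered depends on whether its filter includes the shard key. The query router (mongos) can send a query on `_id` to the one shard owning that range, but a query on `email` has to go to every shard and the partial results have to be merged. The sketch below models that decision with in-memory shards; `routeFind`, `ownerOf` and the sample documents are hypothetical, and only single-field equality filters are handled.

```javascript
// In-memory shards, ranged on _id as on the slides (a-i, j-m, n-z).
const shards = {
  shard01: { min: "", docs: [{ _id: "ana", email: "ana@example.com" }] },
  shard02: { min: "j", docs: [{ _id: "joao", email: "joao@example.com" }] },
  shard03: { min: "n", docs: [{ _id: "norberto", email: "norberto@10gen" }] },
};

// Pick the shard whose range contains the given shard-key value.
function ownerOf(key) {
  let owner = null;
  for (const [name, s] of Object.entries(shards)) if (s.min <= key) owner = name;
  return owner;
}

// Equality filter on one field; returns the shards contacted and the matches.
function routeFind(field, value, shardKey = "_id") {
  const targets = field === shardKey ? [ownerOf(value)] : Object.keys(shards);
  const results = targets.flatMap((name) =>
    shards[name].docs.filter((doc) => doc[field] === value)
  );
  return { targets, results }; // "gather": merge partial results from each shard
}

console.log(routeFind("_id", "norberto").targets);         // [ 'shard03' ]
console.log(routeFind("email", "norberto@10gen").targets); // [ 'shard01', 'shard02', 'shard03' ]
```

Both queries return the same document, but the scatter-gather one costs a round trip to every shard, which grows with the size of the cluster.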
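Caching
shard01: a-i, j-r, n-z
300 GB Data, 96 GB Mem, 3:1 Data/Mem
With 300 GB of data and 96 GB of RAM on a single shard, at most about a third of the data can be cached in memory.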
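Horizontal Distribution (300 GB Data in total)
shard01: a-i, 100 GB Data, 96 GB Mem, 1:1 Data/Mem
shard02: j-r, 100 GB Data, 96 GB Mem, 1:1 Data/Mem
shard03: n-z, 100 GB Data, 96 GB Mem, 1:1 Data/Mem
Spread over three shards, each shard has roughly as much RAM as data, so nearly all of it can stay in memory.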
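The ratios on these slides are plain division of the data per shard by the RAM per shard. The helper below reproduces them for the 300 GB data set and 96 GB of memory per server; the function name is hypothetical, and results are rounded to whole numbers as on the slides.

```javascript
// Data-to-memory ratio per shard when totalDataGB is spread evenly.
function dataToMemRatio(totalDataGB, shardCount, memPerShardGB) {
  const dataPerShard = totalDataGB / shardCount;
  return dataPerShard / memPerShardGB;
}

const one = dataToMemRatio(300, 1, 96);   // 3.125
const three = dataToMemRatio(300, 3, 96); // about 1.04
console.log(`${Math.round(one)}:1`, `${Math.round(three)}:1`); // 3:1 1:1
```

With three shards the cluster also has 3 × 96 = 288 GB of RAM in total, close to the full 300 GB of data.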
Consistency and Durability
Consistency
• Eventual Consistency
– Allow updates when a system has been partitioned
– Resolve conflicts later
– Ex: Cassandra, CouchDB
• Immediate Consistency
– Single Master
– Avoids conflicts
– Example: MongoDB
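The contrast can be illustrated with two replicas cut off from each other. In the eventually consistent model both sides keep accepting writes and a rule reconciles them later; last-writer-wins by timestamp is used here as an assumed rule (Cassandra, for example, resolves column conflicts this way). In the single-master model only the primary accepts writes, so there is nothing to reconcile. All names below are hypothetical.

```javascript
// Eventual consistency: both partitions accepted a write; reconcile later
// with last-writer-wins (the higher timestamp survives).
function lastWriterWins(a, b) {
  return a.ts >= b.ts ? a : b;
}

const left = { value: "Barcelona", ts: 100 }; // written on partition A
const right = { value: "London", ts: 105 };   // written on partition B
console.log(lastWriterWins(left, right).value); // London; the Barcelona write is lost

// Immediate consistency with a single master: only the node that is
// currently primary accepts writes, so conflicting writes cannot arise.
function singleMasterWrite(node, value) {
  if (!node.isPrimary) throw new Error(`${node.name} is not primary`);
  node.value = value;
  return node.value;
}

const primary = { name: "node1", isPrimary: true, value: null };
const secondary = { name: "node2", isPrimary: false, value: null };
console.log(singleMasterWrite(primary, "Barcelona")); // Barcelona
```

The trade-off is availability: during a partition, the side without the primary rejects writes instead of accepting them and resolving conflicts later.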
Durability
• For how long is my data available?
• When do I know my data is safe?!
• Where is it safe?
• MongoDB style:
– Fire and Forget
– Get Last Error
– Journal Sync
– Replica Safe
Durability
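[Diagram: durability scale from Memory to Journal to Secondary Nodes to Multiple Data Centers, with a typical RDBMS shown for comparison]
• Memory: Async, w=1 (default)
• Journal: j=true
• Secondary Nodes: w=majority
• Multiple Data Centers: w="tag"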
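Each point on that scale is a different condition for when a write counts as acknowledged. The function below is a simplified, hypothetical model of those conditions, not driver code; in the shell the same options are passed as a write concern document such as `{ w: "majority", j: true }`. Tag-based write concerns (w="tag") are left out because they depend on custom modes defined in the replica set configuration.

```javascript
// Decide whether a write is acknowledged under a given write concern.
// state.journaled: the primary has written the operation to its journal.
// state.replicatedTo: members (primary included) that have applied the write.
// state.members: voting members in the replica set.
function isAcknowledged(concern, state) {
  if (concern.j && !state.journaled) return false;
  const w = concern.w === undefined ? 1 : concern.w; // w=1 is the default
  // w: 0 (fire and forget) needs no acknowledgements, so it always passes here.
  const needed = w === "majority" ? Math.floor(state.members / 2) + 1 : w;
  return state.replicatedTo >= needed;
}

const state = { journaled: false, replicatedTo: 1, members: 3 };
console.log(isAcknowledged({ w: 1 }, state));          // true: primary has it in memory
console.log(isAcknowledged({ w: 1, j: true }, state)); // false: not yet journaled
console.log(isAcknowledged({ w: "majority" }, state)); // false: needs 2 of 3
```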
Flexibility
Data Model
• Why JSON?
– Well understood data format
– Maps simply to objects
– Linking & Embedding to describe relationships
JSON
place1 = {
  name: "10gen HQ",
  address: "578 Broadway 7th Floor",
  city: "New York",
  zip: "10011",
  tags: ["business", "tech"]
}
[Diagram: Relational Way vs. MongoDB Way, where relationships are expressed by embedding or by linking]
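Embedding and linking can be shown with plain JavaScript objects. In the embedded form the comments live inside the post, so one lookup returns everything; in the linked form comments are stored separately with a reference to the post and the application fetches them with a second lookup. The post and comment data below is hypothetical.

```javascript
// Embedding: related data lives inside the parent document.
const postEmbedded = {
  _id: 1,
  title: "MongoDB Basic Concepts",
  comments: [
    { author: "ana", text: "Great talk" },
    { author: "joao", text: "Thanks for the slides" },
  ],
};

// Linking: related documents store a reference (post_id) to the parent.
const posts = [{ _id: 1, title: "MongoDB Basic Concepts" }];
const comments = [
  { _id: 10, post_id: 1, author: "ana", text: "Great talk" },
  { _id: 11, post_id: 1, author: "joao", text: "Thanks for the slides" },
];

// With linking, the application resolves the reference with a second lookup.
function loadLinked(postId) {
  const post = posts.find((p) => p._id === postId);
  const related = comments.filter((c) => c.post_id === postId);
  return { ...post, comments: related };
}

console.log(postEmbedded.comments.length);  // 2, from one document
console.log(loadLinked(1).comments.length); // 2, from two collections
```

Embedding suits data that is read together with its parent; linking suits data that is large, shared, or queried on its own.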
JSON & Scale Out
• Embedding removes the need for:
– Distributed Joins
– Two Phase Commit
• Enables data to be distributed across many nodes without a cross-node penalty: a document and everything embedded in it live on a single shard
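The scale-out argument can be made concrete by counting how many shards a read touches. Assuming three shards and, in the linked design, comments placed by their own `_id`, reading a post with its comments may reach several shards, while an embedded post always lives on exactly one. The modulo placement below is a hypothetical stand-in for a real shard-key lookup.

```javascript
// Hypothetical placement: three shards, documents assigned by numeric _id.
const shardOf = (id) => `shard0${(id % 3) + 1}`;

// Embedded design: the post and its comments are one document on one shard.
function shardsForEmbedded(postId) {
  return new Set([shardOf(postId)]);
}

// Linked design: each comment is its own document, placed by its own _id.
function shardsForLinked(postId, commentIds) {
  return new Set([shardOf(postId), ...commentIds.map(shardOf)]);
}

console.log(shardsForEmbedded(1).size);             // 1
console.log(shardsForLinked(1, [10, 11, 12]).size); // 3
```

Every extra shard in the linked case is another network round trip the application has to make and combine, which is the distributed join that embedding avoids.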