Introduction to sharding

Date post: 01-Nov-2014
Upload: mongodb
Introduction to Sharding Software Engineer, MongoDB Craig Wilson #MongoDBDays @craiggwilson
Transcript
Page 1: Introduction to sharding

Introduction to Sharding

Software Engineer, MongoDB

Craig Wilson

#MongoDBDays

@craiggwilson

Page 2: Introduction to sharding

Sharding Is a Solution for Scalability

Page 3: Introduction to sharding

Examining Growth

•  User Growth
   –  1995: 0.4% of the world’s population
   –  Today: 30% of the world is online (~2.2B)
   –  Emerging Markets & Mobile

•  Data Set Growth
   –  Facebook’s data set is around 100 petabytes
   –  4 billion photos taken in the last year (4x a decade ago)

Page 4: Introduction to sharding

Do you need to Shard?

Page 5: Introduction to sharding

Read/Write Throughput Exceeds I/O

Page 6: Introduction to sharding

Working Set Exceeds Physical Memory

Page 7: Introduction to sharding

Sharding in MongoDB

Page 8: Introduction to sharding

Horizontally Scalable

Page 9: Introduction to sharding

Application Independent

Page 10: Introduction to sharding

One API

Page 11: Introduction to sharding

What is a Shard?

Page 12: Introduction to sharding

Replica Set

[Diagram: one primary and two secondaries]

Page 13: Introduction to sharding

Single Node in a Cluster

[Diagram: three shards, each a replica set with one primary (P) and two secondaries (S)]

Page 14: Introduction to sharding

Composed of Chunks

•  Grouping of data based on a range

•  Default Max Size: 64 MB

Page 15: Introduction to sharding

Chunks Have Ranges

[Diagram: chunks labeled with the key ranges A-B, M, and S-Z]

Page 16: Introduction to sharding

Chunks Get Split

[Diagram: chunks labeled A-B, M, S-V, and W-Z; the S-Z chunk has been split into S-V and W-Z]
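The split shown on this slide, where the S-Z chunk becomes S-V and W-Z, can be sketched in plain JavaScript. This is only an illustration under stated assumptions: the chunk object, `splitChunk`, the median-based split point, and the sample names are hypothetical and are not MongoDB's internal representation or algorithm.

```javascript
// Hypothetical in-memory chunk: a key range [min, max) plus the documents in it.
function splitChunk(chunk, keyOf) {
  // Nothing sensible to split with fewer than two documents.
  if (chunk.docs.length < 2) return [chunk];
  const docs = [...chunk.docs].sort((a, b) => {
    const ka = keyOf(a), kb = keyOf(b);
    return ka < kb ? -1 : ka > kb ? 1 : 0;
  });
  // Use the median document's key as the split point so both halves are similar in size.
  const splitKey = keyOf(docs[Math.floor(docs.length / 2)]);
  return [
    { min: chunk.min, max: splitKey, docs: docs.filter((d) => keyOf(d) < splitKey) },
    { min: splitKey, max: chunk.max, docs: docs.filter((d) => keyOf(d) >= splitKey) },
  ];
}

const sToZ = {
  min: "S",
  max: "[", // the character right after "Z", so names starting with S to Z fit in the range
  docs: ["Sam", "Tom", "Uma", "Walt", "Xena", "Zoe"].map((name) => ({ name })),
};
const [lower, upper] = splitChunk(sToZ, (d) => d.name);
// lower now covers ["S", "Walt") and upper covers ["Walt", "[")
```

The two resulting ranges share the split key as a boundary, so they stay non-overlapping.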

Page 17: Introduction to sharding

Chunks Get Migrated

•  One shard has 7 more chunks than another

•  Triggered manually
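The chunk-count rule on this slide can be sketched as a small decision function. This is a simplified illustration, not the balancer's actual logic: `planMigration` and the shard names are hypothetical, and the default threshold of 7 is simply the number quoted on the slide.

```javascript
// Decide whether a chunk should move, given chunk counts per shard.
// Assumption: migrate when the fullest shard has at least `threshold`
// more chunks than the emptiest one (7 is the figure from the slide).
function planMigration(chunkCounts, threshold = 7) {
  const entries = Object.entries(chunkCounts);
  if (entries.length < 2) return null;
  let most = entries[0];
  let least = entries[0];
  for (const entry of entries) {
    if (entry[1] > most[1]) most = entry;
    if (entry[1] < least[1]) least = entry;
  }
  return most[1] - least[1] >= threshold ? { from: most[0], to: least[0] } : null;
}

const unbalanced = planMigration({ shard0000: 12, shard0001: 5, shard0002: 8 });
const balanced = planMigration({ shard0000: 10, shard0001: 5 });
```

Here `unbalanced` describes a move from the fullest shard to the emptiest one, while `balanced` is `null` because the difference stays under the threshold.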

Page 20: Introduction to sharding

How does it all work?

Page 21: Introduction to sharding

Configuration

•  3 Config Servers
   –  Just mongod
   –  Stores chunk ranges and location
   –  Not a replica set

[Diagram: three config servers]

Page 22: Introduction to sharding

Routers

•  Mongos
   –  Both a router and a balancer
   –  No local data
   –  Can have 1 or many

[Diagram: a mongos router]

Page 23: Introduction to sharding

Cluster

[Diagram: two applications, two mongos routers, three config servers, and three shards, each a replica set with one primary (P) and two secondaries (S)]

Page 24: Introduction to sharding

Query Routing

Page 25: Introduction to sharding

Shard Key

•  Defines the range of data called a Key Space

•  Defines the distribution of documents in a collection

•  Every document must contain the Shard Key

•  Shard Keys are immutable
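The rule that every document must contain the shard key can be sketched as a small check. This is a minimal illustration in plain JavaScript following the rule as stated on the slide, not MongoDB's implementation; `extractShardKey` and the sample document are hypothetical.

```javascript
// Hypothetical helper: pull the shard key values out of a document,
// given a key pattern such as { user: 1, time: 1 }.
function extractShardKey(doc, keyPattern) {
  const key = {};
  for (const field of Object.keys(keyPattern)) {
    if (!(field in doc)) {
      throw new Error(`document is missing shard key field "${field}"`);
    }
    key[field] = doc[field];
  }
  return key;
}

const email = { user: 123, time: 1414800000, subject: "..." };
const emailKey = extractShardKey(email, { user: 1, time: 1 });
```

A document without one of the key fields makes the helper throw, which mirrors the requirement that the key is always present.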

Page 26: Introduction to sharding

Chunks

•  Each chunk contains a non-overlapping range of Shard Key values
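Because chunk ranges do not overlap, a shard key value maps to exactly one chunk. The lookup can be sketched as follows, assuming half-open ranges that include the lower bound and exclude the upper bound; the chunk table, shard names, and helper names are hypothetical.

```javascript
// Hypothetical chunk table: each chunk owns the half-open range [min, max).
const chunkTable = [
  { min: "A", max: "M", shard: "shard0000" },
  { min: "M", max: "S", shard: "shard0001" },
  { min: "S", max: "[", shard: "shard0002" }, // "[" is the character after "Z"
];

// Find the single chunk whose range contains the key, or null if none does.
function findChunk(chunks, key) {
  return chunks.find((c) => key >= c.min && key < c.max) || null;
}

// Check the non-overlap property: once sorted by lower bound,
// every chunk must end at or before the start of the next one.
function hasOverlap(chunks) {
  const sorted = [...chunks].sort((a, b) => (a.min < b.min ? -1 : a.min > b.min ? 1 : 0));
  return sorted.some((c, i) => i > 0 && sorted[i - 1].max > c.min);
}

const chunkForNora = findChunk(chunkTable, "Nora");
```

With this table, a key that equals a boundary such as "M" belongs to the chunk that starts at that boundary.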

Page 27: Introduction to sharding

3 Types of Queries

•  Targeted Queries

•  Scatter Gather Queries

•  Scatter Gather Queries with Sorting

Page 28: Introduction to sharding

Targeted Queries

•  Query contains the shard key

[Diagram: mongos in front of three shards, each a replica set (P, S, S)]

Page 29: Introduction to sharding

Scatter Gather Queries

•  Query does not contain the shard key

[Diagram: mongos in front of three shards, each a replica set (P, S, S)]
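The difference between a targeted query and a scatter gather query can be sketched as a routing decision. This is a simplified model, not how mongos is implemented: it only handles equality on a single shard key field, and `targetShards`, the chunk ranges, and the shard names are hypothetical.

```javascript
// Hypothetical chunk table for a collection sharded on `user`.
const userChunks = [
  { min: 0, max: 100, shard: "shard0000" },
  { min: 100, max: 200, shard: "shard0001" },
  { min: 200, max: Infinity, shard: "shard0002" },
];

// Return the shards a query must be sent to.
function targetShards(query, shardKeyField, chunks) {
  const allShards = [...new Set(chunks.map((c) => c.shard))];
  // No shard key in the query: every shard may hold matches (scatter gather).
  if (!(shardKeyField in query)) return allShards;
  // Shard key present: only the shard owning that key range is needed (targeted).
  const key = query[shardKeyField];
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  return chunk ? [chunk.shard] : [];
}

const targeted = targetShards({ user: 123 }, "user", userChunks);
const scattered = targetShards({ subject: "hello" }, "user", userChunks);
```

In this model the query on `user` reaches one shard, while the query on `subject` has to be sent to all three.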

Page 30: Introduction to sharding

Scatter Gather Queries with Sort

•  Query does not contain the shard key

•  Sorting is done first on the Shard

•  Results are merged in Mongos

[Diagram: mongos in front of three shards, each a replica set (P, S, S)]
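The two steps above, sorting on each shard and then merging in mongos, amount to a merge of already sorted lists. A minimal sketch, assuming each shard returns its results sorted by the same comparison; `mergeSorted` and the sample data are hypothetical.

```javascript
// Merge several individually sorted result lists into one sorted list.
function mergeSorted(shardResults, compare) {
  const positions = shardResults.map(() => 0); // next unread index per shard
  const merged = [];
  for (;;) {
    let best = -1;
    for (let i = 0; i < shardResults.length; i++) {
      if (positions[i] >= shardResults[i].length) continue; // this shard is exhausted
      if (
        best === -1 ||
        compare(shardResults[i][positions[i]], shardResults[best][positions[best]]) < 0
      ) {
        best = i;
      }
    }
    if (best === -1) return merged; // all shards exhausted
    merged.push(shardResults[best][positions[best]++]);
  }
}

// Each inner array stands for one shard's results, already sorted by `time`.
const perShard = [
  [{ time: 1 }, { time: 4 }, { time: 9 }],
  [{ time: 2 }, { time: 3 }],
  [],
];
const mergedEmails = mergeSorted(perShard, (a, b) => a.time - b.time);
```

The router never re-sorts whole result sets in this model; it only compares the current head of each shard's list.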

Page 31: Introduction to sharding

How do I pick a good Shard Key?

Page 32: Introduction to sharding

Considerations

•  Cardinality

•  Write Distribution

•  Query Isolation

•  Reliability

•  Index Locality

Page 33: Introduction to sharding

Example: Email Storage

> db.emails.find({ user: 123 })
{
    _id: ObjectId(),
    user: 123,
    time: Date(),
    subject: "...",
    recipients: [],
    body: "...",
    attachments: []
}

Page 38: Introduction to sharding

Example: Email Storage

Shard Key    Cardinality  Write Scaling  Query Isolation  Reliability          Index Locality
_id          Doc level    One shard      Scatter/gather   All users affected   Good
hash(_id)    Hash level   All Shards     Scatter/gather   All users affected   Poor
user         Many docs    All Shards     Targeted         Some users affected  Good
user, time   Doc level    All Shards     Targeted         Some users affected  Good
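The Write Scaling column for `_id` and `hash(_id)` can be illustrated with a small simulation. It assumes new `_id` values only ever increase, uses plain integers in place of ObjectIds, and uses a toy hash that is not MongoDB's hash function; all function names and boundaries are hypothetical.

```javascript
// Range sharding: shard i owns keys in [boundaries[i - 1], boundaries[i]).
function rangeShard(id, boundaries) {
  let shard = 0;
  while (shard < boundaries.length && id >= boundaries[shard]) shard++;
  return shard;
}

// Toy multiplicative hash for illustration only; it is NOT MongoDB's hash function.
function toyHash(id) {
  return Math.imul(id, 2654435761) >>> 0;
}

function hashedShard(id, shardCount) {
  return toyHash(id) % shardCount;
}

// Count how many writes land on each shard.
function countWrites(ids, shardOf, shardCount) {
  const counts = new Array(shardCount).fill(0);
  for (const id of ids) counts[shardOf(id)]++;
  return counts;
}

// 100 new documents whose ids are all larger than any existing boundary.
const newIds = Array.from({ length: 100 }, (_, i) => 1000 + i);
const byRange = countWrites(newIds, (id) => rangeShard(id, [100, 500]), 3);
const byHash = countWrites(newIds, (id) => hashedShard(id, 3), 3);
```

With range sharding every new id falls past the last boundary, so all 100 writes land on the last shard; with the hashed key the same writes spread across the shards.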

Page 39: Introduction to sharding

How do I get up and running?

Page 40: Introduction to sharding

5 Steps

•  Launch Config Servers

•  Launch Mongos

•  Launch Shards

•  Add Shards

•  Enable Sharding

Page 41: Introduction to sharding

Launch Config Servers

•  mongod --configsvr

•  Starts 1 config server on the default port 27019

[Diagram: three config servers]

Page 42: Introduction to sharding

Launch Mongos

•  mongos --configdb hostname:27019,hostname2:27019,hostname3:27019

[Diagram: a mongos router and three config servers]

Page 43: Introduction to sharding

Launch Shards

•  Nothing special, just like a normal replica set

[Diagram: one shard (a replica set with P, S, S), a mongos router, and three config servers]

Page 44: Introduction to sharding

Add Shards

•  Connect to mongos via the shell

•  sh.addShard("<rsname>/<seedlist>")

[Diagram: one shard (a replica set with P, S, S), a mongos router, and three config servers]
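The `<rsname>/<seedlist>` argument is the replica set name, a slash, and a comma-separated list of member hosts. A small sketch of building it, where `addReplicaSetShard`, the set name, and the host names are hypothetical and `sh` is the sharding helper object of the mongo shell:

```javascript
// Build the "<rsname>/<seedlist>" string and pass it to sh.addShard.
// `sh` is passed in so the function can also be exercised outside the shell.
function addReplicaSetShard(sh, rsName, hosts) {
  const seed = `${rsName}/${hosts.join(",")}`;
  return sh.addShard(seed);
}

// In the mongo shell, connected to mongos (set and host names are examples):
// addReplicaSetShard(sh, "rs0", ["host1:27017", "host2:27017", "host3:27017"]);
```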

Page 45: Introduction to sharding

Verify that the shard was added

db.runCommand({ listShards: 1 })
{
  "shards" : [
    { "_id" : "shard0000", "host" : "<hostname>:27017" }
  ],
  "ok" : 1
}

Page 46: Introduction to sharding

Enable Sharding

•  Enable sharding on a database
   –  sh.enableSharding("<dbname>")

•  Shard a collection with the given key
   –  sh.shardCollection("<dbname>.people", { country: 1 })
   –  sh.shardCollection("<dbname>.cars", { year: 1, uniqueid: 1 })
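Applied to the email storage example from earlier in the talk, the same two calls might look like the sketch below. The database name `mail` and the function `shardEmails` are hypothetical; the compound key on `user` and `time` is the last option from the shard key comparison.

```javascript
// Enable sharding for a database and shard its emails collection.
// `sh` is the sharding helper object of the mongo shell, passed in as a parameter.
function shardEmails(sh) {
  // Assumption: the emails collection lives in a database called "mail".
  sh.enableSharding("mail");
  // Compound shard key: user first, then time.
  sh.shardCollection("mail.emails", { user: 1, time: 1 });
}

// In the mongo shell, connected to mongos:
// shardEmails(sh);
```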

Page 47: Introduction to sharding

Tag Aware Sharding

•  Tag aware sharding allows you to control the distribution of your data

•  Tag a range of shard keys
   –  sh.addTagRange(<collection>,<min>,<max>,<tag>)

•  Tag a shard
   –  sh.addShardTag(<shard>,<tag>)
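The two helpers are used together: one tags a shard, the other ties a shard key range to the same tag. A hedged sketch, where `pinRangeToShard` is a hypothetical wrapper and the shard name, tag, and country values in the usage comment are examples only:

```javascript
// Tag a shard and assign a shard key range to that tag.
// `sh` is the sharding helper object of the mongo shell, passed in as a parameter.
function pinRangeToShard(sh, shard, namespace, min, max, tag) {
  sh.addShardTag(shard, tag);               // tag the shard
  sh.addTagRange(namespace, min, max, tag); // tag the shard key range
}

// In the mongo shell, connected to mongos, for a collection sharded on { country: 1 }:
// pinRangeToShard(sh, "shard0000", "<dbname>.people",
//                 { country: "DE" }, { country: "DF" }, "EU");
```

The range in the usage comment assumes the lower bound is included and the upper bound is excluded, so it covers the value "DE".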

Page 48: Introduction to sharding

Conclusion

Page 49: Introduction to sharding

Read/Write Throughput Exceeds I/O

Page 50: Introduction to sharding

Working Set Exceeds Physical Memory

Page 51: Introduction to sharding

Sharding Enables Scale

MongoDB’s Auto-Sharding

–  Easy to Configure
–  Consistent Interface
–  Free and Open Source

Page 52: Introduction to sharding

Thank You

Software Engineer, MongoDB

Craig Wilson

#MongoDBDays

@craiggwilson

