MongoDB San Francisco 2013: Hash-based Sharding in MongoDB 2.4 presented by Brandon Black, 10gen

Date post: 01-Nov-2014
Description:
In version 2.4, MongoDB introduces hash-based sharding, a new option for distributing data in sharded collections. Hash-based sharding and range-based sharding present different advantages for MongoDB users deploying large scale systems. In this talk, we'll provide an overview of this new feature and discuss when to use hash-based sharding or range-based sharding.
Transcript
Page 1: MongoDB San Francisco 2013: Hash-based Sharding in MongoDB 2.4 presented by Brandon Black, 10gen

Hash-Based Sharding in MongoDB 2.4

Brandon Black

Software Engineer, 10gen

@brandonmblack

#MongoDBDays

Page 2

Agenda

• Mechanics of Sharding
  – Key space
  – Chunks
  – Balancing

• Request Routing

• Hashed Shard Keys
  – Why use hashed shard keys
  – How to enable hashed shard keys
  – Limitations

Page 3

Sharded Cluster

Page 4

Sharding Your Data

Page 5

What Is A Shard Key?

• Shard key is used to partition your collection

• Shard key must exist in every document

• Shard key is immutable

• Shard key values are immutable

• Shard key must be indexed

• Shard key is used to route requests to shards

Page 6

The Key Space

{x: 10} {x: -5} {x: -9} {x: 7} {x: 6} {x: 0}

Page 7

Inserting Data

{x: 0} {x: 6} {x: 7} {x: -5} {x: 10} {x: -9}

Page 8

Inserting Data

{x: 0} {x: 6} {x: 7} {x: -5} {x: 10} {x: -9}

Page 9

Chunk Range and Size

{x: 0} {x: 6} {x: 7} {x: -5} {x: 10} {x: -9}

Page 10

Inserting Further Data

{x: 0} {x: 6} {x: 7} {x: -5} {x: 10} {x: -9}

{x: 9} {x: -7} {x: 3}

Page 11

Chunk Splitting

{x: 0} {x: 6} {x: 7} {x: -5} {x: 10} {x: -9}

0 0

• A chunk is split once it exceeds the maximum size

• There is no split point if all documents have the same shard key

• Chunk split is a logical operation (no data is moved)

• If a split creates too large a discrepancy in chunk count across the cluster, a balancing round starts
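
The splitting rules can be sketched in a few lines of JavaScript. This is a toy model, not MongoDB's implementation: the chunk object shape, the `maxDocs` limit (standing in for the maximum chunk size in bytes) and the choice of a median key as split point are all assumptions made for illustration.

```javascript
// Toy model of chunk splitting (assumed names; not MongoDB internals).
// A chunk covers the key range [min, max) and holds a list of shard key values.
function splitChunk(chunk, maxDocs) {
  if (chunk.keys.length <= maxDocs) return [chunk]; // still under the limit
  const sorted = [...chunk.keys].sort((a, b) => a - b);
  let splitPoint = sorted[Math.floor((sorted.length - 1) / 2)]; // median key
  // The lower half must not be empty, so move past a run of identical low keys.
  if (splitPoint === sorted[0]) splitPoint = sorted.find(k => k > sorted[0]);
  // No split point exists if every document has the same shard key.
  if (splitPoint === undefined) return [chunk];
  // The split is logical: only the range boundaries change, no data moves.
  return [
    { min: chunk.min, max: splitPoint, keys: sorted.filter(k => k < splitPoint) },
    { min: splitPoint, max: chunk.max, keys: sorted.filter(k => k >= splitPoint) },
  ];
}

// -Infinity and Infinity stand in for MinKey and MaxKey.
const chunk = { min: -Infinity, max: Infinity, keys: [10, -5, -9, 7, 6, 0] };
const parts = splitChunk(chunk, 4);
console.log(parts.map(c => [c.min, c.max, c.keys.length]));
```

With the six documents from the slides and a limit of four, the sketch splits at 0, giving one chunk from MinKey to 0 and one from 0 to MaxKey.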

Page 12

Data Distribution

• MinKey to 0 lives on Shard1

• 0 to MaxKey lives on Shard2

• Mongos routes queries appropriately
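
As a rough illustration of that routing step, the lookup can be modelled as a search over a table of chunk ranges. The `chunks` table and the `route` helper are hypothetical names for this sketch; `-Infinity` and `Infinity` stand in for MinKey and MaxKey.

```javascript
// Hypothetical chunk table mirroring the slide: two chunks split at 0.
const chunks = [
  { min: -Infinity, max: 0, shard: "Shard1" }, // MinKey to 0
  { min: 0, max: Infinity, shard: "Shard2" },  // 0 to MaxKey
];

// Return the shard whose chunk range [min, max) contains the shard key value.
function route(chunks, key) {
  const chunk = chunks.find(c => key >= c.min && key < c.max);
  return chunk.shard;
}

console.log(route(chunks, -1000)); // Shard1
console.log(route(chunks, 0));     // Shard2 (the lower bound is inclusive)
```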

Page 13

Mongos Routes Data

minKey 0 0 maxKey

db.test.insert({ x: -1000 })

Page 14

Mongos Routes Data

minKey 0 0 maxKey

db.test.insert({ x: -1000 })

Page 15

Unbalanced Shards

minKey 0 0 maxKey

Page 16

Balancing

• Migration threshold
  – Fewer than 20 chunks: migration threshold of 2
  – 20-79 chunks: migration threshold of 4
  – 80 or more chunks: migration threshold of 8
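
The threshold table translates directly into code. The sketch below is illustrative only: `migrationThreshold` and `needsBalancing` are made-up names, the tier boundaries (fewer than 20, fewer than 80) are one reading of the table, and treating a difference equal to the threshold as enough to trigger balancing is an assumption.

```javascript
// Migration threshold as a function of the total chunk count.
function migrationThreshold(totalChunks) {
  if (totalChunks < 20) return 2;
  if (totalChunks < 80) return 4;
  return 8;
}

// A balancing round is warranted when the most loaded and least loaded shards
// differ by at least the threshold. chunkCounts is a hypothetical
// { shardName: numberOfChunks } map.
function needsBalancing(chunkCounts) {
  const counts = Object.values(chunkCounts);
  const total = counts.reduce((a, b) => a + b, 0);
  return Math.max(...counts) - Math.min(...counts) >= migrationThreshold(total);
}

console.log(needsBalancing({ shard0000: 1, shard0001: 3 })); // true
console.log(needsBalancing({ shard0000: 2, shard0001: 3 })); // false
```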

Page 17

Moving the chunk

• One chunk of data is copied from Shard 1 to Shard 2

Page 18

Committing Migration

• Once everyone agrees the data has moved, that chunk gets deleted from Shard 1.

Page 19

Cleanup

• Other mongos instances have to find out about the new configuration

Page 20

Effects of Migrations

• Expensive

• Can take a long time

• Competes for limited resources

Page 21

Picking A Shard Key

• Cardinality

• Optimize routing

• Minimize (unnecessary) traffic

• Allow best scaling

Page 22

Routing Requests

Page 23

Cluster Request Routing

• Targeted Queries

• Scatter Gather Queries

• Scatter Gather Queries with Sort

Page 24

Cluster Request Routing: Targeted Query

Page 25

Routable Request Received

Page 26

Request routed to appropriate shard

Page 27

Shard returns results

Page 28

Mongos returns results to client

Page 29

Cluster Request Routing: Non-Targeted Query

Page 30

Non-Targeted Request Received

Page 31

Request sent to all shards

Page 32

Shards return results to mongos

Page 33

Mongos returns results to client

Page 34

Cluster Request Routing: Non-Targeted Query with Sort

Page 35

Non-Targeted request with sort received

Page 36

Request sent to all shards

Page 37

Query and sort performed locally

Page 38

Shards return results to mongos

Page 39

Mongos merges sorted results

Page 40

Mongos returns results to client
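
The merge step in the sorted scatter-gather flow can be sketched as a k-way merge: every shard hands back an already sorted list, and the router repeatedly takes the smallest head element. `mergeSorted` and the sample per-shard results are hypothetical; the real mongos works on cursors and batches rather than complete arrays.

```javascript
// Each shard returns its own results already sorted (ascending here).
// The router only has to merge them; it never re-sorts the full result set.
function mergeSorted(shardResults, compare) {
  const cursors = shardResults.map(() => 0); // next unread index per shard
  const merged = [];
  for (;;) {
    let best = -1;
    for (let s = 0; s < shardResults.length; s++) {
      if (cursors[s] >= shardResults[s].length) continue; // shard exhausted
      if (best === -1 ||
          compare(shardResults[s][cursors[s]], shardResults[best][cursors[best]]) < 0) {
        best = s;
      }
    }
    if (best === -1) return merged; // every shard exhausted
    merged.push(shardResults[best][cursors[best]++]);
  }
}

const fromShard1 = [{ x: -9 }, { x: -5 }];
const fromShard2 = [{ x: 0 }, { x: 6 }, { x: 7 }, { x: 10 }];
const fromShard3 = [{ x: -7 }, { x: 3 }, { x: 9 }];
const merged = mergeSorted([fromShard1, fromShard2, fromShard3], (a, b) => a.x - b.x);
console.log(merged.map(d => d.x));
```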

Page 41

What About ObjectId?

ObjectId("51597ca8e28587b86528edfd")

• Used for _id

• 12 byte value

• Generated by the driver if not specified

• Theoretically globally unique

Page 42

What About ObjectId?

ObjectId("51597ca8e28587b86528edfd")

12 bytes: Timestamp (4 bytes), MAC (3 bytes), PID (2 bytes), Counter (3 bytes)
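
To make the layout concrete, the sketch below slices the example ObjectId into those four fields. `parseObjectId` is a hypothetical helper, not a driver API, and the 4 + 3 + 2 + 3 byte widths assume the ObjectId layout in use at the time of the talk.

```javascript
// Split a 24-character ObjectId hex string into the four fields named on the slide.
function parseObjectId(hex) {
  if (!/^[0-9a-f]{24}$/i.test(hex)) throw new Error("expected 24 hex characters");
  return {
    timestamp: parseInt(hex.slice(0, 8), 16), // seconds since the Unix epoch
    machine: hex.slice(8, 14),                // "MAC" on the slide
    pid: parseInt(hex.slice(14, 18), 16),
    counter: parseInt(hex.slice(18, 24), 16),
  };
}

const id = parseObjectId("51597ca8e28587b86528edfd");
console.log(id);
console.log(new Date(id.timestamp * 1000).toISOString()); // 2013-04-01T12:25:12.000Z
```

Because the timestamp occupies the leading bytes, ObjectIds generated later compare as larger, which matters for the range-sharding behaviour shown on the following slides.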

Page 43

Sharding on ObjectId

// enabling sharding on test database
mongos> sh.enableSharding("test")
{ "ok" : 1 }

// sharding the test collection
mongos> sh.shardCollection("test.test",{_id:1})
{ "collectionsharded" : "test.test", "ok" : 1 }

// create a loop inserting data
mongos> for (x=0; x<10000; x++) {
... db.test.insert({value:x})
... }

Page 44

ObjectId Chunk Distribution

shards:
  { "_id" : "shard0000", "host" : "localhost:30000" }
  { "_id" : "shard0001", "host" : "localhost:30001" }
databases:
  { "_id" : "test", "partitioned" : true, "primary" : "shard0001" }
    test.test
      shard key: { "_id" : 1 }
      chunks:
        shard0001 3
      { "_id" : { "$minKey" : 1 } } -->> { "_id" : ObjectId("...") } on : shard0001 { "t" : 1000, "i" : 1 }
      { "_id" : ObjectId("...") } -->> { "_id" : { "$maxKey" : 1 } } on : shard0001 { "t" : 1000, "i" : 2 }

Page 45

ObjectId Results In A “Hot Shard”

minKey 0 0 maxKey

Page 46

Sharding on monotonically increasing values such as timestamps does not distribute writes evenly
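
A small simulation shows why. The two-chunk layout, the split point of 1000 and the shard names are invented for this sketch; the point is only that every new key is larger than the last, so every insert falls into the chunk that ends at MaxKey.

```javascript
// Two range-based chunks split at a hypothetical key value of 1000.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shard0000" },
  { min: 1000, max: Infinity, shard: "shard0001" },
];
const route = key => chunks.find(c => key >= c.min && key < c.max).shard;

// Insert 10,000 ever-increasing keys, as a timestamp or ObjectId would produce.
const writes = { shard0000: 0, shard0001: 0 };
for (let key = 1000; key < 11000; key++) writes[route(key)]++;

console.log(writes); // every insert lands on the shard owning the top chunk
```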

Page 47

Hashed Shard Keys

Page 48

Hashed Shard Keys

{x:2} md5 c81e728d9d4c2f636f067f89cc14862c

{x:3} md5 eccbc87e4b5ce2fe28308fd9f2a7baf3

{x:1} md5 c4ca4238a0b923820dcc509a6f75849b

Page 49

Hashed Shard Key Eliminates “Hot Shard”

minKey 0 0 maxKey
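
The effect can be reproduced with Node.js's built-in `crypto` module. This is a simplified sketch: it hashes the string form of each value (which happens to match the MD5 digests shown for 1, 2 and 3), whereas MongoDB hashes the BSON element, so real hashed shard key values differ. The `chunkOf` helper and the four equal-width chunks are assumptions for illustration.

```javascript
const crypto = require("crypto");

// MD5 of the string form of a value (a simplification of what MongoDB hashes).
const md5 = value => crypto.createHash("md5").update(String(value)).digest("hex");

// Pick one of `n` equal-width chunks from the first 32 bits of the digest.
const chunkOf = (value, n) =>
  Math.floor((parseInt(md5(value).slice(0, 8), 16) / 2 ** 32) * n);

console.log(md5(1)); // c4ca4238a0b923820dcc509a6f75849b, as on the slide

// Sequential keys now spread over all four chunks instead of piling onto one.
const counts = [0, 0, 0, 0];
for (let x = 0; x < 10000; x++) counts[chunkOf(x, 4)]++;
console.log(counts);
```

Each chunk should end up with roughly a quarter of the 10,000 sequential keys, which is the even write distribution the slide illustrates.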

Page 50

Under the Hood

• Create a hashed index used for sharding

• Uses the first 64 bits of the MD5 hash of the field

• Hash both data and BSON type

• Represented as a NumberLong in the shell

Page 51

Hash on both data and BSON type

// hash on 1 as an integer
> db.runCommand({_hashBSONElement:1})
{
  "key" : 1,
  "seed" : 0,
  "out" : NumberLong("5902408780260971510"),
  "ok" : 1
}

// hash on "1" as a string
> db.runCommand({_hashBSONElement:"1"})
{
  "key" : "1",
  "seed" : 0,
  "out" : NumberLong("-2448670538483119681"),
  "ok" : 1
}

Page 52

Enabling Hashed Indexes

• Create index:

db.collection.ensureIndex({field : "hashed"})

Page 53

Using Hash Shard Keys

• Enable sharding on collection:

sh.shardCollection("test.collection",{field: "hashed"})

Page 54

Sharding on Hashed ObjectId

// enabling sharding on test database
mongos> sh.enableSharding("test")
{ "ok" : 1 }

// shard by hashed _id field
mongos> sh.shardCollection("test.hash",{_id:"hashed"})
{ "collectionsharded" : "test.hash", "ok" : 1 }

Page 55

Pre-Splitting the Data

databases:
  { "_id" : "test", "partitioned" : true, "primary" : "shard0001" }
    test.hash
      shard key: { "_id" : "hashed" }
      chunks:
        shard0000 2
        shard0001 2
      { "_id" : { "$minKey" : 1 } } -->> { "_id" : NumberLong("-4611686018427387902") } on : shard0000 { "t" : 2000, "i" : 2 }
      { "_id" : NumberLong("-4611686018427387902") } -->> { "_id" : NumberLong(0) } on : shard0000 { "t" : 2000, "i" : 3 }
      { "_id" : NumberLong(0) } -->> { "_id" : NumberLong("4611686018427387902") } on : shard0001 { "t" : 2000, "i" : 4 }
      { "_id" : NumberLong("4611686018427387902") } -->> { "_id" : { "$maxKey" : 1 } } on : shard0001 { "t" : 2000, "i" : 5 }
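
The initial boundaries in that output divide the signed 64-bit hash space into four nearly equal ranges, two per shard. The sketch below is one way to arrive at the same numbers; `initialSplitPoints` is a hypothetical helper, and the assumption that the server derives its split points this way is mine, not something stated in the slides.

```javascript
// Split points for `numChunks` initial chunks over the signed 64-bit hash space.
function initialSplitPoints(numChunks) {
  const MAX_INT64 = 2n ** 63n - 1n;
  const interval = (MAX_INT64 / BigInt(numChunks)) * 2n; // BigInt division truncates
  const splits = [];
  let current = 0n;
  if (numChunks % 2 === 0) {
    splits.push(0n); // an even chunk count puts one boundary at zero
    current += interval;
  } else {
    current += interval / 2n;
  }
  // Add boundaries symmetrically around zero.
  for (let i = 0; i < Math.floor((numChunks - 1) / 2); i++) {
    splits.push(current, -current);
    current += interval;
  }
  return splits.sort((a, b) => (a < b ? -1 : a > b ? 1 : 0));
}

console.log(initialSplitPoints(4).map(String));
```

For four chunks this yields -4611686018427387902, 0 and 4611686018427387902, the same boundaries that appear in the chunk listing.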

Page 56

Inserting Into Hashed Shard Key Collection

// create a loop inserting data
mongos> for (x=0; x<10000; x++) {
... db.hash.insert({value:x})
... }

Page 57

Even Distribution of Chunks

test.hash
  shard key: { "_id" : "hashed" }
  chunks:
    shard0000 4
    shard0001 4
  { "_id" : { "$minKey" : 1 } } -->> { "_id" : NumberLong("-7374407069602479355") } on : shard0000 { "t" : 2000, "i" : 8 }
  { "_id" : NumberLong("-7374407069602479355") } -->> { "_id" : NumberLong("-4611686018427387902") } on : shard0000 { "t" : 2000, "i" : 9 }
  { "_id" : NumberLong("-4611686018427387902") } -->> { "_id" : NumberLong("-2456929743513174890") } on : shard0000 { "t" : 2000, "i" : 6 }
  { "_id" : NumberLong("-2456929743513174890") } -->> { "_id" : NumberLong(0) } on : shard0000 { "t" : 2000, "i" : 7 }
  { "_id" : NumberLong(0) } -->> { "_id" : NumberLong("1483539935376971743") } on : shard0001 { "t" : 2000, "i" : 12 }

Page 58

Hash Keys Are Great for Equality Queries

• Equality queries directed to a specific shard

• Will use the index

• Most efficient query possible

Page 59

Explain Plan of an Equality Query

mongos> db.hash.find({x:1}).explain()
{
  "cursor" : "BtreeCursor x_hashed",
  "n" : 1,
  "nscanned" : 1,
  "nscannedObjects" : 1,
  "millisShardTotal" : 0,
  "numQueries" : 1,
  "numShards" : 1,
  "indexBounds" : {
    "x" : [
      [
        NumberLong("5902408780260971510"),
        NumberLong("5902408780260971510")
      ]
    ]
  },
  "millis" : 0
}

Page 60

Not So Good for a Range Query

• Range queries scatter gather

• Don’t use the index

• Inefficient query
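
The difference between the two query shapes comes down to what the router can compute. An equality value can be hashed and looked up; a range cannot, because neighbouring values hash to unrelated positions. The sketch below uses a stand-in hash (the first 32 bits of MD5 over the value's string form) and a hypothetical two-shard layout; `targetShards` is an invented name, not a mongos function.

```javascript
const crypto = require("crypto");

// Stand-in hash: first 32 bits of MD5 over the string form of the value
// (MongoDB hashes the BSON element and keeps 64 bits).
const hash = v =>
  parseInt(crypto.createHash("md5").update(String(v)).digest("hex").slice(0, 8), 16);

// Hypothetical two-shard layout: each shard owns half of the 32-bit hash space.
const shardOf = h => (h < 2 ** 31 ? "shard0000" : "shard0001");

// Equality on the shard key is hashed and sent to one shard.
// Any operator expression (a range here) is sent to every shard.
function targetShards(query) {
  if (typeof query.x !== "object") return [shardOf(hash(query.x))];
  return ["shard0000", "shard0001"];
}

console.log(targetShards({ x: 1 }));                   // one shard
console.log(targetShards({ x: { $gt: 1, $lt: 99 } })); // both shards
```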

Page 61

Explain Plan of a Range Query

mongos> db.hash.find({x:{$gt:1, $lt:99}}).explain()
{
  "cursor" : "BasicCursor",
  "n" : 97,
  "nChunkSkips" : 0,
  "nYields" : 0,
  "nscanned" : 1000,
  "nscannedAllPlans" : 1000,
  "nscannedObjects" : 1000,
  "nscannedObjectsAllPlans" : 1000,
  "millisShardTotal" : 0,
  "millisShardAvg" : 0,
  "numQueries" : 2,
  "numShards" : 2,
  "millis" : 3
}

Page 62

Limitations

• Cannot use a compound key

• Key cannot have an array value

• Incompatible with tag aware sharding
  – Tags would be assigned the value of the hash, not the value of the underlying key

• Key with poor cardinality is going to give a hash with poor cardinality
  – Floating point numbers are squashed. E.g. 100.4 will be hashed as 100
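
The floating point limitation can be modelled by truncating the value before hashing. This is only a sketch of the behaviour the slide describes: `hashedKey` is a made-up helper, and MD5 over the truncated value's string form stands in for MongoDB's hash of the BSON element.

```javascript
const crypto = require("crypto");

// Model of the limitation: the fractional part is dropped before hashing,
// so every value between 100 and 101 produces the same hashed key.
const hashedKey = value =>
  crypto.createHash("md5").update(String(Math.trunc(value))).digest("hex");

console.log(hashedKey(100.4) === hashedKey(100));   // true: both hash as 100
console.log(hashedKey(100.4) === hashedKey(100.9)); // true
console.log(hashedKey(100.4) === hashedKey(101.4)); // false
```

A field whose values differ only in the fractional part therefore behaves like a low-cardinality key once hashed.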

Page 63

Summary

• There are 3 different approaches for sharding

• Hash shard keys give great distribution

• Hash shard keys are good for equality

• Pick the right shard key for your application

Page 64

Thank You

Brandon Black

Software Engineer, 10gen

@brandonmblack

#MongoDBDays

