How to Achieve Scale with MongoDB

Post on 01-Nov-2014


description

Learn how to achieve scale with MongoDB. In this presentation, we cover three different ways to scale MongoDB, including optimization, vertical scaling, and horizontal scaling.

transcript

Sr. Solutions Architect, MongoDB

Jake Angerman

How to Achieve Scale with MongoDB

Today’s Webinar Agenda: Achieve Scale

1. Optimization Tips: Schema Design, Indexes, Monitoring your Workload

2. Scale Vertically

3. Horizontal Scaling

Optimization Tips to Scale Your App

Premature Optimization

• "There is no doubt that the grail of efficiency leads to abuse. Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."

- Donald Knuth, 1974


Schema Design

• Document Model

• Dynamic Schema

• Collections

{
  "customer_id": 123,
  "first_name": "John",
  "last_name": "Smith",
  "address": {
    "street": "123 Main Street",
    "city": "Houston",
    "state": "TX",
    "zip_code": "77027"
  },
  "policies": [
    {
      "policy_number": 13,
      "description": "short term",
      "deductible": 500
    },
    {
      "policy_number": 14,
      "description": "dental",
      "visits": […]
    }
  ]
}

The Importance of Schema Design

• MongoDB schemas are designed in the opposite way from relational schemas!

• Relational Schema:
  – normalize data
  – write complex queries to join the data
  – let the query planner figure out how to make queries efficient

• MongoDB Schema:
  – denormalize the data
  – create a (potentially complex) schema with prior knowledge of your actual (not just predicted) query patterns
  – write simple queries

Real World Example: Optimizing Schema for Scale

Product catalog schema for retailer selling in 20 countries

{
  _id: 375,
  en_US: { name: …, description: …, <etc…> },
  en_GB: { name: …, description: …, <etc…> },
  fr_FR: { name: …, description: …, <etc…> },
  fr_CA: { name: …, description: …, <etc…> },
  de_DE: …,
  de_CH: …,
  <… and so on for other locales …>
}

What's good about this schema?

• Each document contains all the data about the product across all possible locales.

• It is the most efficient way to retrieve all translations of a product in a single query (English, French, German, etc).

But that's not how the data was accessed

db.catalog.find( { _id: 375 }, { en_US: true } );

db.catalog.find( { _id: 375 }, { fr_FR: true } );

db.catalog.find( { _id: 375 }, { de_DE: true } );

… and so forth for other locales

The data model did not fit the access pattern.

Why is this inefficient?

Data in RED (the single locale being requested) are being used. Data in BLUE (every other locale) take up memory but are not in demand.

{
  _id: 375,
  en_US: { name: …, description: …, <etc…> },
  en_GB: { name: …, description: …, <etc…> },
  fr_FR: { name: …, description: …, <etc…> },
  fr_CA: { name: …, description: …, <etc…> },
  de_DE: …,
  de_CH: …,
  <… and so on for other locales …>
}

{
  _id: 42,
  en_US: { name: …, description: …, <etc…> },
  en_GB: { name: …, description: …, <etc…> },
  fr_FR: { name: …, description: …, <etc…> },
  fr_CA: { name: …, description: …, <etc…> },
  de_DE: …,
  de_CH: …,
  <… and so on for other locales …>
}

Consequences of the schema

• Each document contained 20x more data than the common use case requires

• Disk IO was too high for the relatively modest query load on the dataset

• MongoDB lets you request a subset of a document's contents via projection…

• … but the entire document must be loaded into RAM to service the request

Consequences of the schema redesign

{
  _id: "375-en_GB",
  name: …,
  description: …,
  <… the rest of the document …>
}

• Queries induced minimal memory overhead

• 20x as many distinct products fit in RAM at once

• Disk IO utilization reduced

• Application latency reduced
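The redesign can be sketched in plain JavaScript as a one-time transformation from one document per product to one document per product and locale. This is only an illustration under assumptions: the helper name splitByLocale and the sample field values are hypothetical, and only the "375-en_GB" style of _id comes from the slide.

```javascript
// Hypothetical helper: split one multi-locale product document into
// one document per locale, keyed as "<productId>-<locale>".
function splitByLocale(product) {
  const { _id, ...locales } = product;
  return Object.entries(locales).map(([locale, fields]) => ({
    _id: `${_id}-${locale}`,
    ...fields,
  }));
}

// Sample data is made up for illustration.
const product = {
  _id: 375,
  en_US: { name: "Sweater", description: "A warm sweater" },
  en_GB: { name: "Jumper", description: "A warm jumper" },
  fr_FR: { name: "Pull", description: "Un pull chaud" },
};

const perLocaleDocs = splitByLocale(product);
console.log(perLocaleDocs.map((doc) => doc._id));
```

With documents shaped this way, a request for one locale reads only that locale's document, for example by looking up _id "375-en_GB".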

Schema Design Patterns

• Pattern: pre-computing interesting quantities, ideally with each write operation

• Pattern: putting unrelated items in different collections to take advantage of indexing

• Anti-pattern: appending to arrays ad infinitum

• Anti-pattern: importing relational schemas directly into MongoDB
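As a rough illustration of the pre-computing pattern, the sketch below keeps a running count and sum on the document with every write, so a read never has to walk an ever-growing array. The field names reviewCount and ratingSum and both helper functions are hypothetical; $inc is the standard MongoDB update operator, and applyInc is only an in-memory stand-in so the example runs without a server.

```javascript
// Build the update document that would be sent with each new review.
// Field names are hypothetical, chosen for illustration only.
function buildReviewUpdate(rating) {
  return { $inc: { reviewCount: 1, ratingSum: rating } };
}

// In-memory stand-in for how $inc changes a document (not a MongoDB API).
function applyInc(doc, update) {
  const result = { ...doc };
  for (const [field, amount] of Object.entries(update.$inc)) {
    result[field] = (result[field] || 0) + amount;
  }
  return result;
}

let productDoc = { _id: "375-en_GB", reviewCount: 0, ratingSum: 0 };
for (const rating of [5, 4, 3]) {
  productDoc = applyInc(productDoc, buildReviewUpdate(rating));
}

// The average is derived from two pre-computed numbers, not from an array.
const averageRating = productDoc.ratingSum / productDoc.reviewCount;
console.log(productDoc.reviewCount, averageRating);
```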

Schema Design Tips

• Avoid inherently slow operations
  – Updates of unindexed arrays of several thousand elements
  – Updates of indexed arrays of several hundred elements
  – Document moves

• Arrays are great, but know how to use them

Schema Design resources

• Blog series, "6 rules of thumb"
  – Part 1: http://goo.gl/TFJ3dr
  – Part 2: http://goo.gl/qTdGhP
  – Part 3: http://goo.gl/JFO1pI

Indexing

• Indexes are tree-structured sets of references to your documents

• Indexes are the single biggest tunable performance factor in the database

• Indexing and schema design go hand in hand

Indexing Mistakes

• Failing to build necessary indexes

• Building unnecessary indexes

• Running ad-hoc queries in production

Indexing Fixes

• Failing to build necessary indexes
  – Run .explain(), examine slow query log, mtools, system.profile collection

• Building unnecessary indexes
  – Talk to your application developers about usage

• Running ad-hoc queries in production
  – Use a staging environment, use secondaries

mongod log files

Sun Jun 29 06:35:37.646 [conn2] query test.docs query: { parent.company: "22794", parent.employeeId: "83881" } ntoreturn:1 ntoskip:0 nscanned:806381 keyUpdates:0 numYields: 5 locks(micros) r:2145254 nreturned:0 reslen:20 1156ms


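• date and time: Sun Jun 29 06:35:37.646
• thread: [conn2]
• operation: query
• namespace: test.docs
• n… counters: ntoreturn, ntoskip, nscanned, nreturned
• number of yields: numYields: 5
• lock times: locks(micros) r:2145254
• duration: 1156ms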
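To make the parts of the log line concrete, here is a minimal sketch that pulls the main fields out of the line shown above with regular expressions. The function parseLogLine is hypothetical and handles only this 2014-era line layout; for real analysis a purpose-built tool such as mtools is the better choice.

```javascript
// The log line from the slide, split across strings for readability.
const line =
  "Sun Jun 29 06:35:37.646 [conn2] query test.docs query: " +
  '{ parent.company: "22794", parent.employeeId: "83881" } ' +
  "ntoreturn:1 ntoskip:0 nscanned:806381 keyUpdates:0 numYields: 5 " +
  "locks(micros) r:2145254 nreturned:0 reslen:20 1156ms";

// Hypothetical parser: extracts thread, operation, namespace,
// a few counters, and the duration in milliseconds.
function parseLogLine(text) {
  const header = text.match(/\[(\w+)\] (\w+) (\S+)/);
  const counter = (name) => {
    const m = text.match(new RegExp(`\\b${name}:\\s*(\\d+)`));
    return m ? Number(m[1]) : null;
  };
  const duration = text.match(/(\d+)ms$/);
  return {
    thread: header ? header[1] : null,
    operation: header ? header[2] : null,
    namespace: header ? header[3] : null,
    nscanned: counter("nscanned"),
    nreturned: counter("nreturned"),
    numYields: counter("numYields"),
    durationMs: duration ? Number(duration[1]) : null,
  };
}

const parsed = parseLogLine(line);
console.log(parsed);
```

A large nscanned next to an nreturned of 0, as in this line, is the pattern to look for: the server examined many documents and found nothing.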

You need a tool when doing log file analysis

mtools

• http://github.com/rueckstiess/mtools

• log file analysis for poorly performing queries
  – Show me queries that took more than 1000 ms from 6 am to 6 pm:
  – mlogfilter mongodb.log --from 06:00 --to 18:00 --slow 1000 > mongodb-filtered.log

Graphing with mtools

% mplotqueries --type histogram --group namespace --bucketSize 3600

Real World Example: Indexing for Scale

Sun Jun 29 06:35:37.646 [conn2] query test.docs query: { parent.company: "22794", parent.employeeId: "83881" } ntoreturn:1 ntoskip:0 nscanned:806381 keyUpdates:0 numYields: 5 locks(micros) r:2145254 nreturned:0 reslen:20 1156ms

Document schema

{

_id: ObjectId("53b9ab7e939f1e229b4f574c"),

firstName: "Alice",

lastName: "Smith",

parent: {

company: 22794,

employeeId: 83881

}

}

But there's an index!?!

db.system.indexes.find().toArray()

[{

"v" : 1,

"key" : {

"company" : 1,

"employeeId" : 1

},

"ns" : "test.docs",

"name" : "company_1_employeeId_1"

}]


This isn't the index you're looking for.

Did you see the problem?

{

_id: ObjectId("53b9ab7e939f1e229b4f574c"),

firstName: "Alice",

lastName: "Smith",

parent: {

company: 22794,

employeeId: 83881

}

}

The index was created incorrectly

db.system.indexes.find().toArray()

[{

"v" : 1,

"key" : {

"parent.company" : 1,

"parent.employeeId" : 1

},

"ns" : "test.docs",

"name" : "parent.company_1_parent.employeeId_1"

}]

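Subdocument path needed: the fields are nested under parent, so the index keys must be parent.company and parent.employeeId.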
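The mismatch can be shown without a database. The sketch below resolves an index key path against the example document: the top-level key company finds nothing, while the dotted path parent.company reaches the nested value. The helper resolvePath is hypothetical and deliberately simplified; MongoDB's own dotted-path handling also covers arrays, which this sketch ignores.

```javascript
// Hypothetical helper: follow a dotted path such as "parent.company"
// through nested objects; returns undefined if any step is missing.
function resolvePath(doc, path) {
  return path
    .split(".")
    .reduce((value, key) => (value != null ? value[key] : undefined), doc);
}

// The example document from the slide (without the ObjectId _id).
const employee = {
  firstName: "Alice",
  lastName: "Smith",
  parent: { company: 22794, employeeId: 83881 },
};

// Key used by the original index: nothing to index at the top level.
const topLevel = resolvePath(employee, "company");
// Key used by the corrected index: reaches the nested field.
const nested = resolvePath(employee, "parent.company");

console.log(topLevel, nested);
```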

Indexing Strategies

• Create indexes that support your queries!

• Create highly selective indexes

• Eliminate duplicate indexes with a compound index, if possible
  – db.collection.ensureIndex({A:1, B:1, C:1})
  – allows queries using leftmost prefix

• Order compound index fields thusly: equality, sort, then range
  – see http://emptysqua.re/blog/optimizing-mongodb-compound-indexes/

• Create indexes that support covered queries

• Prevent collection scans in pre-production environments
  – mongod --notablescan
  – db.getSiblingDB("admin").runCommand( { setParameter: 1, notablescan: 1 } )
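The leftmost-prefix rule for the {A:1, B:1, C:1} index above can be sketched as follows. Both functions are hypothetical illustrations, not part of any MongoDB API, and the check is stricter than the real query planner: a query on A and C could still use the A prefix of the index, whereas this sketch only reports whether the queried fields are exactly a leftmost prefix.

```javascript
// List every leftmost prefix of a compound index's key order.
function indexPrefixes(indexKeys) {
  return indexKeys.map((_, i) => indexKeys.slice(0, i + 1));
}

// True when the queried fields are exactly the first k index keys
// (the order of fields inside the query itself does not matter).
function isLeftmostPrefix(indexKeys, queryFields) {
  const wanted = new Set(queryFields);
  if (wanted.size === 0 || wanted.size > indexKeys.length) return false;
  return indexKeys.slice(0, wanted.size).every((key) => wanted.has(key));
}

const compoundIndex = ["A", "B", "C"];
const prefixes = indexPrefixes(compoundIndex);
console.log(prefixes); // [["A"], ["A","B"], ["A","B","C"]]

console.log(isLeftmostPrefix(compoundIndex, ["A"])); // true
console.log(isLeftmostPrefix(compoundIndex, ["B", "A"])); // true
console.log(isLeftmostPrefix(compoundIndex, ["B"])); // false
console.log(isLeftmostPrefix(compoundIndex, ["A", "C"])); // false
```

This is why separate indexes on {A:1} and {A:1, B:1} are redundant once the compound index exists.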

Monitoring Your Workload

• Log files, iostat, mtools, mongotop are for debugging

• MongoDB Management Service (MMS) can do metrics collection and reporting

What can MMS do?

Database Metrics

Hardware statistics (CPU, disk)

MMS Monitoring Setup

Cloud Version of MMS

1. Go to http://mms.mongodb.com

2. Create an account

3. Install one agent in your datacenter

4. Add hosts from the web interface

5. Enjoy!

Today’s Webinar Agenda: Achieve Scale

1. Optimization Tips

2. Scale Vertically: Hardware Considerations

3. Horizontal Scaling

Vertical Scaling

Factors:
  – RAM
  – Disk
  – CPU
  – Network

[Diagram: two replica sets, each with one Primary and two Secondaries, labeled "Horizontal Scaling"]

Working Set Exceeds Physical Memory

RAM - Measure your working set and index sizes

• db.serverStatus({workingSet:1}).workingSet
  { "computationTimeMicros": 2751, "note": "thisIsAnEstimate", "overSeconds": 1084, "pagesInMemory": 2041 }

• db.stats().indexSize
  2032880640

• In this example, (2041 * 4096) + 2032880640 = 2041240576 bytes = 1.9 GB

• Note: this is a subset of the virtual memory used by mongod
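The arithmetic above can be reproduced with a short sketch. It assumes the 4096-byte page size used in the slide's calculation; the function name estimateRamBytes is hypothetical, and the two inputs are the pagesInMemory and indexSize values shown above.

```javascript
// Page size used in the slide's arithmetic (assumed 4 KB pages).
const PAGE_SIZE_BYTES = 4096;

// Working-set pages converted to bytes, plus total index size in bytes.
function estimateRamBytes(pagesInMemory, indexSizeBytes) {
  return pagesInMemory * PAGE_SIZE_BYTES + indexSizeBytes;
}

const ramBytes = estimateRamBytes(2041, 2032880640);
const ramGiB = ramBytes / 2 ** 30;

console.log(ramBytes); // 2041240576
console.log(ramGiB.toFixed(1)); // "1.9"
```

If this estimate approaches the physical RAM of the host, the working set no longer fits in memory and disk IO will rise.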

Real World Example: Vertical Scaling

• System that tracked status information for entities in the business

• State changes happen in batches; sometimes 10% of entities get updated, sometimes 100% get updated

Initial Architecture

Sharded cluster with 4 shards using spinning disks

[Diagram: Application / mongos connected to the mongod shards]

Adding shards to scale horizontally

• Application was a success! Business entities grew by a factor of 5

• Cluster capacity multiplied by 5, but so did the TCO

[Diagram: Application / mongos connected to the mongod shards, with "…16 more shards…" added]

More success means more shards

• 10x growth means … 200 shards

• Horizontal scaling with sharding is linear scaling, but an order of magnitude was needed

• Bulk updates of random documents approach the speed of the disks
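The shard counts in this example follow from linear scaling, which a few lines of arithmetic make explicit. This sketch assumes per-shard capacity stays constant (same hardware, same workload mix); the function name shardsNeeded is hypothetical.

```javascript
// Linear scaling: capacity grows only as fast as the shard count.
function shardsNeeded(currentShards, growthFactor) {
  return Math.ceil(currentShards * growthFactor);
}

const initialShards = 4;
const afterFivefoldGrowth = shardsNeeded(initialShards, 5);
const afterTenfoldGrowth = shardsNeeded(afterFivefoldGrowth, 10);

console.log(afterFivefoldGrowth); // 20 (the original 4 plus 16 more)
console.log(afterTenfoldGrowth); // 200
```

Because cost grows by the same factor as the shard count, a change that raises per-node capacity, such as faster disks, can be the cheaper route.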

Final architecture

• Scaling the random IOPS with SSDs was a vertical scaling approach

[Diagram: Application / mongos connected to mongod shards on SSD]

Before you add hardware…

• Make sure you are solving the right scaling problem

• Remedy schema and index problems first
  – schema and index problems can look like hardware problems

• Tune the Operating System
  – ulimits, swap, NUMA, NOOP scheduler with hypervisors

• Tune the IO subsystem
  – ext4 or XFS vs SAN, RAID10, readahead, noatime

• See MongoDB "production notes" page

• Heed logfile startup warnings

Today’s Webinar Agenda: Achieve Scale

1. Optimization Tips

2. Scale Vertically

3. Horizontal Scaling: The Basics of Sharding

The Basics of Horizontal Scaling (aka Sharding)

The Basics of Sharding

Rule of Thumb

To make good decisions about MongoDB implementations, you must understand MongoDB, your applications, the workload your applications generate, and your business requirements.

Summary

• Don't throw hardware at the problem until you examine all other possibilities (schema, indexes, OS, IO subsystem)

• Know what is considered "normal" performance by monitoring

• Horizontal scaling in MongoDB is implemented with sharding, but you must understand schema design and indexing before you shard

Sharding a sub-optimally designed database will not make it performant

Today’s Webinar Agenda: Achieve Scale

1. Optimization Tips: Schema Design, Indexes, Monitoring your Workload

2. Scale Vertically

3. Horizontal Scaling: The Basics of Sharding

Limited Time: Get Expert Advice for Free

If you’re thinking about scaling, why reinvent the wheel?

Our experts can collaborate with you to provide detailed guidance.

Sign Up For a Free One Hour Consult:

http://bit.ly/1rkXcfN

Questions?

Stay tuned after the webinar and take our survey for your chance to win MongoDB schwag.

Sr. Solutions Architect, MongoDB

Jake Angerman

Thank You