
Introduction to new high performance storage engines in MongoDB 3.0

Date post: 15-Jul-2015
Page 1: Introduction to new high performance storage engines in mongodb 3.0

Introduction to new high performance storage engines in MongoDB 3.0 (formerly 2.8)

Henrik Ingo

Solutions Architect, MongoDB

Page 2

Hi, I am Henrik Ingo

@h_ingo

Page 3

Introduction to new high performance storage engines in MongoDB 3.0 (formerly 2.8)

Agenda:

- MongoDB and NoSQL
- Storage Engine API
- WiredTiger configuration + performance

Page 4

Most popular NoSQL database

Page 5

5 NoSQL categories

• Key Value: Redis, Riak
• Wide Column: Cassandra
• Document: MongoDB
• Graph: Neo4j
• Map Reduce: Hadoop

Page 6

MongoDB is a Document Database

Rich Queries
• Find Paul’s cars
• Find everybody in London with a car built between 1970 and 1980

Geospatial
• Find all of the car owners within 5km of Trafalgar Sq.

Text Search
• Find all the cars described as having leather seats

Aggregation
• Calculate the average value of Paul’s car collection

Map Reduce
• What is the ownership pattern of colors by geography over time? (is purple trending up in China?)

MongoDB

{
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123, 47.232],
  cars: [
    { model: 'Bentley', year: 1973, value: 100000, … },
    { model: 'Rolls Royce', year: 1965, value: 330000, … }
  ]
}
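
Three of the queries listed on this slide can be sketched in plain JavaScript over an in-memory copy of the example document. This only illustrates what each query asks for; it is not MongoDB's query API. The elided fields (…) are left out, and the helper names are made up for this sketch.

```javascript
// In-memory copy of the example document (elided fields omitted).
const people = [
  {
    first_name: 'Paul',
    surname: 'Miller',
    city: 'London',
    location: [45.123, 47.232],
    cars: [
      { model: 'Bentley', year: 1973, value: 100000 },
      { model: 'Rolls Royce', year: 1965, value: 330000 }
    ]
  }
];

// "Find Paul's cars" (hypothetical helper name).
function carsOf(firstName) {
  return people
    .filter(p => p.first_name === firstName)
    .flatMap(p => p.cars);
}

// "Find everybody in London with a car built between 1970 and 1980".
function ownersInCityWithCarBuiltBetween(city, fromYear, toYear) {
  return people.filter(p =>
    p.city === city &&
    p.cars.some(c => c.year >= fromYear && c.year <= toYear));
}

// "Calculate the average value of Paul's car collection".
function averageCarValue(firstName) {
  const cars = carsOf(firstName);
  if (cars.length === 0) return null; // nobody by that name, or no cars
  return cars.reduce((sum, c) => sum + c.value, 0) / cars.length;
}

console.log(carsOf('Paul').map(c => c.model));
console.log(ownersInCityWithCarBuiltBetween('London', 1970, 1980).map(p => p.surname));
console.log(averageCarValue('Paul'));
```

In the mongo shell, the second query would typically be expressed as a filter document passed to find(), along the lines of { city: 'London', cars: { $elemMatch: { year: { $gte: 1970, $lte: 1980 } } } }.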

Page 7

Operational Database Landscape

Page 8

MongoDB 3.0 & storage engines

Page 9

Current state in MongoDB 2.6

Read-heavy apps
• Great performance
  • B-tree
  • Low overhead
• Good scale-out perf
  • Secondary reads
  • Sharding

Write-heavy apps
• Good scale-out perf
  • Sharding
• Per-node efficiency wish-list:
  • Doc level locking
  • Write-optimized data structures (LSM)
  • Compression

Other
• Complex transactions
• In-memory engine
• SSD optimized engine
• etc...

Page 10


How to get all of the above?

Page 11

MongoDB 3.0 Storage Engine API

• MMAP: read-heavy app
• WiredTiger: write-heavy app
• 3rd party: special app

Page 12

MongoDB 3.0 Storage Engine API

• MMAP: read-heavy app
• WiredTiger: write-heavy app
• 3rd party: special app

• One at a time:
  – Many engines built into mongod
  – Choose 1 at startup
  – All data stored by the same engine
  – Incompatible on-disk data formats (obviously)
  – Compatible client API
• Compatible Oplog & Replication
  – Same replica set can mix different engines
  – No-downtime migration possible

Page 13

Some existing engines

• MMAPv1
  – Improved MMAP (collection-level locking)
• WiredTiger
  – Discussed next
• RocksDB
  – LSM style engine developed by Facebook
  – Based on LevelDB
• TokuMXse
  – Fractal Tree indexing engine from Tokutek

Page 14

Some rumored engines

• Heap
  – In-memory engine
• Devnull
  – Write all data to /dev/null
  – Based on idea from famous flash animation...
  – Oplog stored as normal
• SSD optimized engine (e.g. Fusion-IO)
• KV simple key-value engine

https://github.com/mongodb/mongo/tree/master/src/mongo/db/storage

Page 15

WiredTiger

Page 16

What is WiredTiger

• Modern NoSQL database engine
  – Flexible schema
• Advanced database engine
  – Secondary indexes, MVCC, non-locking algorithms
  – Multi-statement transactions (not in MongoDB 3.0)
• Very modular, tunable
  – Btree, LSM and columnar indexes
  – Snappy, Zlib, 3rd-party compression
  – Index prefix compression, etc...
• Built by creators of BerkeleyDB
• Acquired by MongoDB in 2014
• source.wiredtiger.com

Page 17

Choosing WiredTiger at server startup

mongod --storageEngine wiredTiger

http://docs.mongodb.org/master/reference/program/mongod/#cmdoption--storageEngine

Page 18

Main tunables exposed as MongoDB options

mongod --storageEngine wiredTiger \
  --wiredTigerCacheSizeGB 8 \
  --wiredTigerDirectoryForIndexes /data/indexes \
  --wiredTigerCollectionBlockCompressor zlib \
  --dbpath /data/datafiles

http://docs.mongodb.org/master/reference/program/mongod/#cmdoption--storageEngine
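
When these options are generated by a deployment script, it can help to assemble the argument list from a settings object. The following is a minimal sketch, not an official tool: the helper name, the settings keys, and the validation rules (whole-GB cache size, compressor one of none/snappy/zlib) are assumptions of this example, and only a subset of the flags on the slide is covered.

```javascript
// Hypothetical helper: turn a settings object into a mongod argument list.
// Flag names are taken from the slide; everything else is illustrative.
function mongodArgs({ dbpath, cacheSizeGB, collectionCompressor } = {}) {
  const args = ['--storageEngine', 'wiredTiger'];

  if (cacheSizeGB !== undefined) {
    // Assumption of this sketch: the cache size is given in whole gigabytes.
    if (!Number.isInteger(cacheSizeGB) || cacheSizeGB < 1) {
      throw new Error('cacheSizeGB must be a positive integer');
    }
    args.push('--wiredTigerCacheSizeGB', String(cacheSizeGB));
  }

  if (collectionCompressor !== undefined) {
    const allowed = ['none', 'snappy', 'zlib'];
    if (!allowed.includes(collectionCompressor)) {
      throw new Error('collectionCompressor must be one of ' + allowed.join(', '));
    }
    args.push('--wiredTigerCollectionBlockCompressor', collectionCompressor);
  }

  if (dbpath !== undefined) {
    args.push('--dbpath', dbpath);
  }
  return args;
}

const exampleArgs = mongodArgs({
  dbpath: '/data/datafiles',
  cacheSizeGB: 8,
  collectionCompressor: 'zlib'
});
console.log(['mongod', ...exampleArgs].join(' '));
```

The returned array is in the shape expected by Node's child_process.spawn('mongod', exampleArgs), which avoids shell quoting issues.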

Page 19

All WiredTiger options via configString (hidden)

mongod --storageEngine wiredTiger \
  --wiredTigerEngineConfigString "cache_size=8GB,eviction=(threads_min=4,threads_max=8),checkpoint=(wait=30)" \
  --wiredTigerCollectionConfigString "block_compressor=zlib" \
  --wiredTigerIndexConfigString "type=lsm,block_compressor=zlib" \
  --wiredTigerDirectoryForIndexes /data/indexes

See docs for wiredtiger_open() & WT_SESSION::create():

http://source.wiredtiger.com/2.5.0/group__wt.html#ga9e6adae3fc6964ef837a62795c7840ed

http://source.wiredtiger.com/2.5.0/struct_w_t___s_e_s_s_i_o_n.html#a358ca4141d59c345f401c58501276bbb
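
WiredTiger config strings are comma-separated key=value pairs, with nested groups written as key=(...). Building them by hand is error-prone, so here is a small sketch that serializes a nested object into that form. The function name is hypothetical, and the sketch deliberately ignores lists and quoting, which real config strings may need.

```javascript
// Serialize a nested object into WiredTiger's "key=value,key=(...)" syntax.
// Hypothetical helper; handles only scalars and nested objects.
function toConfigString(config) {
  return Object.entries(config)
    .map(([key, value]) => {
      if (value !== null && typeof value === 'object') {
        // Nested group, e.g. eviction=(threads_min=4,threads_max=8)
        return `${key}=(${toConfigString(value)})`;
      }
      return `${key}=${value}`;
    })
    .join(',');
}

const engineConfig = toConfigString({
  cache_size: '8GB',
  eviction: { threads_min: 4, threads_max: 8 },
  checkpoint: { wait: 30 }
});
console.log(engineConfig);
```

The resulting string is what would be passed, quoted, to --wiredTigerEngineConfigString.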

Page 20

Also via createCollection(), createIndex()

db.createCollection( "users", {
  storageEngine: {
    wiredTiger: { configString: "block_compressor=none" }
  }
})

http://docs.mongodb.org/master/reference/method/db.createCollection/#db.createCollection

http://docs.mongodb.org/master/reference/method/db.collection.createIndex/#db.collection.createIndex
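
The same options document shape applies to both calls. As a rough sketch, a small helper (hypothetical name) can wrap a config string in that shape; the index key used in the comment is made up for illustration.

```javascript
// Hypothetical helper: wrap a WiredTiger config string in the
// { storageEngine: { wiredTiger: { configString } } } options document.
function wiredTigerOptions(configString) {
  return { storageEngine: { wiredTiger: { configString } } };
}

const collectionOptions = wiredTigerOptions('block_compressor=none');
const indexOptions = wiredTigerOptions('type=lsm,block_compressor=zlib');

console.log(JSON.stringify(collectionOptions));
console.log(JSON.stringify(indexOptions));

// In the mongo shell (not runnable here), these would be passed as:
//   db.createCollection("users", collectionOptions)
//   db.users.createIndex({ surname: 1 }, indexOptions)   // index key is illustrative
```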

Page 21

More...

• db.serverStatus()
• db.collection.stats()
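
One common use of db.serverStatus() with WiredTiger is checking how full the cache is. The sketch below works on a plain object shaped like the relevant part of that output. The two statistic names are the ones I would expect under wiredTiger.cache, so verify them against your own server's output; the sample numbers are made up.

```javascript
// Compute cache fill from a serverStatus()-shaped object.
// Returns null when there is no wiredTiger.cache section.
function cacheUsage(serverStatus) {
  const cache = serverStatus.wiredTiger && serverStatus.wiredTiger.cache;
  if (!cache) return null; // e.g. the server runs a different storage engine
  const usedBytes = cache['bytes currently in the cache'];
  const maxBytes = cache['maximum bytes configured'];
  return { usedBytes, maxBytes, ratio: usedBytes / maxBytes };
}

// Made-up sample, standing in for the result of db.serverStatus().
const sampleStatus = {
  wiredTiger: {
    cache: {
      'bytes currently in the cache': 2 * 1024 ** 3,
      'maximum bytes configured': 8 * 1024 ** 3
    }
  }
};

console.log(cacheUsage(sampleStatus).ratio); // 0.25
```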

Page 22

Understanding and Optimizing WiredTiger

Page 23

Understanding WiredTiger architecture

[Layer diagram, top to bottom]
• WiredTiger SE: Btree | LSM | Columnar
• Cache (default: 50%)
• Compression: None | Snappy | Zlib
• OS Disk Cache (default: 50%)
• Physical disk

Page 24

Covering 90% of your optimization needs

[Same layer diagram as page 23, annotated with:]
• Decompression time
• Disk seek time

Page 25

Strategy 1: fit working set in Cache

[Same layer diagram as page 23, with:]
• cache_size = 80%

Page 26

Strategy 2: fit working set in OS Disk Cache

[Same layer diagram as page 23, with:]
• cache_size = 10%
• OS Disk Cache (remaining: 90%)
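
Strategies 1 and 2 differ only in how RAM is split between the WiredTiger cache and the OS disk cache. A quick way to turn the percentages into a value for --wiredTigerCacheSizeGB is sketched below. The function name and the 64 GB server are hypothetical, and rounding down to whole gigabytes (minimum 1) is an assumption of this sketch.

```javascript
// Convert a fraction of total RAM into a whole-GB cache size.
function cacheSizeGB(totalRamGB, fraction) {
  if (!(fraction > 0 && fraction <= 1)) {
    throw new Error('fraction must be in (0, 1]');
  }
  // Round down so the cache never exceeds the intended share; keep at least 1 GB.
  return Math.max(1, Math.floor(totalRamGB * fraction));
}

const ramGB = 64; // hypothetical server

console.log(cacheSizeGB(ramGB, 0.8)); // Strategy 1: 51
console.log(cacheSizeGB(ramGB, 0.1)); // Strategy 2: 6
```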

Page 27

Strategy 3: SSD disk + compression to save €

[Same layer diagram as page 23, with:]
• Physical disk: SSD

Page 28

Strategy 4: SSD disk (no compression)

[Same layer diagram as page 23, with:]
• Physical disk: SSD

Page 29

What problem is solved by LSM indexes?

[Chart: performance by workload goal]
• Fast reads. Easy: add indexes
• Fast writes. Easy: no indexes
• Both. Hard: smart schema design (hire a consultant), or LSM index structures (or columnar)
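
To make the LSM idea concrete, here is a toy log-structured merge tree. It is a teaching sketch only and says nothing about how WiredTiger or RocksDB implement theirs; the class name, the tiny memtable limit, and the single-step compaction are all assumptions of this example. Writes only touch an in-memory table, which is periodically frozen into a sorted immutable run; reads check the newest data first.

```javascript
// Compare two [key, value] entries by key.
function byKey([a], [b]) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Binary search for a key in a sorted run of [key, value] pairs.
function searchRun(run, key) {
  let lo = 0;
  let hi = run.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const [k, v] = run[mid];
    if (k === key) return v;
    if (k < key) lo = mid + 1;
    else hi = mid - 1;
  }
  return undefined;
}

class ToyLSM {
  constructor(memtableLimit = 2) {
    this.memtableLimit = memtableLimit;
    this.memtable = new Map(); // newest writes, unsorted
    this.runs = [];            // sorted immutable runs, newest first
  }

  // A write never reads or rewrites older runs, which keeps it cheap.
  put(key, value) {
    this.memtable.set(key, value);
    if (this.memtable.size >= this.memtableLimit) this.flush();
  }

  // Freeze the memtable into a sorted run.
  flush() {
    if (this.memtable.size === 0) return;
    this.runs.unshift([...this.memtable.entries()].sort(byKey));
    this.memtable = new Map();
  }

  // A read may have to look in several places, newest first.
  get(key) {
    if (this.memtable.has(key)) return this.memtable.get(key);
    for (const run of this.runs) {
      const hit = searchRun(run, key);
      if (hit !== undefined) return hit;
    }
    return undefined;
  }

  // Merge all runs into one, keeping the newest value per key.
  compact() {
    const merged = new Map();
    for (let i = this.runs.length - 1; i >= 0; i--) { // oldest to newest
      for (const [k, v] of this.runs[i]) merged.set(k, v);
    }
    const run = [...merged.entries()].sort(byKey);
    this.runs = run.length > 0 ? [run] : [];
  }
}

const lsm = new ToyLSM(2);
lsm.put('b', 1);
lsm.put('a', 2); // memtable is full, flushed as a sorted run
lsm.put('b', 3); // newer value for 'b' stays in the memtable
console.log(lsm.get('b')); // 3
console.log(lsm.get('a')); // 2
lsm.put('c', 4); // second flush
lsm.compact();
console.log(lsm.runs.length); // 1
```

The trade-off is visible in the code: put() is a single in-memory insert, while get() may consult several runs until compaction merges them.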

Page 30

2B inserts (with 3 secondary indexes)

http://smalldatum.blogspot.fi/2014/12/read-modify-write-optimized.html

Page 31