MongoDB Journaling and the Storage Engine

Date post: 12-May-2015
Upload: mongodb
Page 1: MongoDB Journaling and the Storage Engine

Page 2: MongoDB Journaling and the Storage Engine

Directory Layout

•  Separate files per database
•  Aggressive preallocation
•  Files contain one or more extents

-rw------- 1 ben ben  64M May 1 19:14 test.0
-rw------- 1 ben ben 128M May 1 19:14 test.1
-rw------- 1 ben ben 256M May 1 18:25 test.2
-rw------- 1 ben ben 512M May 1 19:14 test.3
-rw------- 1 ben ben 1.0G May 1 19:14 test.4
-rw------- 1 ben ben 2.0G May 1 18:58 test.5
-rw------- 1 ben ben  16M May 1 19:14 test.ns
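
In the listing above, each data file is twice the size of the one before it, from 64 MB up to 2 GB. A minimal sketch of that pattern follows. It assumes the doubling rule and the 2 GB cap hold in general; the function name and parameters are hypothetical and not part of MongoDB.

```python
def data_file_sizes_mb(count, first_mb=64, cap_mb=2048):
    """Return the sizes (in MB) of data files <db>.0 .. <db>.(count-1).

    Assumption: each file doubles the previous size until the cap is
    reached, which matches the six files in the listing.
    """
    sizes = []
    size = first_mb
    for _ in range(count):
        sizes.append(size)
        size = min(size * 2, cap_mb)
    return sizes


if __name__ == "__main__":
    for n, mb in enumerate(data_file_sizes_mb(6)):
        print(f"test.{n}: {mb} MB")
```

Under these assumptions, any further files would stay at the 2 GB cap.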

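Page 3: MongoDB Journaling and the Storage Engine

Memory Mapping

[Diagram: process virtual memory of mongod, from 0x0 (NULL) up to 0x7fffffffffff, containing MONGOD, HEAP, the memory-mapped data files (test.ns, test.0, test.1, …), LIBS and STACK. A document { … } in the mapped region corresponds to data on disk.]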

Page 4: MongoDB Journaling and the Storage Engine

Data Structures

•  DiskLoc
    •  Stores file number and offset of data on disk
    •  Record *r = mmap base + DiskLoc.offset
    •  Max offset is 2^31 (2GB)
•  NamespaceDetails
    •  Stores collection metadata
•  Extent
    •  Stores contiguous blocks within a namespace
    •  Max extent size is 2GB
•  Record
    •  Holds a BSON document or B-tree bucket
    •  DeletedRecord overwrites a Record
    •  Includes padding
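
The DiskLoc rule (record address = mmap base + offset) can be sketched in a few lines. This is an illustration only: `DiskLoc` here is a hypothetical Python stand-in for the C++ type, and plain bytearrays stand in for the memory-mapped data files.

```python
from collections import namedtuple

# Hypothetical stand-in for the C++ DiskLoc: a file number plus a byte offset.
DiskLoc = namedtuple("DiskLoc", ["file_no", "offset"])

MAX_OFFSET = 2**31  # per the slide: offsets address at most 2GB per file


def read_at(mapped_files, loc, length):
    """Return `length` bytes at `loc`, mimicking `mmap base + DiskLoc.offset`."""
    if not 0 <= loc.offset < MAX_OFFSET:
        raise ValueError("offset outside the 2GB file limit")
    base = mapped_files[loc.file_no]  # the "mmap base" of that data file
    return bytes(base[loc.offset:loc.offset + length])


if __name__ == "__main__":
    # Two bytearrays stand in for the mapped files test.0 and test.1.
    files = [bytearray(64), bytearray(64)]
    files[1][16:21] = b"hello"
    print(read_at(files, DiskLoc(file_no=1, offset=16), 5))
```

Resolving a location is therefore just an index into the right file's mapping, which is why a DiskLoc only needs the two fields.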

Page 5: MongoDB Journaling and the Storage Engine

Namespace Details

•  Holds metadata about a collection or index
•  Stored in 1KB buckets in <dbname>.ns file
•  .ns file fixed size of 16MB
•  Maintains document count
•  Contains heads of linked lists

[Diagram: NamespaceDetails with fields firstExtent, lastExtent, _indexes[], stats, freeList[]]

Page 6: MongoDB Journaling and the Storage Engine

Extent Structure

[Diagram: two Extents, each with fields length, xNext, xPrev, firstRecord, lastRecord]

Page 7: MongoDB Journaling and the Storage Engine

Extents

> db.foo.validate( { full : true } ).extents.forEach(
      function(z){ print( z.loc + "\t\t" + z.size ); } )
0:3000      20480
0:12000     81920
0:26000     327680
0:76000     1310720
0:1da000    5242880
0:76a000    6291456
0:d6a000    7553024
0:16de000   9064448
0:1f83000   10878976
0:29e3000   13058048
1:2000      15671296
1:ef4000    18808832
1:29e4000   22573056
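
To read this output, it helps to split each `loc` into its file number and offset and to look at how extent sizes grow. The sketch below does that for the first rows of the output. It assumes the offset part of `loc` is hexadecimal, which the digits a to f suggest; the helper names are hypothetical.

```python
def parse_loc(loc):
    """Split a 'file:offset' string such as '0:1da000' into (file_no, offset).

    Assumption: the offset part is hexadecimal.
    """
    file_no, offset = loc.split(":")
    return int(file_no), int(offset, 16)


# (loc, size) pairs copied from the first rows of the validate() output.
EXTENTS = [
    ("0:3000", 20480),
    ("0:12000", 81920),
    ("0:26000", 327680),
    ("0:76000", 1310720),
    ("0:1da000", 5242880),
    ("0:76a000", 6291456),
]


def growth_ratios(extents):
    """Ratio of each extent's size to the size of the extent before it."""
    sizes = [size for _, size in extents]
    return [round(b / a, 2) for a, b in zip(sizes, sizes[1:])]


if __name__ == "__main__":
    for loc, size in EXTENTS:
        print(parse_loc(loc), size)
    print(growth_ratios(EXTENTS))
```

For these rows the sizes grow by a factor of 4 up to 5 MB and by a factor of 1.2 after that.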

Page 8: MongoDB Journaling and the Storage Engine

Index Extents

> db.system.namespaces.find()
{ "name" : "test.foo" }
{ "name" : "test.system.indexes" }
{ "name" : "test.foo.$_id_" }

> db["foo.$_id_"].validate( { full : true } ).extents.forEach(
      function(z){ print( z.loc + "\t\t" + z.size ); } )
0:9000      36864
0:1b6000    147456
0:6da000    589824
0:149e000   2359296
1:20e4000   9437184

Page 9: MongoDB Journaling and the Storage Engine

Extents and Records

[Diagram: an Extent (length, xNext, xPrev, firstRecord, lastRecord) and a Data Record (length, rNext, rPrev) holding the document { _id: "foo", ... }]

Page 11: MongoDB Journaling and the Storage Engine

Extents and Records

[Diagram: an Extent (length, xNext, xPrev, firstRecord, lastRecord) and two Data Records (each with length, rNext, rPrev), each holding a document { _id: "foo", ... }]
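
The field names in these diagrams (xNext/xPrev on extents, rNext/rPrev on records, firstRecord/lastRecord) describe two levels of doubly linked lists. A minimal sketch of a full scan over that structure follows. The class and function names are hypothetical, and Python object references stand in for the on-disk locations.

```python
class Record:
    def __init__(self, document):
        self.document = document
        self.r_next = None   # next record in the same extent
        self.r_prev = None   # previous record in the same extent


class Extent:
    def __init__(self):
        self.first_record = None
        self.last_record = None
        self.x_next = None   # next extent of the namespace
        self.x_prev = None   # previous extent of the namespace

    def append(self, document):
        """Link a new record at the tail of this extent's record list."""
        rec = Record(document)
        rec.r_prev = self.last_record
        if self.last_record is None:
            self.first_record = rec
        else:
            self.last_record.r_next = rec
        self.last_record = rec
        return rec


def scan(first_extent):
    """Yield every document: follow xNext across extents, rNext within each."""
    extent = first_extent
    while extent is not None:
        rec = extent.first_record
        while rec is not None:
            yield rec.document
            rec = rec.r_next
        extent = extent.x_next


if __name__ == "__main__":
    e1, e2 = Extent(), Extent()
    e1.x_next, e2.x_prev = e2, e1
    e1.append({"_id": "foo"})
    e1.append({"_id": "bar"})
    e2.append({"_id": "baz"})
    print([doc["_id"] for doc in scan(e1)])
```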

Page 12: MongoDB Journaling and the Storage Engine

BSON Format

{ hello: "world" }

\x16\x00\x00\x00    Doc Length (22 bytes)
\x02                Value Type (string)
hello\x00           Field name
\x06\x00\x00\x00    Value Length (6 bytes, including the trailing \x00)
world\x00           Value
\x00                End of document
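
The byte layout on this slide can be reproduced with the standard `struct` module. The sketch below handles only a single string field, which is all the example needs; the function name is hypothetical and this is not a general BSON encoder.

```python
import struct


def encode_string_doc(key, value):
    """Encode a one-field document {key: value} with a string value as BSON."""
    key_bytes = key.encode("utf-8") + b"\x00"    # field name as a C string
    val_bytes = value.encode("utf-8") + b"\x00"  # string value, NUL-terminated
    element = (
        b"\x02"                                  # value type 0x02: string
        + key_bytes
        + struct.pack("<i", len(val_bytes))      # value length, includes the NUL
        + val_bytes
    )
    total = 4 + len(element) + 1                 # length prefix + element + end byte
    return struct.pack("<i", total) + element + b"\x00"


if __name__ == "__main__":
    print(encode_string_doc("hello", "world"))
```

Both lengths are little-endian 32-bit integers, which is why 22 appears as `\x16\x00\x00\x00`.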

Page 13: MongoDB Journaling and the Storage Engine

Index Extents

[Diagram: an Extent (length, xNext, xPrev, firstRecord, lastRecord) and two Index Records (each with length, rNext, rPrev), each holding a Bucket (parent, numKeys); a key K in the second bucket is shown alongside a { Document }]

Page 14: MongoDB Journaling and the Storage Engine

Index Extents

[Diagram: an Extent (length, xNext, xPrev, firstRecord, lastRecord) and two Index Records (each with length, rNext, rPrev), each holding a Bucket (parent, numKeys); a key K in the second bucket is shown alongside a { Document }. Example B-tree: root keys 4, 9 above leaf keys 1, 3, 5, 6, 8, A, B]
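
The example keys on this slide (4 and 9 above 1, 3, 5, 6, 8, A, B) can be used to sketch how a lookup walks B-tree buckets. The grouping of the lower keys into three leaf buckets is an assumption derived from the B-tree ordering rule, and the `Bucket` class is a hypothetical simplification, not MongoDB's on-disk bucket format.

```python
from bisect import bisect_left


class Bucket:
    """A B-tree node: sorted keys plus, for inner nodes, len(keys) + 1 children."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []


def contains(bucket, key):
    """Return True if `key` is stored in the subtree rooted at `bucket`."""
    while bucket is not None:
        i = bisect_left(bucket.keys, key)
        if i < len(bucket.keys) and bucket.keys[i] == key:
            return True
        # Descend into the child that covers the gap at position i, if any.
        bucket = bucket.children[i] if bucket.children else None
    return False


# Keys are single characters, so plain string comparison orders them
# ("9" sorts before "A"). The leaf grouping is assumed.
ROOT = Bucket(["4", "9"], [
    Bucket(["1", "3"]),
    Bucket(["5", "6", "8"]),
    Bucket(["A", "B"]),
])


if __name__ == "__main__":
    for key in ["6", "A", "7"]:
        print(key, contains(ROOT, key))
```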

Page 15: MongoDB Journaling and the Storage Engine

Journaling

•  Write ahead logging
•  Operations written to journal before memory mapped regions
    •  Private view
    •  Shared view
•  Once journal written, data safe unless hardware problem
•  By default, journal flushed every 100ms, after 100MB of writes, or on a write concern of j=true
    •  User configurable with --journalCommitInterval
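
The ordering described here (journal first, shared view second) and the three flush triggers can be sketched as a toy model. Everything below is a simplification under stated assumptions: the class and attribute names are hypothetical, dictionaries stand in for the private and shared views, and the interval is checked at write time with a caller-supplied clock rather than by a background thread.

```python
class JournaledStore:
    """Toy write-ahead log: writes reach the journal before the shared view."""

    def __init__(self, commit_interval_ms=100, commit_bytes=100 * 1024 * 1024):
        self.commit_interval_ms = commit_interval_ms
        self.commit_bytes = commit_bytes
        self.private_view = {}   # sees every write immediately
        self.shared_view = {}    # stands in for the memory-mapped data files
        self.journal = []        # committed groups, each a list of writes
        self.pending = []        # writes not yet journaled
        self.pending_bytes = 0
        self.last_commit_ms = 0

    def write(self, key, value, now_ms, j=False):
        """Buffer a write; commit if j=True, the interval, or the size limit hits."""
        self.private_view[key] = value
        self.pending.append((key, value))
        self.pending_bytes += len(value)
        due = now_ms - self.last_commit_ms >= self.commit_interval_ms
        if j or due or self.pending_bytes >= self.commit_bytes:
            self.commit(now_ms)

    def commit(self, now_ms):
        """Group commit: append to the journal, then apply to the shared view."""
        if self.pending:
            self.journal.append(list(self.pending))   # 1. journal first
            for key, value in self.pending:           # 2. then the shared view
                self.shared_view[key] = value
            self.pending.clear()
            self.pending_bytes = 0
        self.last_commit_ms = now_ms


if __name__ == "__main__":
    store = JournaledStore()
    store.write("a", b"1", now_ms=10)            # buffered only
    store.write("b", b"2", now_ms=20, j=True)    # j=True forces a group commit
    print(store.journal)
```

In this model the second write carries the first one into the same journal group, which is the point of group commit.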

Page 16: MongoDB Journaling and the Storage Engine

Journal Format

•  Section contains single group commit
•  Applied all-or-nothing

JHeader
JSectHeader [LSN 3]
    DurOp
    DurOp
    DurOp
JSectFooter
JSectHeader [LSN 7]
    DurOp
    DurOp
    DurOp
JSectFooter
…

DurOp types:
Op_DbContext                         Set database context for subsequent operations
length offset fileNo data[length]    Write operation
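
The all-or-nothing rule for sections can be sketched as a replay loop that applies a section's write operations only if the section is complete. This is an illustration under assumptions: the dictionary keys are hypothetical, a boolean stands in for a valid JSectFooter, bytearrays stand in for data files, and the Op_DbContext operation is omitted because file numbers are used directly.

```python
def replay(journal, files):
    """Apply complete journal sections to `files`; return the last LSN applied.

    `journal` is a list of section dicts with hypothetical keys:
      "lsn"    - sequence number from the section header
      "ops"    - list of (file_no, offset, data) write operations
      "footer" - True if the section footer was written (section is complete)
    `files` maps file_no -> bytearray standing in for a data file.
    """
    last_lsn = None
    for section in journal:
        if not section["footer"]:
            break  # incomplete section: apply none of its ops and stop
        for file_no, offset, data in section["ops"]:
            files[file_no][offset:offset + len(data)] = data
        last_lsn = section["lsn"]
    return last_lsn


if __name__ == "__main__":
    files = {0: bytearray(8)}
    journal = [
        {"lsn": 3, "ops": [(0, 0, b"ab"), (0, 2, b"cd")], "footer": True},
        {"lsn": 7, "ops": [(0, 4, b"zz")], "footer": False},
    ]
    print(replay(journal, files), bytes(files[0]))
```

In the demo, the section with LSN 7 has no footer, so its write is skipped entirely.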

Page 17: MongoDB Journaling and the Storage Engine

Journal Performance

•  On 99.9% read systems, no impact
•  Write performance degraded 5-30% when journal on same drive
•  Separate drive as low as 3%

Page 18: MongoDB Journaling and the Storage Engine

Journal Admin

•  Journal stored in /dbpath/journal folder
•  If faster, three 1GB files may be preallocated
•  Can symlink to a different spindle
•  --journalCommitInterval* (2ms - 300ms)
•  When to journal
    •  Single node: required for data integrity
    •  Replica set: at least 1 node
    •  All nodes: removes possible need to resync

Page 19: MongoDB Journaling and the Storage Engine

Fragmentation

•  Files may become fragmented over time if documents change size
•  Free lists also contribute to fragmentation
•  2.0 reduced scanning to reasonable amounts
•  2.2 will change allocation strategy
•  Need to re-write free list to do online compaction

Page 20: MongoDB Journaling and the Storage Engine

Compaction

•  1.8 and previous: repairDatabase
•  2.0+: compact command
•  Currently resets paddingFactor, but can be changed.
•  Index (re)generation is now concurrent, so compaction can be N times faster
•  Generally causes some extra allocation
•  Does not delete or truncate files

Page 21: MongoDB Journaling and the Storage Engine

Planned Changes

•  Split data and indexes into different files
•  Indexes could be symlinked to a different drive (SSD)
•  Improved allocation strategy

Page 22: MongoDB Journaling and the Storage Engine

Download MongoDB

http://www.mongodb.org/downloads

Ben Becker [email protected]

