MongoDB Indexing and Query Optimizer Details
Aaron Staple
MongoSV
December 3, 2010
What will we cover?
• Many details of how indexing and the query optimizer work
• A full understanding of these details is not required to use mongo, but this knowledge can be helpful when making optimizations.
• We’ll discuss functionality of Mongo 1.8 (for our purposes pretty similar to 1.6 and almost identical to 1.7 edge).
• Much of the material will be presented through examples.
• Diagrams are to aid understanding – some details will be left out.
What will we cover?
• Basic index bounds
• Compound key index bounds
• Or queries
• Automatic index selection
How will we cover it?
• We’re going to try to cover this material interactively - please volunteer your thoughts on what mongo should do in given scenarios when I ask.
• Pertinent questions are welcome, but please hold off-topic or specialized questions until the end so we don’t lose momentum.
Btree (just a conceptual diagram)
[Diagram: B-tree holding the index keys 1 through 9; one key points to the document {_id:4,x:6}]
Basic Index Bounds
Find One Document
• db.c.find( {x:6} ).limit( 1 )
• Index {x:1}
Find One Document
1 2 3 4 5 6 7 8 9
6 ?
{_id:4,x:6}
Find One Document
>db.c.find( {x:6} ).limit( 1 ).explain()
{
"cursor" : "BtreeCursor x_1",
"nscanned" : 1,
"nscannedObjects" : 1,
"n" : 1,
"millis" : 1,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"x" : [
[
6,
6
]
]
}
}
Find One Document
"indexBounds" : {
"x" : [
[
6,
6
]
]
}
Find One Document
"nscanned" : 1,
"nscannedObjects" : 1,
"n" : 1,
Find One Document
[Diagram: the B-tree with keys 1 through 9 is searched for key 6, which points to {_id:4,x:6}]
Find One Document
1 2 3 4 5 6 6 6 9
6 ?
{_id:4,x:6}
Now we have duplicate x values.
Find One Document
[Diagram: B-tree with keys 1 2 3 4 5 6 6 6 9; the search for key 6 reaches a key pointing to {_id:4,x:6}]
Equality Match
• db.c.find( {x:6} )
• Index {x:1}
Equality Match
1 2 3 4 5 6 6 6 9
6 ?
{_id:4,x:6} {_id:5,x:6} {_id:1,x:6}
Equality Match
>db.c.find( {x:6} ).explain()
{
"cursor" : "BtreeCursor x_1",
"nscanned" : 3,
"nscannedObjects" : 3,
"n" : 3,
"millis" : 1,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"x" : [
[
6,
6
]
]
}
}
Equality Match
"indexBounds" : {
"x" : [
[
6,
6
]
]
}
Equality Match
"nscanned" : 3,
"nscannedObjects" : 3,
"n" : 3,
Equality Match
[Diagram: B-tree with keys 1 2 3 4 5 6 6 6 9; all three keys equal to 6 are scanned]
Full Document Matcher
• db.c.find( {x:6,y:1} )
• Index {x:1}
Full Document Matcher
1 2 3 4 5 6 6 6 9
6 ?
{y:4,x:6} {y:5,x:6} {y:1,x:6}
Full Document Matcher
>db.c.find( {x:6,y:1} ).explain()
{
"cursor" : "BtreeCursor x_1",
"nscanned" : 3,
"nscannedObjects" : 3,
"n" : 1,
"millis" : 1,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"x" : [
[
6,
6
]
]
}
}
Full Document Matcher
"indexBounds" : {
"x" : [
[
6,
6
]
]
}
Full Document Matcher
"nscanned" : 3,
"nscannedObjects" : 3,
"n" : 1,
Documents for all matching keys are scanned, but only one document matches on the non-indexed key.
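The counts on this slide can be reproduced with a toy model. This is only a sketch under simplifying assumptions: the "index" is a sorted array rather than a B-tree, and `explainSketch` is a hypothetical helper, not a mongo function.

```javascript
// Toy model (assumption): an index on x is a sorted list of { key, doc } entries.
const docs = [
  { x: 6, y: 4 }, { x: 6, y: 5 }, { x: 6, y: 1 }, { x: 9, y: 2 },
];
const index = docs
  .map((doc) => ({ key: doc.x, doc }))
  .sort((a, b) => a.key - b.key);

// Walk the keys inside [lo, hi], load each document, then apply the full matcher.
function explainSketch(index, lo, hi, matches) {
  let nscanned = 0, nscannedObjects = 0, n = 0;
  for (const entry of index) {
    if (entry.key < lo) continue; // a real B-tree would seek here instead of iterating
    if (entry.key > hi) break;
    nscanned++;          // one index key examined
    nscannedObjects++;   // its document is loaded for the matcher
    if (matches(entry.doc)) n++;
  }
  return { nscanned, nscannedObjects, n };
}

console.log(explainSketch(index, 6, 6, (d) => d.x === 6 && d.y === 1));
// { nscanned: 3, nscannedObjects: 3, n: 1 }
```

The index narrows the scan to the three x:6 keys; only the matcher can reject the two documents whose y is not 1.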
Range Match
• db.c.find( {x:{$gte:4,$lte:7}} )
• Index {x:1}
Range Match
1 2 3 4 5 6 7 8 9
4 <= ? <= 7
Range Match
>db.c.find( {x:{$gte:4,$lte:7}} ).explain()
{
"cursor" : "BtreeCursor x_1",
"nscanned" : 4,
"nscannedObjects" : 4,
"n" : 4,
"millis" : 1,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"x" : [
[
4,
7
]
]
}
}
Range Match
"indexBounds" : {
"x" : [
[
4,
7
]
]
}
Range Match
"nscanned" : 4,
"nscannedObjects" : 4,
"n" : 4,
Range Match
[Diagram: B-tree with keys 1 through 9; the keys 4, 5, 6 and 7 are scanned]
Exclusive Range Match
• db.c.find( {x:{$gt:4,$lt:7}} )
• Index {x:1}
Exclusive Range Match
1 2 3 4 5 6 7 8 9
4 < ? < 7
Exclusive Range Match
>db.c.find( {x:{$gt:4,$lt:7}} ).explain()
{
"cursor" : "BtreeCursor x_1",
"nscanned" : 2,
"nscannedObjects" : 2,
"n" : 2,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"x" : [
[
4,
7
]
]
}
}
Exclusive Range Match
"indexBounds" : {
"x" : [
[
4,
7
]
]
}
Explain doesn’t indicate that the range is exclusive.
Exclusive Range Match
"nscanned" : 2,
"nscannedObjects" : 2,
"n" : 2,
But index keys matching the range bounds are not scanned because the bounds are exclusive.
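A minimal sketch of the difference between the two range slides, assuming the index keys are simply the values 1 through 9. The `countScanned` helper is hypothetical and only counts which keys fall inside the bounds.

```javascript
// Index keys from the slides' diagram (one key per value of x).
const keys = [1, 2, 3, 4, 5, 6, 7, 8, 9];

// Count the keys a scan between lo and hi would visit.
// `inclusive` models $gte/$lte (true) versus $gt/$lt (false).
function countScanned(keys, lo, hi, inclusive) {
  return keys.filter((k) =>
    inclusive ? k >= lo && k <= hi : k > lo && k < hi
  ).length;
}

console.log(countScanned(keys, 4, 7, true));  // 4, as in the Range Match explain
console.log(countScanned(keys, 4, 7, false)); // 2, as in the Exclusive Range Match explain
```

Both queries print the same indexBounds of [4, 7]; only nscanned reveals that the boundary keys were skipped.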
Exclusive Range Match
[Diagram: B-tree with keys 1 through 9; only the keys 5 and 6 are scanned]
Multikeys
• db.c.find( {x:{$gt:7}} )
• Index {x:1}
Multikeys
1 2 3 4 5 6 7 8 9
? > 7
{_id:4,x:[8,9]}
Multikeys
>db.c.find( {x:{$gt:7}} ).explain()
{
"cursor" : "BtreeCursor x_1",
"nscanned" : 2,
"nscannedObjects" : 2,
"n" : 1,
"millis" : 1,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : true,
"indexOnly" : false,
"indexBounds" : {
"x" : [
[
7,
1.7976931348623157e+308
]
]
}
}
Multikeys
"indexBounds" : {
"x" : [
[
7,
1.7976931348623157e+308
]
]
}
Multikeys
"nscanned" : 2,
"nscannedObjects" : 2,
"n" : 1,
All keys in the valid range are scanned, but the matcher rejects duplicate documents, making n == 1.
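The multikey behaviour above can be sketched as follows. This is a simplified model, not the server's code: `buildIndex` and `scanGreaterThan` are hypothetical helpers, and the extra {_id:1,x:5} document is an assumption added so the scan has something to skip.

```javascript
// One document with a scalar x and one with an array x (the slide's multikey document).
const docs = [
  { _id: 1, x: 5 },
  { _id: 4, x: [8, 9] },
];

// A multikey index stores one key per array element, each pointing at the same document.
function buildIndex(docs) {
  const entries = [];
  for (const doc of docs) {
    const values = Array.isArray(doc.x) ? doc.x : [doc.x];
    for (const key of values) entries.push({ key, id: doc._id });
  }
  return entries.sort((a, b) => a.key - b.key);
}

// Scan keys greater than `bound`, returning each document at most once.
function scanGreaterThan(entries, bound) {
  const seen = new Set();
  let nscanned = 0;
  for (const e of entries) {
    if (e.key <= bound) continue;
    nscanned++;
    seen.add(e.id); // a second key for the same document does not add a result
  }
  return { nscanned, n: seen.size };
}

console.log(scanGreaterThan(buildIndex(docs), 7)); // { nscanned: 2, n: 1 }
```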
Multikeys
[Diagram: B-tree with keys 1 through 9; the keys 8 and 9 are scanned and both point to {_id:4,x:[8,9]}]
Range Types
• Explicit inequality
– db.c.find( {x:{$gt:4,$lt:7}} )
– db.c.find( {x:{$gt:4}} )
– db.c.find( {x:{$ne:4}} )
• Regular expression prefix
– db.c.find( {x:/^a/} )
• Data type
– db.c.find( {x:/a/} )
Range Types
db.c.find( {x:{$gt:4,$lt:7}} )
"indexBounds" : {
"x" : [
[
4,
7
]
]
}
Range Types
db.c.find( {x:{$gt:4}} )
"indexBounds" : {
"x" : [
[
4,
1.7976931348623157e+308
]
]
}
Range Types
db.c.find( {x:{$ne:4}} )
"indexBounds" : {
"x" : [
[
{
"$minElement" : 1
},
4
],
[
4,
{
"$maxElement" : 1
}
]
]
}
Range Types
db.c.find( {x:/^a/} )
"indexBounds" : {
"x" : [
[
"a",
"b"
],
[
/^a/,
/^a/
]
]
}
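One way to see where the ["a","b"] string range comes from is to bump the last character of the prefix. The `prefixBounds` helper below is hypothetical and the upper bound is treated as exclusive, which is an assumption since explain does not show exclusivity.

```javascript
// Hypothetical helper: derive string index bounds for a simple /^prefix/ regex.
function prefixBounds(prefix) {
  if (prefix.length === 0) throw new Error("prefix must be non-empty");
  const last = prefix.charCodeAt(prefix.length - 1);
  // Upper bound: the same prefix with its last character bumped by one code unit.
  const upper = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  return [prefix, upper];
}

const [lo, hi] = prefixBounds("a");
console.log(lo, hi); // a b

// Every string starting with "a" sorts inside [lo, hi).
const sample = ["abc", "apple", "b", "banana", "cat"];
console.log(sample.filter((k) => k >= lo && k < hi)); // [ 'abc', 'apple' ]
```

An unanchored regex such as /a/ has no usable prefix, which is why the next slide falls back to bounds covering all strings.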
Range Types
db.c.find( {x:/a/} )
"indexBounds" : {
"x" : [
[
"",
{
}
],
[
/a/,
/a/
]
]
}
Set Match
• db.c.find( {x:{$in:[3,6]}} )
• Index {x:1}
Set Match
1 2 3 4 5 6 7 8 9
3 , 6
Set Match
>db.c.find( {x:{$in:[3,6]}} ).explain()
{
"cursor" : "BtreeCursor x_1 multi",
"nscanned" : 3,
"nscannedObjects" : 2,
"n" : 2,
"millis" : 8,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"x" : [
[
3,
3
],
[
6,
6
]
]
}
}
Set Match
"indexBounds" : {
"x" : [
[
3,
3
],
[
6,
6
]
]
}
Set Match
"nscanned" : 3,
"nscannedObjects" : 2,
"n" : 2,
Why is nscanned 3? This is an algorithmic detail we’ll discuss more later, but when there are disjoint ranges for a key, nscanned may be higher than the number of matching keys.
Set Match
[Diagram: B-tree with keys 1 through 9; the keys 3 and 6 match]
All Match
• db.c.find( {x:{$all:[3,6]}} )
• Index {x:1}
All Match
1 2 3 4 5 6 7 8 9
3 ?
{_id:4,x:[3,6]}
All Match
>db.c.find( {x:{$all:[3,6]}} ).explain()
{
"cursor" : "BtreeCursor x_1",
"nscanned" : 1,
"nscannedObjects" : 1,
"n" : 1,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : true,
"indexOnly" : false,
"indexBounds" : {
"x" : [
[
3,
3
]
]
}
}
All Match
"indexBounds" : {
"x" : [
[
3,
3
]
]
}
The first entry in the $all match array is always used for index bounds. Note this may not be the least numerous indexed value in the $all array.
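The note above suggests that the order of the $all array affects how many keys are scanned. A rough sketch of that effect, assuming a toy multikey index; the two extra x:6 documents and the `allMatch` helper are hypothetical.

```javascript
// The slide's document plus two assumed documents that only contain 6.
const docs = [
  { _id: 4, x: [3, 6] },
  { _id: 1, x: 6 },
  { _id: 2, x: 6 },
];
const asArray = (v) => (Array.isArray(v) ? v : [v]);

// Toy multikey index: one { key, doc } entry per array element.
const entries = docs
  .flatMap((doc) => asArray(doc.x).map((key) => ({ key, doc })))
  .sort((a, b) => a.key - b.key);

// Use only the first $all value for the index bounds, then let the matcher check the rest.
function allMatch(entries, values) {
  const first = values[0];
  let nscanned = 0, n = 0;
  for (const e of entries) {
    if (e.key !== first) continue;
    nscanned++;
    if (values.every((v) => asArray(e.doc.x).includes(v))) n++;
  }
  return { nscanned, n };
}

console.log(allMatch(entries, [3, 6])); // { nscanned: 1, n: 1 }
console.log(allMatch(entries, [6, 3])); // { nscanned: 3, n: 1 }
```

Under this model, listing the rarest value first keeps nscanned low.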
All Match
"nscanned" : 1,
"nscannedObjects" : 1,
"n" : 1,
All Match
[Diagram: B-tree with keys 1 through 9; only key 3 is scanned]
Limit
• db.c.find( {x:{$lt:6},y:3} ).limit( 3 )
• Index {x:1}
Limit
1 2 3 4 5 6 7 8 9
? < 6
y:3 y:1 y:3 y:3 y:3
Limit
>db.c.find( {x:{$lt:6},y:3} ).limit( 3 ).explain()
{
"cursor" : "BtreeCursor x_1",
"nscanned" : 4,
"nscannedObjects" : 4,
"n" : 3,
"millis" : 1,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : true,
"indexOnly" : false,
"indexBounds" : {
"x" : [
[
-1.7976931348623157e+308,
6
]
]
}
}
Limit
"indexBounds" : {
"x" : [
[
-1.7976931348623157e+308,
6
]
]
}
Limit
"nscanned" : 4,
"nscannedObjects" : 4,
"n" : 3,
Scan until three matches are found, then stop.
Skip
• db.c.find( {x:{$lt:6},y:3} ).skip( 3 )
• Index {x:1}
Skip
1 2 3 4 5 6 7 8 9
? < 6
y:3 y:1 y:3 y:3 y:3
Skip
>db.c.find( {x:{$lt:6},y:3} ).skip( 3 ).explain()
{
"cursor" : "BtreeCursor x_1",
"nscanned" : 5,
"nscannedObjects" : 5,
"n" : 1,
"millis" : 1,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : true,
"indexOnly" : false,
"indexBounds" : {
"x" : [
[
-1.7976931348623157e+308,
6
]
]
}
}
Skip
"indexBounds" : {
"x" : [
[
-1.7976931348623157e+308,
6
]
]
}
Skip
"nscanned" : 5,
"nscannedObjects" : 5,
"n" : 1,
All skipped documents are scanned.
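The Limit and Skip counts can both be reproduced with one small loop. This is a sketch under the assumption that documents are visited in index order; `scanSketch` is a hypothetical helper and only the five documents with x < 6 from the diagram are modelled.

```javascript
// Documents in index order on x, with the y values shown in the diagram.
const docs = [
  { x: 1, y: 3 }, { x: 2, y: 1 }, { x: 3, y: 3 }, { x: 4, y: 3 }, { x: 5, y: 3 },
];

// Scan x < 6, keep documents with y == 3, honouring skip and limit.
function scanSketch(docs, { limit = Infinity, skip = 0 } = {}) {
  let nscanned = 0, n = 0, skipped = 0;
  for (const doc of docs) {
    if (doc.x >= 6) break;        // end of the index range
    nscanned++;
    if (doc.y !== 3) continue;    // rejected by the matcher
    if (skipped < skip) { skipped++; continue; } // scanned but not returned
    n++;
    if (n === limit) break;       // stop as soon as the limit is satisfied
  }
  return { nscanned, n };
}

console.log(scanSketch(docs, { limit: 3 })); // { nscanned: 4, n: 3 }
console.log(scanSketch(docs, { skip: 3 }));  // { nscanned: 5, n: 1 }
```

A limit lets the scan stop early, while a skip still pays for every skipped document.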
Sort
• db.c.find( {x:{$lt:6},y:3} ).sort( {x:1} )
• Index {x:1}
Sort
1 2 3 4 5 6 7 8 9
? < 6
y:3 y:1 y:3 y:3 y:3
Sort
>db.c.find( {x:{$lt:6},y:3} ).sort( {x:1} ).explain()
{
"cursor" : "BtreeCursor x_1",
"nscanned" : 5,
"nscannedObjects" : 5,
"n" : 4,
"millis" : 1,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : true,
"indexOnly" : false,
"indexBounds" : {
"x" : [
[
-1.7976931348623157e+308,
6
]
]
}
}
Sort
"cursor" : "BtreeCursor x_1",
Sort
• db.c.find( {x:{$lt:6},y:3} ).sort( {y:1} )
• Index {x:1}
Sort
1 2 3 4 5 6 7 8 9
? < 6
y:3 y:1 y:3 y:3 y:3
Sort
>db.c.find( {x:{$lt:6},y:3} ).sort( {y:1} ).explain()
{
"cursor" : "BtreeCursor x_1",
"nscanned" : 5,
"nscannedObjects" : 5,
"n" : 4,
"scanAndOrder" : true,
"millis" : 1,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : true,
"indexOnly" : false,
"indexBounds" : {
"x" : [
[
-1.7976931348623157e+308,
6
]
]
}
}
Sort
"cursor" : "BtreeCursor x_1",
"nscanned" : 5,
"nscannedObjects" : 5,
"n" : 4,
"scanAndOrder" : true,
Results are sorted on the fly to match the requested order. The scanAndOrder field is only printed when its value is true.
Sort and scanAndOrder
• With “scanAndOrder” sort, all documents must be touched even if there is a limit spec.
• With scanAndOrder, sorting is performed in memory and the memory footprint is constrained by the limit spec if present.
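The two bullets above can be illustrated with a bounded in-memory buffer. This is only a sketch of the idea, not the server's sort implementation; `scanAndOrder` here is a hypothetical function and the sample y values are assumptions.

```javascript
// Touch every document, but keep at most `limit` of them in the sort buffer.
function scanAndOrder(docs, sortKey, limit = Infinity) {
  const buffer = [];
  let touched = 0;
  for (const doc of docs) {
    touched++;                                   // no early exit: later docs may sort first
    buffer.push(doc);
    buffer.sort((a, b) => a[sortKey] - b[sortKey]);
    if (buffer.length > limit) buffer.pop();     // drop the current worst candidate
  }
  return { touched, results: buffer };
}

const docs = [
  { x: 1, y: 3 }, { x: 2, y: 1 }, { x: 3, y: 3 }, { x: 4, y: 2 }, { x: 5, y: 5 },
];
const out = scanAndOrder(docs, "y", 2);
console.log(out.touched);                  // 5
console.log(out.results.map((d) => d.y));  // [ 1, 2 ]
```

The buffer never grows past limit + 1 entries, yet all five documents are read.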
Count
• db.c.count( {x:{$gte:4,$lte:7}} )
• Index {x:1}
Count
1 2 3 4 5 6 7 8 9
4 <= ? <= 7
Count
[Diagram: B-tree with keys 1 through 9; the keys 4 through 7 are counted]
We’re just counting keys here, not loading the full documents.
Count
• With some operators the full document must be checked. Some of these cases:
• $all
• $size
• array match
• Negation - $ne, $nin, $not, etc.
• With current semantics, all multikey elements must match negation constraints
• Multikey deduplication works without loading the full document
Covered Indexes
• db.c.find( {x:6}, {x:1,_id:0} )
• Index {x:1}
• _id would be returned by default, but it isn’t in the index, so we need to exclude it to return only indexed fields.
Covered Indexes
1 2 3 4 5 6 7 8 9
6 ?
{_id:4,x:6}
Covered Indexes
>db.c.find( {x:6}, {x:1,_id:0} ).explain()
{
"cursor" : "BtreeCursor x_1",
"nscanned" : 1,
"nscannedObjects" : 1,
"n" : 1,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : true,
"indexBounds" : {
"x" : [
[
6,
6
]
]
}
}
Covered Indexes
"isMultiKey" : false,
"indexOnly" : true,
Covered Indexes
1 2 3 4 5 6 7 8 9
6 ?
{_id:4,x:[6,7]}
Covered Indexes
>db.c.find( {x:6}, {x:1,_id:0} ).explain()
{
"cursor" : "BtreeCursor x_1",
"nscanned" : 1,
"nscannedObjects" : 1,
"n" : 1,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : true,
"indexOnly" : false,
"indexBounds" : {
"x" : [
[
6,
6
]
]
}
}
Covered Indexes
"isMultiKey" : true,
"indexOnly" : false,
Currently we set isMultiKey to true the first time we save a doc where the field is a multikey array. But when all multikey docs are removed we don’t reset isMultiKey. This can be improved.
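The two Covered Indexes explains suggest a simple eligibility check. The sketch below encodes only what the slides show (projection limited to indexed fields, _id excluded, index not multikey); `canBeIndexOnly` is a hypothetical helper and the real server logic may consider more conditions.

```javascript
// Decide whether a query could be answered from index keys alone (toy rule).
function canBeIndexOnly(indexFields, projection, isMultiKey) {
  if (isMultiKey) return false; // the slides show indexOnly false once isMultiKey is set
  const returned = Object.keys(projection).filter((f) => projection[f] === 1);
  // _id is returned by default unless the projection excludes it.
  if (projection._id !== 0 && !returned.includes("_id")) returned.push("_id");
  return returned.length > 0 && returned.every((f) => indexFields.includes(f));
}

console.log(canBeIndexOnly(["x"], { x: 1, _id: 0 }, false)); // true
console.log(canBeIndexOnly(["x"], { x: 1 }, false));         // false (_id is not in the index)
console.log(canBeIndexOnly(["x"], { x: 1, _id: 0 }, true));  // false (multikey index)
```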
Update
• db.c.update( {x:{$gte:4,$lte:7}}, {$set:{x:2}} )
• Index {x:1}
Update
1 2 3 4 5 6 7 8 9
4 <= ? <= 7
{_id:4,x:4}
Update
[Diagram sequence: the scan reaches key 4, which points to {_id:4,x:4}; the key 4 is removed from the B-tree and a key 2 is inserted, so the document becomes {_id:4,x:2}]
Update
• We track the set of documents that have been updated in the course of the current operation so they are only updated once.
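Why the tracking matters is easiest to see when an update moves a key forward in the index, ahead of the scan. The sketch below is an assumption-laden toy: it sets x to 6 instead of the slide's 2 so that updated keys land inside the remaining range, and `multiUpdate` is a hypothetical helper, not the server's update path.

```javascript
// Apply {$set:{x:newX}} to every document with lo <= x <= hi, walking a live index on x.
function multiUpdate(docs, lo, hi, newX, trackUpdated) {
  const updated = new Set();
  let updates = 0;
  let cursorKey = -Infinity, cursorId = -Infinity; // last index position visited
  for (;;) {
    // Next index entry strictly after the cursor, recomputed because keys move on update.
    const next = [...docs.entries()]
      .map(([id, d]) => ({ key: d.x, id }))
      .filter((e) => e.key > cursorKey || (e.key === cursorKey && e.id > cursorId))
      .sort((a, b) => a.key - b.key || a.id - b.id)[0];
    if (!next || next.key > hi) break;
    cursorKey = next.key;
    cursorId = next.id;
    if (next.key < lo) continue;
    if (trackUpdated && updated.has(next.id)) continue; // already handled in this operation
    docs.get(next.id).x = newX;
    updated.add(next.id);
    updates++;
  }
  return updates;
}

const makeDocs = () => new Map([[1, { x: 4 }], [2, { x: 5 }]]);
console.log(multiUpdate(makeDocs(), 4, 7, 6, true));  // 2
console.log(multiUpdate(makeDocs(), 4, 7, 6, false)); // 4
```

Without the set of updated documents, each document is met again at its new key and updated a second time.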
Compound Key Index Bounds
Two Equality Bounds
• db.c.find( {x:5,y:'c'} )
• Index {x:1,y:1}
Two Equality Bounds
[Diagram: compound index keys 1b 3d 4g 5c 5d 5f 6c 7a 9b; search key 5c ?]
Two Equality Bounds
>db.c.find( {x:5,y:'c'} ).explain()
{
"cursor" : "BtreeCursor x_1_y_1",
"nscanned" : 1,
"nscannedObjects" : 1,
"n" : 1,
"millis" : 1,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"x" : [
[
5,
5
]
],
"y" : [
[
"c",
"c"
]
]
}
}
Two Equality Bounds
"indexBounds" : {
"x" : [
[
5,
5
]
],
"y" : [
[
"c",
"c"
]
]
}
}
Two Equality Bounds
"nscanned" : 1,
"nscannedObjects" : 1,
"n" : 1,
Two Equality Bounds
[Diagram: B-tree over the compound keys 1b 3d 4g 5c 5d 5f 6c 7a 9b; only key 5c is scanned]
Equality and Set
• db.c.find( {x:5,y:{$in:['c','f']}} )
• Index {x:1,y:1}
Equality and Set
[Diagram: compound index keys 1b 3d 4g 5c 5d 5f 6c 7a 9b; search keys {5c, 5f}]
Equality and Set
>db.c.find( {x:5,y:{$in:['c','f']}} ).explain()
{
"cursor" : "BtreeCursor x_1_y_1 multi",
"nscanned" : 3,
"nscannedObjects" : 2,
"n" : 2,
"millis" : 1,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"x" : [
[
5,
5
]
],
"y" : [
[
"c",
"c"
],
[
"f",
"f"
]
]
}
}
Equality and Set
"indexBounds" : {
"x" : [
[
5,
5
]
],
"y" : [
[
"c",
"c"
],
[
"f",
"f"
]
]
}
Equality and Set
"nscanned" : 3,
"nscannedObjects" : 2,
"n" : 2,
Equality and Set
[Diagram: B-tree over the compound keys 1b 3d 4g 5c 5d 5f 6c 7a 9b; the keys 5c and 5f match]
Equality and Range
• db.c.find( {x:5,y:{$gte:'d'}} )
• Index {x:1,y:1}
Equality and Range
[Diagram: compound index keys 1b 3d 4g 5c 5d 5f 6c 7a 9b; search range 5d <= ? <= 5 followed by the max string]
Equality and Range
>db.c.find( {x:5,y:{$gte:'d'}} ).explain()
{
"cursor" : "BtreeCursor x_1_y_1",
"nscanned" : 2,
"nscannedObjects" : 2,
"n" : 2,
"millis" : 1,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"x" : [
[
5,
5
]
],
"y" : [
[
"d",
{
}
]
]
}
}
Equality and Range
"indexBounds" : {
"x" : [
[
5,
5
]
],
"y" : [
[
"d",
{
}
]
]
}
Equality and Range
"nscanned" : 2,
"nscannedObjects" : 2,
"n" : 2,
Equality and Range
[Diagram: B-tree over the compound keys 1b 3d 4g 5c 5d 5f 6c 7a 9b; the keys 5d and 5f are scanned]
Two Set Bounds
• db.c.find( {x:{$in:[5,9]},y:{$in:['c','f']}} )
• Index {x:1,y:1}
Two Set Bounds
[Diagram: compound index keys 1b 3d 4g 5c 5d 5f 6c 7a 9f; search keys {5c, 5f, 9c, 9f}]
Two Set Bounds
>db.c.find( {x:{$in:[5,9]},y:{$in:['c','f']}} ).explain()
{
"cursor" : "BtreeCursor x_1_y_1 multi",
"nscanned" : 5,
"nscannedObjects" : 3,
"n" : 3,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"x" : [
[
5,
5
],
[
9,
9
]
],
"y" : [
[
"c",
"c"
],
[
"f",
"f"
]
]
}
}
Two Set Bounds
"indexBounds" : {
"x" : [
[
5,
5
],
[
9,
9
]
],
"y" : [
[
"c",
"c"
],
[
"f",
"f"
]
]
}
Two Set Bounds
"nscanned" : 5,
"nscannedObjects" : 3,
"n" : 3,
Two Set Bounds
[Diagram: B-tree over the compound keys 1b 3d 4g 5c 5d 5f 6c 7a 9f; the keys 5c, 5f and 9f match]
Set and Range
• db.c.find( {x:{$in:[5,9]},y:{$lte:'d'}} )
• Index {x:1,y:1}
Set and Range
[Diagram: compound index keys 1b 3d 4g 5c 5d 5f 6c 9a 9f; search ranges 5 with the min string <= ? <= 5d and 9 with the min string <= ? <= 9d]
Set and Range
>db.c.find( {x:{$in:[5,9]},y:{$lte:'d'}} ).explain()
{
"cursor" : "BtreeCursor x_1_y_1 multi",
"nscanned" : 5,
"nscannedObjects" : 3,
"n" : 3,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"x" : [
[
5,
5
],
[
9,
9
]
],
"y" : [
[
"",
"d"
]
]
}
}
Set and Range
"indexBounds" : {
"x" : [
[
5,
5
],
[
9,
9
]
],
"y" : [
[
"",
"d"
]
]
}
Set and Range
"nscanned" : 5,
"nscannedObjects" : 3,
"n" : 3,
Range and Equality
• db.c.find( {x:{$gte:4},y:'c'} )
• Index {x:1,y:1}
Range and Equality
[Diagram: compound index keys 1b 3d 4g 5c 5d 6a 7e 8c 9f; search ? >= 4 and c]
Range and Equality
>db.c.find( {x:{$gte:4},y:'c'} ).explain()
{
"cursor" : "BtreeCursor x_1_y_1",
"nscanned" : 7,
"nscannedObjects" : 2,
"n" : 2,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"x" : [
[
4,
1.7976931348623157e+308
]
],
"y" : [
[
"c",
"c"
]
]
}
}
Range and Equality
"indexBounds" : {
"x" : [
[
4,
1.7976931348623157e+308
]
],
"y" : [
[
"c",
"c"
]
]
}
Range and Equality
"nscanned" : 7,
"nscannedObjects" : 2,
"n" : 2,
High nscanned because every distinct value of x must be checked.
Range and Equality
[Diagram: B-tree over the compound keys 1b 3d 4g 5c 5d 6a 7e 8c 9f; the scan walks the keys with x >= 4 and matches 5c and 8c]
Every distinct value of x must be checked.
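To see why field order in a compound index matters for this query, here is a toy scan over the slide's keys. It is a simplification that visits every key between a start and end key; `scanCompound` is a hypothetical helper, and the comparison with an index on {y:1,x:1} is an illustration rather than output from the deck.

```javascript
// The slide's documents, written as (x, y) pairs.
const docs = [
  { x: 1, y: "b" }, { x: 3, y: "d" }, { x: 4, y: "g" }, { x: 5, y: "c" }, { x: 5, y: "d" },
  { x: 6, y: "a" }, { x: 7, y: "e" }, { x: 8, y: "c" }, { x: 9, y: "f" },
];

// Compare two compound keys field by field.
function cmp(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return 0;
}

// Visit every key between start and end in an index on `order`, then apply the matcher.
function scanCompound(docs, order, start, end, matches) {
  const entries = docs
    .map((doc) => ({ key: order.map((f) => doc[f]), doc }))
    .sort((a, b) => cmp(a.key, b.key));
  let nscanned = 0, n = 0;
  for (const e of entries) {
    if (cmp(e.key, start) < 0) continue;
    if (cmp(e.key, end) > 0) break;
    nscanned++;
    if (matches(e.doc)) n++;
  }
  return { nscanned, n };
}

const query = (d) => d.x >= 4 && d.y === "c";
console.log(scanCompound(docs, ["x", "y"], [4, "c"], [Infinity, "c"], query)); // { nscanned: 7, n: 2 }
console.log(scanCompound(docs, ["y", "x"], ["c", 4], ["c", Infinity], query)); // { nscanned: 2, n: 2 }
```

With the range field first, the matching keys are scattered across the range; with the equality field first, they are adjacent.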
Range and Set
• db.c.find( {x:{$gte:4},y:{$in:['c','a']}} )
• Index {x:1,y:1}
Range and Set
[Diagram: compound index keys 1b 3d 4g 5c 5d 6a 7e 8c 9f; search ? >= 4 and {c, a}]
Range and Set
>db.c.find( {x:{$gte:4},y:{$in:['c','a']}} ).explain()
{
"cursor" : "BtreeCursor x_1_y_1 multi",
"nscanned" : 7,
"nscannedObjects" : 3,
"n" : 3,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"x" : [
[
4,
1.7976931348623157e+308
]
],
"y" : [
[
"a",
"a"
],
[
"c",
"c"
]
]
}
}
Range and Set
"indexBounds" : {
"x" : [
[
4,
1.7976931348623157e+308
]
],
"y" : [
[
"a",
"a"
],
[
"c",
"c"
]
]
}
Range and Set
"nscanned" : 7,
"nscannedObjects" : 3,
"n" : 3,
Range and Set
[Diagram: B-tree over the compound keys 1b 3d 4g 5c 5d 6a 7e 8c 9f; the scan walks the keys with x >= 4 and matches 5c, 6a and 8c]
Every distinct value of x must be checked for y values 'a' and 'c'.
Two Ranges (2D Box)
• db.c.find( {x:{$gte:3,$lte:7},y:{$gte:'c',$lte:'f'}} )
• Index {x:1,y:1}
Two Ranges (2D Box)
[Diagram: x-y plane showing the box 3 <= x <= 7, 'c' <= y <= 'f' for {x:{$gte:3,$lte:7},y:{$gte:'c',$lte:'f'}}]
Two Ranges (2D Box)
[Diagram: compound index keys 1b 3d 4g 5c 5d 6a 7e 7g 9f; search 3 <= ? <= 7 and c <= ? <= f]
Two Ranges (2D Box)
>db.c.find( {x:{$gte:3,$lte:7},y:{$gte:'c',$lte:'f'}} ).explain()
{
"cursor" : "BtreeCursor x_1_y_1",
"nscanned" : 6,
"nscannedObjects" : 4,
"n" : 4,
"millis" : 1,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"x" : [
[
3,
7
]
],
"y" : [
[
"c",
"f"
]
]
}
}
Two Ranges (2D Box)
"indexBounds" : {
"x" : [
[
3,
7
]
],
"y" : [
[
"c",
"f"
]
]
}
Two Ranges (2D Box)
"nscanned" : 6,
"nscannedObjects" : 4,
"n" : 4,
Two Ranges (2D Box)
[Diagram: B-tree over the compound keys 1b 3d 4g 5c 5d 6a 7e 7g 9f; the scan covers the keys inside the box]
Two Ranges (2D Box)
3 <= ? <= 7: for every distinct value of x in this range
c <= ? <= f: scan for every value of y in this range
$or
Disjoint $or Criteria
• db.c.find( {$or:[{x:5},{y:'d'}]} )
• Indexes {x:1}, {y:1}
Disjoint $or Criteria
[Diagram: the documents 1b 3d 4g 5c 5d 6a 7e 7g 9f (x and y values) are searched once through the x index for 5 and once through the y index for d]
Disjoint $or Criteria
>db.c.find( {$or:[{x:5},{y:'d'}]} ).explain()
{
"clauses" : [
{
"cursor" : "BtreeCursor x_1",
"nscanned" : 2,
"nscannedObjects" : 2,
"n" : 2,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"x" : [
[
5,
5
]
]
}
},
{
"cursor" : "BtreeCursor y_1",
"nscanned" : 2,
"nscannedObjects" : 2,
"n" : 1,
"millis" : 1,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"y" : [
[
"d",
"d"
]
]
}
}
],
"nscanned" : 4,
"nscannedObjects" : 4,
"n" : 3,
"millis" : 1
}
Disjoint $or Criteria
{
"cursor" : "BtreeCursor x_1",
"nscanned" : 2,
"nscannedObjects" : 2,
"n" : 2,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"x" : [
[
5,
5
]
]
}
},
Disjoint $or Criteria
{
"cursor" : "BtreeCursor y_1",
"nscanned" : 2,
"nscannedObjects" : 2,
"n" : 1,
"millis" : 1,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"y" : [
[
"d",
"d"
]
]
}
}
Only one document matching this clause is returned.
Disjoint $or Criteria
"nscanned" : 4,
"nscannedObjects" : 4,
"n" : 3,
"millis" : 1
Disjoint $or Criteria
[Diagram: the first clause scans the x index for 5; the matching documents 5c and 5d are returned (✓)]
Disjoint $or Criteria
[Diagram: the second clause scans the y index for d and reaches 3d and 5d; 5d is already marked ✓]
We have already scanned the x index for x:5, so this document was returned already. We don’t return it again.
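The clause-by-clause behaviour can be sketched with predicates standing in for the two index scans. This is a toy model: `runOr` is a hypothetical helper, and filtering an array replaces a real index lookup.

```javascript
// The slide's documents, written as (x, y) pairs.
const docs = [
  { x: 1, y: "b" }, { x: 3, y: "d" }, { x: 4, y: "g" }, { x: 5, y: "c" }, { x: 5, y: "d" },
  { x: 6, y: "a" }, { x: 7, y: "e" }, { x: 7, y: "g" }, { x: 9, y: "f" },
];

// Run each $or clause in turn; skip documents that an earlier clause already matched.
function runOr(docs, clauses) {
  const results = [];
  let nscanned = 0;
  clauses.forEach((clause, i) => {
    for (const doc of docs.filter(clause)) { // stand-in for an index scan on the clause's field
      nscanned++;
      const alreadyReturned = clauses.slice(0, i).some((earlier) => earlier(doc));
      if (!alreadyReturned) results.push(doc);
    }
  });
  return { nscanned, n: results.length };
}

console.log(runOr(docs, [(d) => d.x === 5, (d) => d.y === "d"]));
// { nscanned: 4, n: 3 }
```

The second clause scans both 3d and 5d but returns only 3d, matching the per-clause explain output.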
Unindexed $or Clause
• db.c.find( {$or:[{x:5},{y:'d'}]} )
• Index {x:1} (no index on y)
Unindexed $or Clause
>db.c.find( {$or:[{x:5},{y:'d'}]} ).explain()
{
"cursor" : "BasicCursor",
"nscanned" : 9,
"nscannedObjects" : 9,
"n" : 3,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
}
}
Since y is not indexed, we must do a full collection scan to match y:'d'. Since a full scan is required, we don’t use the index on x to match x:5.
Eliminated $or Clause
• db.c.find( {$or:[{x:{$gt:2,$lt:6}},{x:5}]} )
• Index {x:1}
Eliminated $or Clause
1 2 3 4 5 6 7 8 9
2 < ? < 6
1 2 3 4 5 6 7 8 9
5 ?
Eliminated $or Clause
>db.c.find( {$or:[{x:{$gt:2,$lt:6}},{x:5}]} ).explain()
{
"cursor" : "BtreeCursor x_1",
"nscanned" : 3,
"nscannedObjects" : 3,
"n" : 3,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"x" : [
[
2,
6
]
]
}
}
The index range of the second clause is included in the index range of the first clause, so we use the first index range only.
Eliminated $or Clause with Differing Unindexed Criteria
• db.c.find( {$or:[{x:{$gt:2,$lt:6},y:'c'},{x:5,y:'d'}]} )
• Index {x:1}
Eliminated $or Clause with Differing Unindexed Criteria
[Diagram: the documents 1b 3d 4g 5c 5d 6a 7e 7g 9f are searched for 2 < ? < 6 and c (first clause) and for 5 and d (second clause)]
Eliminated $or Clause with Differing Unindexed Criteria
>db.c.find( {$or:[{x:{$gt:2,$lt:6},y:'c'},{x:5,y:'d'}]} ).explain()
{
"cursor" : "BtreeCursor x_1",
"nscanned" : 4,
"nscannedObjects" : 4,
"n" : 2,
"millis" : 1,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"x" : [
[
2,
6
]
]
}
}
Eliminated $or Clause with Differing Unindexed Criteria
[Diagram: a single scan of the index keys with 2 < ? < 6, matching documents against c or d]
The index range for the first clause contains the index range for the second clause, so all matching is done using the index range for the first clause.
Overlapping $or Clauses
• db.c.find( {$or:[{x:{$gt:2,$lt:6}},{x:{$gt:4,$lt:7}}]} )
• Index {x:1}
Overlapping $or Clauses
1 2 3 4 5 6 7 8 9
2 < ? < 6
1 2 3 4 5 6 7 8 9
4 < ? < 7
Overlapping $or Clauses
>db.d.find( {$or:[{x:{$gt:2,$lt:6}},{x:{$gt:4,$lt:7}}]} ).explain()
{
"clauses" : [
{
"cursor" : "BtreeCursor x_1",
"nscanned" : 3,
"nscannedObjects" : 3,
"n" : 3,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"x" : [
[
2,
6
]
]
}
},
{
"cursor" : "BtreeCursor x_1",
"nscanned" : 1,
"nscannedObjects" : 1,
"n" : 1,
"millis" : 1,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"x" : [
[
6,
7
]
]
}
}
],
"nscanned" : 4,
"nscannedObjects" : 4,
"n" : 4,
"millis" : 1
}
>
Overlapping $or Clauses
{
"cursor" : "BtreeCursor x_1",
"nscanned" : 3,
"nscannedObjects" : 3,
"n" : 3,
"millis" : 0,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"x" : [
[
2,
6
]
]
}
},
Overlapping $or Clauses
{
"cursor" : "BtreeCursor x_1",
"nscanned" : 1,
"nscannedObjects" : 1,
"n" : 1,
"millis" : 1,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"x" : [
[
6,
7
]
]
}
}
The index range scanned for the previous clause is removed.
Overlapping $or Clauses
1 2 3 4 5 6 7 8 9
2 < ? < 6
1 2 3 4 5 6 7 8 9
6 <= ? < 7
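The range subtraction shown here can be sketched for one dimension. This helper is hypothetical and deliberately limited: it only handles an earlier range that covers the start of the current one, and leaves every other case untouched.

```javascript
// Ranges are { lo, hi, loInclusive, hiInclusive }.
// Returns the part of `current` not already scanned by `earlier`,
// or null when `earlier` covers `current` entirely (the clause is eliminated).
function subtractLower(current, earlier) {
  const coversStart =
    earlier.lo < current.lo ||
    (earlier.lo === current.lo && (earlier.loInclusive || !current.loInclusive));
  // Not handled in this sketch, or no overlap: keep the range as it is.
  if (!coversStart || earlier.hi < current.lo) return current;
  const endsInside =
    earlier.hi < current.hi ||
    (earlier.hi === current.hi && !earlier.hiInclusive && current.hiInclusive);
  if (!endsInside) return null;
  return {
    lo: earlier.hi,
    loInclusive: !earlier.hiInclusive, // an exclusive end leaves its endpoint to be scanned
    hi: current.hi,
    hiInclusive: current.hiInclusive,
  };
}

const first = { lo: 2, hi: 6, loInclusive: false, hiInclusive: false };  // 2 < x < 6
const second = { lo: 4, hi: 7, loInclusive: false, hiInclusive: false }; // 4 < x < 7
console.log(subtractLower(second, first));
// { lo: 6, loInclusive: true, hi: 7, hiInclusive: false }, i.e. 6 <= x < 7

const point = { lo: 5, hi: 5, loInclusive: true, hiInclusive: true };    // x == 5
console.log(subtractLower(point, first)); // null, as in the Eliminated $or Clause slides
```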
2D Overlapping $or Clauses
• db.c.find( {$or:[{x:{$gt:2,$lt:6},y:{$gt:'b',$lt:'f'}},{x:{$gt:4,$lt:7},y:{$gt:'b',$lt:'e'}}]} )
• Index {x:1,y:1}
2D Overlapping $or Clauses
[Diagram: x-y plane; Clause 1 is the box 2 < x < 6, b < y < f and Clause 2 is the overlapping box 4 < x < 7, b < y < e]
2D Overlapping $or Clauses
>db.c.find( {$or:[{x:{$gt:2,$lt:6},y:{$gt:'b',$lt:'f'}},{x:{$gt:4,$lt:7},y:{$gt:'b',$lt:'e'}}]} ).explain()
{
"clauses" : [
{
"cursor" : "BtreeCursor x_1_y_1",
"nscanned" : 4,
"nscannedObjects" : 3,
"n" : 3,
"millis" : 1,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"x" : [
[
2,
6
]
],
"y" : [
[
"b",
"f"
]
]
}
},
{
"cursor" : "BtreeCursor x_1_y_1",
"nscanned" : 0,
"nscannedObjects" : 0,
"n" : 0,
"millis" : 1,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"x" : [
[
6,
7
]
],
"y" : [
[
"b",
"e"
]
]
}
}
],
"nscanned" : 4,
"nscannedObjects" : 3,
"n" : 3,
"millis" : 1
}
2D Overlapping $or Clauses
{
"cursor" : "BtreeCursor x_1_y_1",
"nscanned" : 4,
"nscannedObjects" : 3,
"n" : 3,
"millis" : 1,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"x" : [
[
2,
6
]
],
"y" : [
[
"b",
"f"
]
]
}
2D Overlapping $or Clauses
{
"cursor" : "BtreeCursor x_1_y_1",
"nscanned" : 0,
"nscannedObjects" : 0,
"n" : 0,
"millis" : 1,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"x" : [
[
6,
7
]
],
"y" : [
[
"b",
"e"
]
]
}
}
],
The index range scanned for the previous clause is removed.
2D Overlapping $or Clauses
[Diagram: the same two boxes; the part of Clause 2 that overlaps Clause 1 is removed]
We only have to scan the remainder of Clause 2 here.
Overlapping $or Clauses
• Rule of thumb for n dimensions: We subtract earlier clause boxes from the current box when the result is one or more boxes.
[Diagram: cases marked ✓ in which subtracting box 1 from box 2 leaves one or more boxes]
Overlapping $or Clauses
• Rule of thumb for n dimensions: We subtract earlier clause boxes from the current box when the result is one or more boxes.
[Diagram: a case marked ✗ in which subtracting box 1 from box 2 does not leave a box]
$or TODO
• Use indexes on $or fields to satisfy a sort specification SERVER-1205
• Use full query optimizer to select $or clause indexes in getMore SERVER-1215
• Improve index range elimination (handling some cases where remainder is not a box)
Automatic Index Selection
(Query Optimizer)
Optimal Index
• find( {x:5} )
– Index {x:1}
– Index {x:1,y:1}
• find( {x:5} ).sort( {y:1} )
– Index {x:1,y:1}
• find( {} ).sort( {x:1} )
– Index {x:1}
• find( {x:{$gt:1,$lt:7}} ).sort( {x:1} )
– Index {x:1}
Optimal Index
• Rule of Thumb
– No scanAndOrder
– All fields with index useful constraints are indexed
– If there is a range or sort it is the last field of the index used to resolve the query
• If multiple optimal indexes exist, one is chosen arbitrarily.
Optimal Index
• These same criteria are useful when you are designing your indexes.
Multiple Candidate Indexes
• find( {x:4,y:'a'} )
– Index {x:1} or {y:1}?
• find( {x:4} ).sort( {y:1} )
– Index {x:1} or {y:1}?
– Note: {x:1,y:1} is optimal
• find( {x:{$gt:2,$lt:7},y:{$gt:'a',$lt:'f'}} )
– Index {x:1,y:1} or {y:1,x:1}?
Multiple Candidate Indexes
• The only index selection criterion is nscanned
• find( {x:4,y:'a'} )
– Index {x:1} or {y:1} ?
– If fewer documents match {y:'a'} than {x:4} then nscanned for {y:1} will be less, so we pick {y:1}
• find( {x:{$gt:2,$lt:7},y:{$gt:'b',$lt:'f'}} )
– Index {x:1,y:1} or {y:1,x:1} ?
– If there are fewer distinct values of 2 < x < 7 than distinct values of 'b' < y < 'f' then {x:1,y:1} is chosen (rule of thumb)
Multiple Candidate Indexes
• The only index selection criterion is nscanned
• Pretty good, but doesn’t cover every case, e.g.
– Cost of scanAndOrder vs ordered index
– Cost of loading full document vs just index key
– Cost of scanning adjacent btree keys vs non adjacent keys/documents
Competing Indexes
• At most one query plan per index
• Run in interleaved fashion
• Plans kept in a priority queue ordered by nscanned. We always continue progress on plan with lowest nscanned.
Competing Indexes
• Run until one plan returns all results or enough results to satisfy the initial query request (based on soft limit spec / data size requirement for initial query).
• We only allow plans to compete in initial query. In getMore, we continue reading from the index cursor established by the initial query.
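The interleaving described on these two slides can be sketched as a small race. This is an illustration only: each plan is reduced to a list of booleans (did the scanned key produce a result), a linear search stands in for the priority queue, and `racePlans` is a hypothetical helper.

```javascript
// plans: [{ name, steps }], where steps[i] is true if the i-th scanned key yields a result.
// wanted: number of results that satisfies the initial query request.
function racePlans(plans, wanted) {
  const state = plans.map((p) => ({ name: p.name, steps: p.steps, nscanned: 0, n: 0 }));
  for (;;) {
    // Stand-in for the priority queue: always advance the plan with the lowest nscanned.
    const current = state.reduce((best, s) => (s.nscanned < best.nscanned ? s : best));
    if (current.steps[current.nscanned]) current.n++;
    current.nscanned++;
    const exhausted = current.nscanned >= current.steps.length; // plan returned all its results
    if (exhausted || current.n >= wanted) {
      return { winner: current.name, nscanned: current.nscanned };
    }
  }
}

console.log(racePlans([
  { name: "x_1", steps: [true, true] },
  { name: "y_1", steps: [false, false, false, true] },
], 2));
// { winner: 'x_1', nscanned: 2 }
```

Because the plans advance in lockstep, the losing plan never scans much more than the winner did.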
“Learning” a Query Plan
• When an index is chosen for a query the query’s “pattern” and nscanned are recorded
– find( {x:3,y:'c'} )
• {Pattern: {x:'equality', y:'equality'}, Index: {x:1}, nscanned: 50}
– find( {x:{$gt:5},y:{$lt:'z'}} )
• {Pattern: {x:'gt bound', y:'lt bound'}, Index: {y:1}, nscanned: 500}
“Learning” a Query Plan
• When a new query matches the same pattern, the same query plan is used
– find( {x:5,y:'z'} )
• Use index {x:1}
– find( {x:{$gt:20},y:{$lt:'b'}} )
• Use index {y:1}
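A rough sketch of how a query could be reduced to a pattern and looked up again. The labels 'equality', 'gt bound' and 'lt bound' come from the slides; the 'range' and 'other' labels, the `queryPattern` helper and the Map-based cache are assumptions made for illustration.

```javascript
// Reduce a query to its shape: which fields are constrained, and how.
function queryPattern(query) {
  const pattern = {};
  for (const field of Object.keys(query).sort()) { // sort so field order does not matter
    const cond = query[field];
    if (cond === null || typeof cond !== "object") {
      pattern[field] = "equality";
      continue;
    }
    const ops = Object.keys(cond);
    const lower = ops.some((op) => op === "$gt" || op === "$gte");
    const upper = ops.some((op) => op === "$lt" || op === "$lte");
    // "range" and "other" are hypothetical labels, not taken from the slides.
    pattern[field] = lower && upper ? "range" : lower ? "gt bound" : upper ? "lt bound" : "other";
  }
  return JSON.stringify(pattern);
}

// Record the plans from the previous slide, keyed by pattern.
const planCache = new Map();
planCache.set(queryPattern({ x: 3, y: "c" }), { index: { x: 1 }, nscanned: 50 });
planCache.set(queryPattern({ x: { $gt: 5 }, y: { $lt: "z" } }), { index: { y: 1 }, nscanned: 500 });

// New queries with the same shape reuse the recorded index.
console.log(planCache.get(queryPattern({ x: 5, y: "z" })).index);                      // { x: 1 }
console.log(planCache.get(queryPattern({ x: { $gt: 20 }, y: { $lt: "b" } })).index);   // { y: 1 }
```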
“Un-Learning” a Query Plan
• 100 writes to the collection
• Indexes added / removed
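The two triggers above could be modelled as a cache that forgets its plans. This is a sketch only: the `PlanCache` class and its method names are hypothetical, and whether the server clears on exactly the 100th write is an assumption.

```javascript
// Toy plan cache that "un-learns" after enough writes or any index change.
class PlanCache {
  constructor(writeLimit = 100) {
    this.plans = new Map();
    this.writes = 0;
    this.writeLimit = writeLimit;
  }
  record(pattern, plan) { this.plans.set(pattern, plan); }
  lookup(pattern) { return this.plans.get(pattern); }
  // Called for every write to the collection.
  noteWrite() {
    if (++this.writes >= this.writeLimit) this.clear();
  }
  // Called when an index is added or removed.
  noteIndexChange() { this.clear(); }
  clear() {
    this.plans.clear();
    this.writes = 0;
  }
}

const cache = new PlanCache();
cache.record("x:equality", { index: "x_1", nscanned: 50 });
for (let i = 0; i < 99; i++) cache.noteWrite();
console.log(cache.lookup("x:equality") !== undefined); // true
cache.noteWrite();                                      // the 100th write
console.log(cache.lookup("x:equality"));                // undefined
```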
Bad Plan Insurance
• If nscanned for a new query using a recorded plan is much worse than the recorded nscanned for an earlier query with the same pattern, we start interleaving other plans with the current plan.
• Currently “much worse” means 10x
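The 10x rule could be expressed as a single comparison. The function name and the choice of a strict greater-than are assumptions; the slides only give the factor.

```javascript
// Factor from the slide: "much worse" currently means 10x.
const MUCH_WORSE_FACTOR = 10;

// Decide whether to start interleaving other plans with the recorded one.
function shouldTryOtherPlans(recordedNscanned, currentNscanned) {
  return currentNscanned > recordedNscanned * MUCH_WORSE_FACTOR;
}

console.log(shouldTryOtherPlans(50, 400)); // false: within 10x of the recorded nscanned
console.log(shouldTryOtherPlans(50, 501)); // true: the recorded plan is doing much worse
```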
Query Planner
• Ad hoc heuristics in some cases
• Seem to work decently in practice
Feedback
• Large and small scale optimizer features are generally prioritized based on user input.
• Please use jira to request new features and vote on existing feature requests.
Thanks!
Feature Requests
jira.mongodb.org
Support
groups.google.com/group/mongodb-user
Next up:
Sharding Details with Eliot