MongoDB for Java Developer- Notes
Page 1: MongoDB for Java Developer- Notes

MongoDB for Java Devs: Notes
Ngô Nguyễn Chính

2016

Page 2: MongoDB for Java Developer- Notes

Contents
• Introduction
• CRUD
• Schema Design
• Performance
• Aggregation FW
• Application Engineering

Page 3: MongoDB for Java Developer- Notes

Part 1 - Introduction

Page 4: MongoDB for Java Developer- Notes

What is MongoDB?
• A non-relational data store for JSON documents
• MongoDB is document oriented
• MongoDB is schemaless

Page 5: MongoDB for Java Developer- Notes

MongoDB Relative to Relational Databases

• In the top-left corner are systems that are very scalable and high-performance but do not have much functionality, like Memcached and other key-value stores

• In the bottom-right corner are systems with the rich functionality of an RDBMS (Oracle, SQL Server)

Page 6: MongoDB for Java Developer- Notes

MongoDB Relative to Relational Databases
• To retain scalability, MongoDB omits (does not support) the following features:
• Joins
• Transactions across multiple collections

Page 7: MongoDB for Java Developer- Notes

MongoDB is Schemaless
• Schema? What is a schema?
• In a relational system, a table has certain columns (name, age, city...)
• Over time, we may need to keep some additional piece of information. To do that, we have to expand the table with an ALTER TABLE command.
• That is how it works in the relational world.
• In MongoDB, we don't need to do that.

Page 8: MongoDB for Java Developer- Notes

MongoDB is Schemaless
• In MongoDB, different documents can have different schemas. For example:
• This permits MongoDB to be very agile, because we don't need to have exactly the same keys in each of the documents
• We can change the schema of an existing document
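
A minimal sketch of what this looks like with the MongoDB Java driver (the collection and field names are illustrative, not from the slides):

    import com.mongodb.client.*;
    import org.bson.Document;

    MongoClient client = MongoClients.create("mongodb://localhost:27017");
    MongoCollection<Document> people =
        client.getDatabase("test").getCollection("people");

    // Two documents in the same collection with different shapes --
    // no ALTER TABLE needed for the extra "city" field.
    people.insertOne(new Document("name", "Alice").append("age", 30));
    people.insertOne(new Document("name", "Bob").append("city", "Hanoi"));

    // The "schema" of an existing document can change at any time.
    people.updateOne(new Document("name", "Alice"),
                     new Document("$set", new Document("city", "Dalat")));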

Page 9: MongoDB for Java Developer- Notes

Introduction to Schema Design
How do you know whether to embed or not to embed?
• It depends on the way you access the data
• It depends on some other practical considerations
• For example: in MongoDB, a document can't be more than 16 MB

Page 10: MongoDB for Java Developer- Notes

Part 2 - CRUD

Page 11: MongoDB for Java Developer- Notes

CRUD
• MongoDB's CRUD operations exist as methods/functions in programming language APIs, not as a separate language

Page 12: MongoDB for Java Developer- Notes

CRUD Operations (a few of these are sketched in the Java example below):
• Insert documents
• findOne()
• find()
• Using field selection
• Using $gt and $lt
• Inequalities on strings
• Using regexes, $exists, $type
• Using $or
• Using $and
• Querying inside arrays
• Using $in and $all
• Queries with dot notation
• Querying, cursors
• Counting results
• Update a document
• Using the $set command
• Using the $unset command
• Using $push, $pop, $pull, $pushAll, $pullAll, $addToSet
• Upserts
• Multi-update
• Removing data
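
A hedged sketch of a few of these operations in the Java driver; coll is assumed to be a MongoCollection<Document> obtained as in the earlier sketch:

    import com.mongodb.client.model.*;
    import org.bson.Document;
    import java.util.Arrays;
    import static com.mongodb.client.model.Filters.*;

    coll.insertOne(new Document("name", "Carol").append("score", 91)
                       .append("tags", Arrays.asList("a", "b")));     // insert

    Document one = coll.find(eq("name", "Carol")).first();            // findOne()
    coll.find(and(gt("score", 50), lt("score", 95)))                  // $gt / $lt, $and
        .projection(Projections.include("name"));                     // field selection
    coll.find(in("tags", "a", "c"));                                  // querying inside arrays
    long n = coll.countDocuments(exists("score"));                    // counting, $exists

    coll.updateOne(eq("name", "Carol"), Updates.set("score", 95));    // $set
    coll.updateOne(eq("name", "Dave"), Updates.inc("score", 1),
                   new UpdateOptions().upsert(true));                 // upsert
    coll.updateMany(exists("score"), Updates.unset("tmp"));           // multi-update, $unset
    coll.deleteMany(lt("score", 10));                                 // removing data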

Page 13: MongoDB for Java Developer- Notes

Part 3 – Schema Design

Page 14: MongoDB for Java Developer- Notes

MongoDB Schema Design
• In the world of relational databases, the ideal way to design a schema is to keep it in third normal form.
• In MongoDB, it's more important to keep the data in a way that's conducive to the application using the data.
• Think about the application's data access patterns
• Think about what pieces of data are used together
• Think about what pieces of data are mostly read-only
• Think about what pieces of data are written all the time
• Then organize the data within MongoDB to specifically suit the application's data access patterns

Application-Driven Schema!

Page 15: MongoDB for Java Developer- Notes

MongoDB Schema Design
• Some basic facts about MongoDB:
• MongoDB supports rich documents
  • E.g. an array of items; the value for a certain key can be an entire other document
  • This allows us to pre-join/embed data for fast access
• No Mongo joins
  • Joins aren't supported directly inside the kernel because joins are very hard to scale
  • If we want to do a join, we have to join in the application itself
• No constraints
  • In the relational world, we could have a foreign key constraint
• Atomic operations
• No declared schema within MongoDB
  • But there's a pretty good chance that our application is going to have a schema. By having a schema, every single document in a particular collection is probably going to have a pretty similar structure. There might be some small changes to that structure depending on different versions of the application, but mostly each document in a collection is going to have a similar structure
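
For instance, a hedged sketch of a "rich" pre-joined document (the blog-post shape is illustrative):

    import org.bson.Document;
    import java.util.Arrays;

    // Comments are embedded in the post, so reading the post together with its
    // comments is a single round trip -- no join needed.
    Document post = new Document("title", "Schema Design")
        .append("author", new Document("name", "Chinh").append("email", "c@example.com"))
        .append("comments", Arrays.asList(
            new Document("who", "reader1").append("text", "Nice post"),
            new Document("who", "reader2").append("text", "Thanks")));
    posts.insertOne(post);   // posts is a MongoCollection<Document>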

Page 16: MongoDB for Java Developer- Notes

Relational Normalization
• 3rd Normal Form

Page 17: MongoDB for Java Developer- Notes

Relational Normalization
• Goals of normalization:
• Free the database of modification anomalies
  • For example, we could update "email address" in one row but not update it in another, and therefore leave it inconsistent within the DB
• Minimize redesign when extending the database
  • MongoDB is very flexible that way, because we can add keys and attributes to documents without changing every existing document
• Avoid bias toward any particular access pattern
  • In MongoDB, this is the one we're not going to worry about

Page 18: MongoDB for Java Developer- Notes

Living without Constraints
• In the relational world, a foreign key constraint is one of the ways to make sure data stays consistent within the database
  • For example, the post ID in the figure below is a foreign key constraint
• MongoDB has no foreign key constraints, so there's no such guarantee from the database

Page 19: MongoDB for Java Developer- Notes

Living without Constraints
• So, in MongoDB, how do you live in a world without these foreign key constraints and keep data intact and consistent?
• The answer is that embedding actually helps

Page 20: MongoDB for Java Developer- Notes

Living without Transactions
• In the relational world, transactions offer ACID guarantees
• MongoDB lacks transaction support, but it does support atomic operations
• Atomic operations mean that when we work on a single document, that work will be completed before anyone else sees the document. They'll see all the changes or none of them
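
A hedged sketch of what single-document atomicity buys us (the account shape is hypothetical):

    import static com.mongodb.client.model.Filters.eq;
    import static com.mongodb.client.model.Updates.*;
    import org.bson.Document;

    // Both modifications are applied atomically within the one document:
    // no reader ever sees the balance change without the history entry.
    accounts.updateOne(eq("_id", accountId),
        combine(inc("balance", -50),
                push("history", new Document("debit", 50))));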

Page 21: MongoDB for Java Developer- Notes

Living without Transactions
• Using atomic operations, we can often accomplish the same thing we would have accomplished using transactions in a relational DB
• In a relational DB, we need to make changes across multiple tables, usually tables that need to be joined, and we want to do that all at once. Since there are multiple tables, we have to begin a transaction, do all those updates, and then end the transaction
• But with MongoDB, since we're going to embed the data – pre-join it in rich documents that have hierarchy – we can often accomplish the same thing

Page 22: MongoDB for Java Developer- Notes

Living without Transactions
• What approaches can we take in MongoDB to overcome the lack of transactions? There are 3 different approaches:
• Restructure our code so that we're working within a single document
• Implement some sort of locking in software (creating a critical section, semaphores)
• Finally, an approach that often works for applications that take in a huge amount of data: just tolerate a little bit of inconsistency

Page 23: MongoDB for Java Developer- Notes

One to One Relations
• 1:1 relations are relations where each item corresponds to exactly one other item

Page 24: MongoDB for Java Developer- Notes

One to One Relations
• We can model this in several different ways:
• Keep the 2 documents that are related to each other one-to-one in separate collections
• Or use embedded documents
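
A small sketch of both options, using a hypothetical employee/resume pair:

    import org.bson.Document;

    // Option 1: two collections, linked one-to-one by a shared key.
    employees.insertOne(new Document("_id", 7).append("name", "An"));
    resumes.insertOne(new Document("employee_id", 7).append("pages", 3));

    // Option 2: embed the resume inside the employee document.
    employees.insertOne(new Document("_id", 8).append("name", "Binh")
        .append("resume", new Document("pages", 3)));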

Page 25: MongoDB for Java Developer- Notes

One to One Relations
• How do we choose which way to do it? Some of the considerations are as follows:
• Frequency of access
• Size of items
• Atomicity of data
• There are some good reasons for keeping 2 documents that are related one-to-one in separate collections:
• To reduce the working set size of the application
• Because the combined size of the documents would be larger than 16MB

Page 26: MongoDB for Java Developer- Notes

One to Many Relations
• A one-to-many relationship is one where there are 2 entity types, and many entities of one type map to one entity of the other type

Page 27: MongoDB for Java Developer- Notes

One to Many Relations
• We can represent a one-to-many relationship in several ways:
• In multiple collections
• With embedded documents

Page 28: MongoDB for Java Developer- Notes

One to Many Relations
• Whenever the many is truly large, it is recommended to represent a one-to-many relationship in multiple collections; whenever the many is actually few, embedding often works well

Page 29: MongoDB for Java Developer- Notes

Many to Many Relations
• For example:

Page 30: MongoDB for Java Developer- Notes

Many to Many Relations
• We can represent many-to-many relations in several ways:

Page 31: MongoDB for Java Developer- Notes

Multikeys
• Multikey indexes: why are they so useful within MongoDB?
• Because multikey indexes support efficient queries against array fields
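
A hedged sketch: indexing an array-valued field makes the index multikey automatically, with one index entry per array element (field names are illustrative):

    import com.mongodb.client.model.Indexes;
    import static com.mongodb.client.model.Filters.all;

    posts.createIndex(Indexes.ascending("tags"));   // becomes multikey once tags holds an array
    posts.find(all("tags", "mongodb", "java"));     // served efficiently by the index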

Page 32: MongoDB for Java Developer- Notes

Benefits of Embedding
• Improved read performance
  • The reason is the nature of the way these storage systems are built
• One round trip to the DB

Page 33: MongoDB for Java Developer- Notes

When to Denormalize
• What is denormalization?
• Denormalization is the process of combining 2 relations into one new relation

Page 34: MongoDB for Java Developer- Notes

When to Denormalize
• What is denormalization?
• Denormalize for speed:
  • To speed up retrievals
  • When strict performance is required
  • When the data is not heavily updated
• Denormalize only when there is a very clear advantage to doing so

Page 35: MongoDB for Java Developer- Notes

When to Denormalize
• In the relational world, one of the reasons we normalize data is that we want to avoid the modification anomalies that come with the duplication of data.
• In MongoDB, with these rich documents, it's easy to assume that what we're doing is denormalizing data. But that's only true if we duplicate data; as long as we don't, there are no anomalies to worry about:
• 1:1 relation – embedding works without duplicating any data
• 1:many relation – embedding can also work well without duplication of data, as long as we embed from the many into the one
• Many:many relation – to avoid the modification anomalies that come with denormalization, all we need to do is link through arrays of object IDs in the documents

Page 36: MongoDB for Java Developer- Notes

When to Denormalize

Page 37: MongoDB for Java Developer- Notes

What is an ODM (Object-Document Mapper)?

Page 38: MongoDB for Java Developer- Notes

Part 4 – Performance

Page 39: MongoDB for Java Developer- Notes

Performance
• The performance of computer systems is driven by a variety of factors, including the performance of the underlying hardware – CPU, disk, memory
• Once we've chosen a hardware configuration, it's our algorithms that determine performance. For a database-backed application, it's the algorithms used to satisfy queries.
• There are 2 ways we can impact the latency and throughput of database queries:
• Add indexes to collections
• Distribute the load across multiple servers using sharding

Page 40: MongoDB for Java Developer- Notes

Storage Engines
• A storage engine is the software that controls how the data is stored on disk
• A storage engine is the interface between the persistent storage (disks) and the database itself, MongoDB
• The storage engine itself decides how to use memory to optimize the process

Page 41: MongoDB for Java Developer- Notes

Storage Engines
• There are 2 main storage engines:
• MMAPv1
• WiredTiger
• The storage engine directly determines:
• The data file format
• The format of indexes
• Some things the storage engine doesn't handle:
• The storage engine doesn't affect the architecture of a cluster
• The storage engine doesn't affect the API that the database presents to the programmer

Page 42: MongoDB for Java Developer- Notes

Storage Engines: MMAPv1
• MMAPv1 is the original storage engine of MongoDB
• MMAPv1 is built on top of the mmap system call that maps files into memory

Page 43: MongoDB for Java Developer- Notes

Storage Engines: MMAPv1
• MongoDB needs a place to put documents, and it puts the documents inside files.
• To do that, it initially allocates a large file – let's say a 100GB file on disk. This may or may not be physically contiguous on the actual disk, because algorithms beneath that layer control the actual allocation of space on disk. But from our standpoint, it's a 100GB contiguous file.
• When MongoDB calls the mmap system call, it can map this 100GB file into 100GB of virtual memory. We need to be on a 64-bit computer, because we could never get 100GB of virtual memory address space on a 32-bit computer. These mappings are all page-sized; pages on an OS are either 4KB or 16KB large

Page 44: MongoDB for Java Developer- Notes

Storage Engines: MMAPv1
• The OS decides what can fit in memory. So if the actual physical memory of the box is, say, 32GB, then when we go to access one of the pages in this memory space, it may not be in memory at any given time. The OS decides which of these pages are going to be in memory
• So when we read a document, if it hits a page that's in memory, we get it. If it hits a page that's not in memory, the OS has to bring it from disk into memory before we can access it. That's the basics of the way the MMAP storage engine works

Page 45: MongoDB for Java Developer- Notes

Storage Engines: MMAPv1

Page 46: MongoDB for Java Developer- Notes

Storage Engines: MMAPv1
• The MMAP storage engine offers collection-level concurrency, or what we sometimes call collection-level locking.
• Each collection inside MongoDB is its own file. That means if we have 2 different operations going on at the same time in the same collection, and they are writes, one is going to have to wait for the other, because it's a multiple-reader, single-writer lock that's taken. So only one write can happen at a time to a particular collection.
• But if they're in different collections, they can happen simultaneously
• It allows in-place updates of data. So if a document is in a little page and we do an update to it, we'll try to update it right in place. If we can't, we'll mark the old space as a hole, move the document somewhere else where there is more space, and update it there. To make it more likely that we can update a document in place without having to move it, power-of-2 sizes are used when the initial storage is allocated

Page 47: MongoDB for Java Developer- Notes

Storage Engines: MMAPv1
• MMAPv1 automatically allocates power-of-two-sized record spaces when new documents are inserted. That means if we try to create a 3-byte document, we're going to get 4 bytes; if we try to create a 7-byte document, we're going to get 8 bytes. This way, it's more likely that we can grow a document a little bit, and the space that opens up when a document moves can be re-used more easily

Page 48: MongoDB for Java Developer- Notes

Storage Engines: WiredTiger
• This storage engine is not turned on by default in MongoDB 3.0
• First, it offers some interesting features, and for a lot of workloads it is faster
• It offers document-level concurrency. We don't call it document-level locking because it's actually a lock-free implementation with an optimistic concurrency model: the storage engine assumes that 2 writes are not going to be to the same document, and if they are, one of those writes is unwound and has to try again – invisibly to the actual application. But we do get document-level concurrency versus the collection-level concurrency of the MMAP storage engine, and that's a huge win.

Page 49: MongoDB for Java Developer- Notes

Storage Engines: WiredTiger
• Second, this storage engine offers compression, both of the documents themselves (the data) and of the indexes

Page 50: MongoDB for Java Developer- Notes

Storage Engines: WiredTiger
• We talked before about having a 100GB file on disk; now suppose we have that file under WiredTiger (WT).
• WT itself manages the memory used to access that file. The file is brought in as pages, which can be of varying sizes, and WT decides which blocks to keep in memory and which blocks to send back to disk. It's because WT manages its own storage that it can, for instance, compress. We don't want to keep data compressed in memory, because when reads hit our in-memory cache we don't want to have to decompress. With WT we don't have to: data is kept in the clear in memory, but before pages are written out to disk they can be compressed, and that saves a tremendous amount of space for certain types of data.

Page 51: MongoDB for Java Developer- Notes

Storage Engines: WiredTiger
• WT is also an append-only storage engine: there are no in-place updates. That means if we update a document in WT, it marks the old document as no longer used, allocates new space somewhere else on disk, and writes it there. Eventually it reclaims the space that is no longer used. This can result in writing a lot of data sometimes: if we have very large documents and we only update one item, WT has to write the entire document again. But this append-only way of storing the data is what allows it to run without locks at the document level and gives it document-level concurrency. So overall, it's often faster

Page 52: MongoDB for Java Developer- Notes

Storage Engines: WiredTiger

Page 53: MongoDB for Java Developer- Notes

Indexes
• Consider a collection whose documents are on disk in essentially arbitrary order; there might be no particular order to the documents on disk. If there's no index and we wanted to find all the documents matching some condition, we would need to scan every document in the collection, and there could be millions of those. This collection scan – or table scan, as it's called in the relational world – is just death to performance. It's probably the single greatest factor in whether or not our queries are going to perform well. More important than the speed of the CPU, more important than how much memory we have, is whether or not we can use some sort of index to avoid having to look at the entire collection

Page 54: MongoDB for Java Developer- Notes

Indexes
• So how does an index work? What is an index?
• An index is an ordered set of keys, and each of these index points has a pointer to a physical record – some sort of pointer to a location on disk. The nice thing about having something that's ordered is that it's very fast to search. If it were a linear list (it's not, in a typical database), we could search it using a binary search.
• In real databases, including MongoDB, the structure this index is actually stored in is called a B-tree
• We want to put indexes on the items we believe we're going to be querying on, because that's going to make querying much faster

Page 55: MongoDB for Java Developer- Notes

Indexes

Page 56: MongoDB for Java Developer- Notes

Indexes
• But sometimes we don't just want to query on, let's say, name. We also want to query on name and maybe hair color. So how would that work?
• An index on name and hair color would be written (name, hair color), and that's ordered: all of the index entries would be ordered first by name, then by hair color. So if we wanted to find, say, everyone with a certain name and a certain hair color, we could do it pretty easily – binary search on name, then again within that part of the structure on hair color. We could also do range queries

Page 57: MongoDB for Java Developer- Notes

Indexes

Page 58: MongoDB for Java Developer- Notes

Indexes
• But if, on the other hand, we specified just the hair color, we'd be stuck. If we said "find me all the people with a given hair color", they're scattered all over the place; they're not ordered that way in this larger structure, so we can't use a binary search to find them. So whenever we're using an index, we need to use a leftmost prefix of its keys. If the index were extended to include date of birth, we could search on just the name – that would work with this index. Or we could search on (name, hair color), or on (name, hair color, date of birth). But we can't come in with just the date of birth or just the hair color, because then we have no way of searching this index
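
A hedged sketch of the leftmost-prefix rule with the Java driver (field names follow the example above):

    import com.mongodb.client.model.Indexes;
    import static com.mongodb.client.model.Filters.*;

    people.createIndex(Indexes.ascending("name", "hair_color", "dob"));

    people.find(eq("name", "An"));                                  // can use the index (leftmost prefix)
    people.find(and(eq("name", "An"), eq("hair_color", "black")));  // can use the index
    people.find(eq("hair_color", "black"));                         // cannot use this index
    people.find(eq("dob", "1990-01-01"));                           // cannot use this index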

Page 59: MongoDB for Java Developer- Notes

Indexes

Page 60: MongoDB for Java Developer- Notes

Indexes

Page 61: MongoDB for Java Developer- Notes

Indexes
• The other point to make is that indexing is not free: whenever we change something in a document that affects an index, we have to update that index – write it in memory, and eventually on disk
• Indexes aren't actually represented linearly the way we drew them; they're represented as B-trees, and maintaining these B-trees takes time. So if we have a collection with indexes, and writes affect items that are indexed, the writes will be slower than if there were no index. That's right: indexing actually slows down our writes, but our reads will be much faster. If we were only ever writing and never wanted to find a document, we might not want an index at all

Page 62: MongoDB for Java Developer- Notes

Indexes
• In fact, one common strategy when inserting a very large amount of data into a database (sketched below):
• Have no indexes on the collection at all, and insert all the data
• Then, after all the data is inserted, add and build the indexes
• That way we don't incur the cost of maintaining the indexes while we add data incrementally. The fact that writes are slower, and that it takes time to update an index on every write that affects anything indexed, is one of the reasons we don't just want an index on every single key in a collection. If we had an index on every single key, we'd slow down our writes even more, and we'd use a lot more disk space to maintain those indexes
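
A minimal sketch of the load-then-index strategy (collection name and sizes are illustrative):

    import com.mongodb.client.model.Indexes;
    import org.bson.Document;
    import java.util.*;

    List<Document> batch = new ArrayList<>();
    for (int i = 0; i < 100_000; i++)                // in practice, insert in batches
        batch.add(new Document("student_id", i).append("score", i % 100));
    students.insertMany(batch);                      // bulk load with no secondary indexes

    students.createIndex(Indexes.ascending("student_id"));  // then build the index once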

Page 63: MongoDB for Java Developer- Notes

Indexes
• If we had 10 million items in a collection with no index, and we searched on anything, we'd have to look at 10 million documents, and that's pretty expensive. If we have to look at 10 or 100 million documents, and the amount of memory we have is much smaller than the space the documents occupy on disk, then we're going to wind up swapping all those documents into memory, creating a tremendous amount of disk IO, which is going to be pretty slow.
• And this is why indexing is so absolutely critical to performance

Page 64: MongoDB for Java Developer- Notes

Creating Indexes

Page 65: MongoDB for Java Developer- Notes

Discovering (and Deleting) Indexes
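
A hedged sketch of inspecting and dropping indexes from the Java driver:

    import com.mongodb.client.model.Indexes;
    import org.bson.Document;

    for (Document idx : coll.listIndexes())          // discover: one document per index
        System.out.println(idx.toJson());

    coll.dropIndex("student_id_1");                  // delete by generated name...
    coll.dropIndex(Indexes.ascending("student_id")); // ...or by key pattern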

Page 66: MongoDB for Java Developer- Notes

Multikey Indexes

Page 67: MongoDB for Java Developer- Notes

Dot Notation and Multikey

Page 68: MongoDB for Java Developer- Notes

Index Creation Option, Unique

Page 69: MongoDB for Java Developer- Notes

Index Creation, Sparse

Page 70: MongoDB for Java Developer- Notes

Index Creation, Background
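
The three slides above cover index creation options; a combined hedged sketch (field names are illustrative):

    import com.mongodb.client.model.*;

    coll.createIndex(Indexes.ascending("email"),
        new IndexOptions().unique(true));        // unique: reject duplicate values
    coll.createIndex(Indexes.ascending("nickname"),
        new IndexOptions().sparse(true));        // sparse: skip documents missing the field
    coll.createIndex(Indexes.ascending("created"),
        new IndexOptions().background(true));    // background: build without blocking the DB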

Page 71: MongoDB for Java Developer- Notes

Using Explain
• Explain is a command inside the DB; it tells us exactly which indexes were used for our queries and how they were used
• Let's say we have a foo collection
• And we put an index on this collection: a compound index on a, b and c
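
A hedged way to run explain from the Java driver is the server's explain command (this assumes a MongoDatabase db; the old-style output shown on these slides includes fields like cursor, millis, and nscanned):

    import org.bson.Document;

    Document plan = db.runCommand(new Document("explain",
        new Document("find", "foo").append("filter", new Document("c", 500))));
    System.out.println(plan.toJson());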

Page 72: MongoDB for Java Developer- Notes

Using Explain
• We do a query and add explain to the end. Things to note here:
• BasicCursor means that no index was used to execute this query – because we queried on just c, and the index is on a, b, c
• The millis field is the number of milliseconds required to execute the query
• Here millis = 3ms: pretty slow for a database

Page 73: MongoDB for Java Developer- Notes

Using Explain
• We do another query:
• BtreeCursor a_1_b_1_c_1 was used to perform this query; that is the name of the compound index on a, b, c. So the database is using the index
• Second, isMultiKey is false. This says whether or not the index is a multikey index. None of the values inside a, b and c are arrays, so it is not a multikey index
• n is the number of documents returned

Page 74: MongoDB for Java Developer- Notes

Using Explain
• We do another query:
• nscannedObjects is the number of documents that were scanned to answer the query – here, one.
• nscanned is the number of index entries examined

Page 75: MongoDB for Java Developer- Notes

Using Explain
• We do another query:
• indexBounds shows the bounds that were used to look up the index. We looked up 500, 500 – a lower and upper bound of 500 – on the first key, and the rest are set to the min and max elements

Page 76: MongoDB for Java Developer- Notes

Using Explain
• We do another query:
• The indexOnly field tells us whether the query could be satisfied with just the index. If everything we're asking for can be satisfied by the index alone, then indexOnly = true, and the document itself never has to be retrieved

Page 77: MongoDB for Java Developer- Notes

Explain: Verbosity
• There are 3 modes for the explain command:
• queryPlanner: the default mode for explain; very useful
• executionStats: a higher level of verbosity; it includes the queryPlanner output
• allPlansExecution: includes the queryPlanner and executionStats output
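
The verbosity can be passed along with the explain command; a minimal sketch:

    import org.bson.Document;

    Document stats = db.runCommand(new Document("explain",
            new Document("find", "foo").append("filter", new Document("a", 500)))
        .append("verbosity", "executionStats"));   // or "queryPlanner" / "allPlansExecution"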

Page 78: MongoDB for Java Developer- Notes

Covered Queries
• A covered query is not a query that is covered by a house; it's a query that can be satisfied entirely with an index, so zero documents need to be inspected to satisfy it.
• If we can satisfy a query entirely from the index, that's going to make the query a lot faster
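
A hedged sketch of a query that can be covered, given an index on name: the projection keeps only indexed fields and excludes _id, so no document needs to be fetched.

    import com.mongodb.client.model.Projections;
    import static com.mongodb.client.model.Filters.eq;

    coll.find(eq("name", "An"))
        .projection(Projections.fields(Projections.include("name"),
                                       Projections.excludeId()));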

Page 79: MongoDB for Java Developer- Notes

Covered Queries

Page 80: MongoDB for Java Developer- Notes

When is an Index Used?
• How does MongoDB choose an index to satisfy a query?
• Let's imagine we have 5 indexes. When a query comes in, MongoDB looks at the query's shape. Shape has to do with what fields are being searched on, plus additional information such as: is there a sort?
• Based on that information, the system identifies a set of candidate indexes that it may be able to use in satisfying the query.
• So let's assume a query comes in, and 3 of our 5 indexes are identified as candidates for this query

Page 81: MongoDB for Java Developer- Notes

When is an Index Used?
• How does MongoDB choose an index to satisfy a query?
• MongoDB will then create 3 query plans, one for each of these indexes, and in 3 parallel threads issue the query such that each one uses a different index, to see which one returns results the fastest. Visually, we can think of this as a race.
• The idea is that the first query plan to reach a goal state is the winner. More importantly, going forward it'll be selected as the index to use for queries that have that same query shape

Page 82: MongoDB for Java Developer- Notes

When is an Index Used?
• How does MongoDB choose an index to satisfy a query?
• So what's the goal state here? It can be one of a couple of things. It could be that one of the query plans returned all the results for the query. Another way a query plan can win is by returning a certain threshold number of results – with the caveat that it must be able to return the results in sort order. The real value of doing this is that for subsequent queries with the same query shape, MongoDB knows which index to select. It achieves that through a cache: the winning query plan is stored in the cache for future use on queries of that shape

Page 83: MongoDB for Java Developer- Notes

When is an Index Used?
• How does MongoDB choose an index to satisfy a query?

Page 84: MongoDB for Java Developer- Notes

When is an Index Used?
• How does MongoDB choose an index to satisfy a query?
• Of course, over time our collection changes and the indexes change, so we don't want this to necessarily be the index we use forever. There are several ways a query plan ends up being evicted from the cache:
• One is if there is a threshold number of writes; right now that threshold is 1,000 writes.
• Another, of course, is if we rebuild the index
• Or if any index is added to or dropped from the collection
• And finally, if the mongod process is restarted, we also lose the query plans in the cache

Page 85: MongoDB for Java Developer- Notes

When is an Index Used?
• How does MongoDB choose an index to satisfy a query?

Page 86: MongoDB for Java Developer- Notes

When is an Index Used?
• This basic process is what MongoDB uses to figure out which index to use for the queries we submit

Page 87: MongoDB for Java Developer- Notes

How Large is an Index?
• As with other databases, with MongoDB it's very important to fit what's called the working set into memory. The working set is the portion of our data that clients are frequently accessing, and a key component of it is our indexes.
• For performance reasons, it's essential that we can fit the entire working set into memory, because going to disk for data is a time-consuming operation, and performance degrades significantly if we have to go to disk regularly for frequently accessed data.
• This is especially true for indexes: if, in order to search an index, we first have to pull it from disk into memory, we lose a lot of the performance benefit of having the index in the first place. So it's especially important that our indexes fit into memory
• So let's look at how we can measure the size of our indexes, as a means of, say, estimating the amount of memory we'll need for a MongoDB deployment.

Page 88: MongoDB for Java Developer- Notes

How Large is an Index?

Page 89: MongoDB for Java Developer- Notes

How Large is an Index?
• To see the size of our indexes, we can use the stats method, calling it on the collection of interest
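
From the Java driver, the same numbers are available through the collStats command; a minimal sketch (collection name is illustrative):

    import org.bson.Document;

    Document stats = db.runCommand(new Document("collStats", "students"));
    System.out.println(stats.get("totalIndexSize"));  // bytes used by all indexes together
    System.out.println(stats.get("indexSizes"));      // per-index breakdown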

Page 90: MongoDB for Java Developer- Notes

Number of Index Entries?
• Index cardinality: how many index points are there for each different type of index that MongoDB supports?
• In a regular index, for every single key that we put in the index, there's going to be an index point. In addition, if a document lacks the key, there's going to be an index point under the null entry. So essentially you get about one index point per document in the collection. That makes the index a certain size, proportional to the collection size in terms of the number of pointers to documents.
• In a sparse index, when a document is missing the key being indexed, it's not in the index, because it's a null and we don't keep nulls in a sparse index. So here the number of index points can be less than or equal to the number of documents.

Page 91: MongoDB for Java Developer- Notes

Number of Index Entries?
• Index cardinality: how many index points are there for each different type of index that MongoDB supports?
• Finally, in a multikey index – an index on array values; an index becomes multikey as soon as at least one document has an array value for the indexed field – there may be multiple index points per document. For instance, if a document has a tags array with three or four or five tags, there's going to be an index point for every single one of those values. So the number of index points could be greater, even significantly greater, than the number of documents. This comes up because indexes need to be maintained, and there's a cost to maintaining them when anything causes index entries to be rewritten. For example, say a document moves – it might move because we just added something that makes it too large to fit in the space the database has for it on disk, so it needs to move to a new location.

Page 92: MongoDB for Java Developer- Notes

Number of Index Entries?
• Index cardinality: how many index points are there for each different type of index that MongoDB supports?
• Every single index point that points to that document needs to be updated. If the key is null for a particular index, no update needs to happen to that index. If it's a regular index, one index point needs to get updated for sure. And if it's a multikey index and there are 100 or 200 or 300 items in the array, they all need to get updated inside the index.

Page 93: MongoDB for Java Developer- Notes

Number of Index Entries?

Page 94: MongoDB for Java Developer- Notes

Number of Index Entries?

Page 95: MongoDB for Java Developer- Notes

Geospatial Indexes

Page 96: MongoDB for Java Developer- Notes

Geospatial Spherical

Page 97: MongoDB for Java Developer- Notes

Text Indexes
• Text indexes can be very useful when dealing with text; this is called a full text search index
• So why would you use it, and what would you use it for?
• Let's say you had a very large piece of text in a document – something like the US Constitution, which starts out "We, the people of the United States, in order to form a more perfect union". Say you had the entire preamble in a key called My Text, and you wanted to search it
• Well, if you searched on any given word, you wouldn't get anything back, because in MongoDB, when you search on strings, the entire string needs to match

Page 98: MongoDB for Java Developer- Notes

Text Indexes
• As an alternative, you could put every single word into an array and use the set notation operations to push things into it, then search for whether or not the words are included – but that's pretty tedious, and certain other features would still be missing
• So instead, we have something called a full text search index, abbreviated "text", which indexes every word of the field much the way an array is indexed, allowing you to run queries into the text – basically applying an OR and looking for one of several words.
• So let's look at a specific case and see how it works.

Page 99: MongoDB for Java Developer- Notes

Text Indexes

Page 100: MongoDB for Java Developer- Notes

Text Indexes
• We create a collection called sentences; this collection has a bunch of mostly randomly inserted words in a words key. There is no text search index on it right now – this is a regular collection.
• If we wanted to search for the exact string "dog shrub ruby", we could do it.

Page 101: MongoDB for Java Developer- Notes

Text Indexes
• But if we search for just "dog ruby", it doesn't find it. That's not going to work too well for searching on these different words.
• So now let's add a text index: we put an index on words of type "text".
• Now when we search for "dog shrub ruby", it's going to work a lot better – so let's do that.

Page 102: MongoDB for Java Developer- Notes

Text Indexes• Let's first look at the syntax for searching a full text index
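
A hedged sketch of the same steps in the Java driver:

    import com.mongodb.client.model.Indexes;
    import org.bson.Document;
    import static com.mongodb.client.model.Filters.text;

    sentences.createIndex(Indexes.text("words"));   // full text index on the words field

    // Matches documents containing ANY of these words (an implicit OR).
    for (Document d : sentences.find(text("dog shrub ruby")))
        System.out.println(d.toJson());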

Page 103: MongoDB for Java Developer- Notes

Text Indexes

Page 104: MongoDB for Java Developer- Notes

Efficiency of Index Use
• Designing/using indexes:
• As with so many things, this requires some upfront thinking and some experimentation. To be sure you get the right indexes in place, what you really want to do is test your indexes under real-world workloads and make adjustments from there.
• What we're going for here is the selectivity of our index: to what degree, for a given query pattern, the index minimizes the number of records scanned. We have to consider this in the scope of all the operations needed to satisfy a query, and sometimes make trade-offs.

Page 105: MongoDB for Java Developer- Notes

Efficiency of Index Use
• Designing/using indexes:
• We'll also have to consider, for example, how sorts are handled

Page 106: MongoDB for Java Developer- Notes

Efficiency of Index Use

Page 107: MongoDB for Java Developer- Notes

Logging Slow Queries
• We should be checking our logs to make sure we don't have a lot of slow queries. This is built in, and we don't need to do anything to get it

Page 108: MongoDB for Java Developer- Notes

Profiling
• The profiler is a more sophisticated facility. It writes entries (documents) to system.profile for any query that takes longer than some specified period of time
• There are 3 levels of the profiler:
• Level zero: the default level; it means the profiler is off
• Level one: log slow queries
• Level two: log all queries (this is really more of a general debugging feature than a performance debugging feature)
• Why would we want to log all queries? Not so much for performance debugging, but because when we're writing a program, it's convenient to see all the database traffic so we can figure out whether the program is doing what we expect
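
A minimal sketch of turning the profiler on and reading its output (the slowms value is illustrative):

    import org.bson.Document;

    // Level 1: log operations slower than slowms; level 2 would log everything.
    db.runCommand(new Document("profile", 1).append("slowms", 100));

    // Profiled operations land in the system.profile collection.
    for (Document op : db.getCollection("system.profile").find())
        System.out.println(op.toJson());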

Page 109: MongoDB for Java Developer- Notes

Profiling

Page 110: MongoDB for Java Developer- Notes

Profiling

Page 111: MongoDB for Java Developer- Notes

Sharding Overview
• Sharding is a technique for splitting up a large collection among multiple servers.
• When we can't get the performance we want from a single server, we can shard.
• When we shard, we deploy multiple mongod servers, and in front of them we have a mongos, which is a router. Our application talks to the mongos, which then talks to the various servers, the mongods

Page 112: MongoDB for Java Developer- Notes

Sharding Overview

Page 113: MongoDB for Java Developer- Notes

Sharding Overview
• Each of these shards is often, and is recommended to be, a set of servers – what's called a replica set

Page 114: MongoDB for Java Developer- Notes

Sharding Overview
• A replica set keeps the data in sync across several different instances, so that if one of them goes down, we won't lose our data
• Logically, we can look at a replica set as one shard. For the most part, this is transparent to the application

Page 115: MongoDB for Java Developer- Notes

Sharding Overview
• The way Mongo shards is that we choose a shard key. For instance, in the student collection we might decide that student_id is our shard key (it could also be a compound key). The mongos server runs a range-based system: based on the student_id in the query, it sends the request to the right mongod instance

Page 116: MongoDB for Java Developer- Notes

Sharding Overview
• What do you really need to know as a developer?
• First, an insert must include the shard key – the entire shard key. If it's a multi-part shard key, we must include the entire shard key for the insert to complete. So we have to be aware of what the shard key is on the collection itself.
• Second, for an update, remove, or find, if mongos isn't given the shard key, it has to broadcast the request to all the shards that cover the collection. A collection like students is broken up into big parts that map to different shards: shard0, shard1... (there's also some chunking within each shard to let Mongo keep things balanced, but that doesn't really matter from our standpoint). The point is that if mongos doesn't know the shard key for a query, it has to broadcast it. It may well be that we're doing a query where we don't know the shard key, in which case it does have to be broadcast.

Page 117: MongoDB for Java Developer- Notes

Sharding Overview

Page 118: MongoDB for Java Developer- Notes

Sharding Overview
• What do you really need to know as a developer?
• If we know the shard key, we should specify it, because we'll get better performance: we'll only be using one of the servers, and we won't keep the other servers busy with the query as well
• A couple of other subtleties: with updates, if we don't specify the entire shard key, we have to make it a multi-update so that mongos knows it needs to broadcast it
• Within each of these servers, all these instances, the same techniques apply that we've been talking about this entire unit

Page 119: MongoDB for Java Developer- Notes

Part 5 – Aggregation FW

Page 120: MongoDB for Java Developer- Notes

Simple Aggregation Example

Page 121: MongoDB for Java Developer- Notes

The Aggregation Pipeline

Page 122: MongoDB for Java Developer- Notes

Simple Example Expanded

Page 123: MongoDB for Java Developer- Notes

Compound Grouping

Page 124: MongoDB for Java Developer- Notes

Aggregation Expressions

Page 125: MongoDB for Java Developer- Notes

Using $sum

Page 126: MongoDB for Java Developer- Notes

Using $avg

Page 127: MongoDB for Java Developer- Notes

Using $addToSet

Page 128: MongoDB for Java Developer- Notes

Using $push

Page 129: MongoDB for Java Developer- Notes

Using $max and $min

Page 130: MongoDB for Java Developer- Notes

Double $group stages

Page 131: MongoDB for Java Developer- Notes

Using $project

Page 132: MongoDB for Java Developer- Notes

Using $project

Page 133: MongoDB for Java Developer- Notes

Using $match

Page 134: MongoDB for Java Developer- Notes

Using $sort

Page 135: MongoDB for Java Developer- Notes

Using $limit and $skip

Page 136: MongoDB for Java Developer- Notes

Revisiting $first and $last

Page 137: MongoDB for Java Developer- Notes

Using $unwind

Page 138: MongoDB for Java Developer- Notes

Using $unwind

Page 139: MongoDB for Java Developer- Notes

Mapping between SQL and Aggregation

Page 140: MongoDB for Java Developer- Notes

Limitations of the Aggregation FW
• Some limitations:
• First, by default there's a 100MB limit for pipeline stages. That's pretty small.
• The way to get around it is the allowDiskUse option (see the sketch below), which lifts the 100MB limit
• Unless we specify that option to aggregation, we're limited to 100MB, and we'll find that our queries don't come back if they're very large and have large intermediate results in the grouping or sorting stages
• Second, if we decide to return the results in one document, there's a 16MB limit
• The last limitation is about sharding. In a sharded system, as soon as we use a group-by, a sort, or anything else that requires looking at all the results, the results are brought back to the primary shard
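
A hedged sketch of opting in to disk use for a large pipeline (stage contents are illustrative):

    import com.mongodb.client.model.*;
    import org.bson.Document;
    import java.util.Arrays;

    for (Document d : coll.aggregate(Arrays.asList(
                Aggregates.group("$state", Accumulators.sum("totalPop", "$pop")),
                Aggregates.sort(Sorts.descending("totalPop"))))
            .allowDiskUse(true))    // lets big group/sort stages spill past the 100MB limit
        System.out.println(d.toJson());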

Page 141: MongoDB for Java Developer- Notes

Limitations of the Aggregation FW

Page 142: MongoDB for Java Developer- Notes

Limitations of the Aggregation FW
• A sharded system has multiple shards, and in fact each of these shards may be a replica set. In that situation we've got a mongos router in front.
• When the application sends an aggregation query against a sharded collection, mongos sends it to every shard. Some parts of it, for instance a projection or a match, can run in parallel across all the shards.

Page 143: MongoDB for Java Developer- Notes

Limitations of the Aggregation FW
• But if we do a group-by phase, for instance, or a sort, or anything else that requires looking at all the data, then MongoDB sends all that data to the primary shard for the database.
• The primary shard is where an unsharded collection would live. So initially the work is distributed, but once we start doing a group-by or a sort, the results get sent to one shard for further processing so that everything can be collected in one place. In that sense, we may not find the same level of scalability in aggregation that we might find in, say, Hadoop, where there may be greater parallelism for very large MapReduce jobs

Page 144: MongoDB for Java Developer- Notes

Limitations of the Aggregation FW

Page 145: MongoDB for Java Developer- Notes

Part 6 – Application Engineering

Page 146: MongoDB for Java Developer- Notes

Write Concern
• How do we make sure that the writes we make to the database actually persist?
• Let's say we have a server, and the server has several parts:
• A CPU running the mongod program
• Memory
• A persistent disk
• There'll be a cache of pages inside memory that are periodically written to and read from disk, depending on memory pressure
• There's a second structure, called a journal. The journal is a log of every operation the database processes. When we do a write to the database – an update – it's also written to this journal. But the journal is in memory as well

Page 147: MongoDB for Java Developer- Notes

Write Concern

Page 148: MongoDB for Java Developer- Notes

Write Concern
• When does the journal get written back to disk?
• That is when data is really considered to be persistent. When we do an update or an insert, we contact the server via the network (a TCP connection). The server processes the update and writes it into the memory pages, but those pages may not be written to disk for quite a while, depending on memory pressure. It also simultaneously writes the update into the journal, so the journal has the exact information that was updated

Page 149: MongoDB for Java Developer- Notes

Write Concern

Page 150: MongoDB for Java Developer- Notes

Write Concern
• When does the journal get written back to disk?
• By default in the driver, when we do an update, we wait for a response – it's an acknowledged update or an acknowledged insert. But we don't wait for the journal to be written to disk; the journal might not be written to disk for a while.
• The value that represents whether we wait for the write to be acknowledged by the server is called w
• By default, w=1, which means: wait for this server to respond to our write.
• Also by default, j=false. j, which stands for journal, represents whether or not we wait for the journal to be written to disk before we continue

Page 151: MongoDB for Java Developer- Notes

Write Concern

Page 152: MongoDB for Java Developer- Notes

Write Concern
• When does the journal get written back to disk?
• So what are the implications of these defaults?
• When we do an update or an insert into MongoDB, we're really doing an operation in memory, not necessarily to disk – which means, of course, it's very fast. Periodically the journal gets written out to disk; it might be every few seconds, so it won't be long. But during this window of vulnerability – when the data has been written into the server's memory pages but the journal has not yet been persisted to disk – if the server crashes, we could lose data
• We have to realize that even though the write came back as successful, having been written to memory, it may never persist to disk if the server subsequently crashes
• Whether or not this is a problem depends on the application

Page 153: MongoDB for Java Developer- Notes

Write Concern
• We have the w value and the j value (see the sketch below):
• Default, w=1, j=false:
  • We wait for the write to be acknowledged by the database, but not for the journal to sync. So this is fast
  • But there's a small window of vulnerability. To eliminate that window, we can set j=true
• w=1, j=true: we can set this inside the driver, at the collection level, at the database level, or at the client level
  • It's going to be much slower
  • The window of vulnerability is removed
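
A minimal sketch of choosing these values in the Java driver:

    import com.mongodb.WriteConcern;
    import com.mongodb.client.MongoCollection;
    import org.bson.Document;

    // WriteConcern.JOURNALED is w=1 plus j=true: slower, but the write survives a crash.
    MongoCollection<Document> safe = coll.withWriteConcern(WriteConcern.JOURNALED);
    safe.insertOne(new Document("grade", 100));

    // The same can be set for a whole client via the connection string, e.g.:
    //   mongodb://localhost/?w=1&journal=true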

Page 154: MongoDB for Java Developer- Notes

Write Concern
• We have the w value and the j value:
• w=0: not recommended at all
  • This is an unacknowledged write: the write is sent to the database, but we don't even wait for the server to respond

Page 155: MongoDB for Java Developer- Notes

Write Concern

Page 156: MongoDB for Java Developer- Notes

Write Concern

Page 157: MongoDB for Java Developer- Notes

Network Errors
• We may not receive the affirmative response if there are network errors.
• We can send a request from the application to mongod, the server can complete it successfully, and then a TCP reset can occur – the connection gets reset in a way that means we never see the response

Page 158: MongoDB for Java Developer- Notes

Network Errors
• So we get an error, and on error we might assume the operation didn't happen. But it may have happened.
• We'll go over later the different types of errors we can get in an application (PyMongo) and talk about which ones potentially indicate this problem, and what we can do about it.
• Generally speaking, for an insert it's possible to guard against this. The reason is that if we let the driver create the _id and we do an insert into the DB, we can do that insert multiple times without harm. If we do it the first time and get an error, we can just do it again. If we're not sure whether the insert completed because of a network error, we can still retry it – and provided we retry with the exact same _id, the worst case scenario is a duplicate key error, as sketched below

Page 159: MongoDB for Java Developer- Notes

Network Errors
• An update is really where the problem occurs, especially an update that is not idempotent –
• one that, for instance, includes a $inc command. We're telling the database to increment a certain field; in that case, if we get back a network error, we don't know whether or not the update occurred. Maybe we know enough about the values to check whether the update occurred, which is fine. But if we don't know the starting value in the DB for that field, it's not possible to know whether it occurred when we get a network error. So there is this level of uncertainty
• In practice, when the network is running well, it's very rare to get an error back because of some connection or transient problem when in fact the operation did succeed at the DB. It's extremely rare.
• If we really want to avoid it at all costs, we basically need to turn all our updates into inserts: read the full value of the document out of the database, then potentially delete it and insert it again, or just insert a new one, in which case we won't have this problem

Page 160: MongoDB for Java Developer- Notes

Network Errors

Page 161: MongoDB for Java Developer- Notes

Replication
• How do we get availability and fault tolerance?
• By availability we mean: if a node goes down, we still want to be able to use the system
• And if the primary node goes down and we lose it entirely for some reason – let's say there's a fire – how do we make sure we don't lose our data between backups?
• What we do to solve both those problems is introduce replication

Page 162: MongoDB for Java Developer- Notes

Replication
• We have the concept of a replica set. A replica set is a set of mongod nodes that act together and mirror each other in terms of data. There is one primary, and the other nodes are secondaries – but that's dynamic. Data written to the primary replicates asynchronously to the secondaries
• The application and its driver stay connected to the primary, and write to the primary
• If the primary goes down, the remaining nodes – in this case, 2 nodes – perform an election to elect a new primary.
• To elect a new primary, we have to have a strict majority of the original number of nodes
• Since the original number of nodes here was 3, we need 2 nodes to elect a new primary, and that's the number we have
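
A minimal sketch of connecting to such a set (host names and set name are hypothetical): the driver discovers the topology, finds the primary, and reconnects transparently after a failover.

    import com.mongodb.client.*;

    MongoClient client = MongoClients.create(
        "mongodb://node1:27017,node2:27017,node3:27017/?replicaSet=rs0");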

Page 163: MongoDB for Java Developer- Notes

Replication
• So if the primary went down, another node can become the primary. At that point, our app would reconnect to the new primary for writes, through the driver – all transparently
• This group of nodes is called a replica set

Page 164: MongoDB for Java Developer- Notes

Replication

Page 165: MongoDB for Java Developer- Notes

Replication
• That forms the basic mechanism by which we get fault tolerance and availability in Mongo

Page 166: MongoDB for Java Developer- Notes

Replication

Page 167: MongoDB for Java Developer- Notes

Replication
• The minimum number of nodes is 3. The reason is that if we had fewer than 3 nodes, what remained after a failure would not be a majority of the original set, so there would be no way to elect a new primary. We'd go without a primary, which means we could no longer take writes

Page 168: MongoDB for Java Developer- Notes

Replica Set Elections• Let's talk about the election that goes on when the primary fails, and the

different types of nodes that can exist in a replica set.

Page 169: MongoDB for Java Developer- Notes

Replica Set Elections
• There are several types of replica set nodes:
• Regular: a node that has data and can become primary. It's the most common type of node; it can be a primary or a secondary
• Arbiter: an arbiter node is just there for voting purposes, and there are lots of reasons why we might want one
  • For example, if we want an even number of data-bearing replica set nodes, we need an arbiter out there so that when one node goes down, we still have a strict majority to vote for a new leader
  • Or if we want to put together a replica set with, say, 3 nodes, but we only have 2 machines: 2 of the nodes can be real and one can be an arbiter
• An arbiter participates in voting, as does a regular node

Page 170: MongoDB for Java Developer- Notes

Replica Set Elections
• There are several types of replica set nodes:
• Delayed/Regular: often a disaster recovery node.
  • It can be set to be, let's say, an hour or 2 hours – whatever we want – behind the other nodes
  • It can participate in voting, but it cannot become a primary node
  • To achieve this, its priority is set to zero
  • We can set the priority of any node to zero if we want, and then it cannot be elected primary
• Hidden: often used for analytics. It can never become the primary. It also has its priority set to zero, and therefore cannot become the primary
• There's also the concept of votes: we can decide how many votes each of these nodes has

Page 171: MongoDB for Java Developer- Notes

Replica Set Elections
• We'll assume that every node has one vote, because in reality it's not too convenient or typical to give more than one vote to a node
• When failover occurs because the primary is not reachable for some reason, the remaining nodes elect a new primary, that node becomes primary, and the drivers reconnect
• It's pretty transparent to the application

Page 172: MongoDB for Java Developer- Notes

Replica Set Elections

Page 173: MongoDB for Java Developer- Notes

Write Consistency• In MongoDB's replication, there's only a single primary at any given time. And in the default configuration, our writes and reads go to that primary

• Now, our writes have to go to the primary, but our reads don't have to go to the primary. The reads could go to the secondaries if we'd like

• But if we keep the reads going to the primary as well, then what happens is we get strong consistency of reads with respect to writes. And what this means, among other things, is that we won't read stale data: if we write something, we'll be able to read it back, and other application servers that read it will also be able to read what we wrote, after we wrote it. Provided that we waited for the write to complete through journaling

Page 174: MongoDB for Java Developer- Notes

Write Consistency

Page 175: MongoDB for Java Developer- Notes

Write Consistency• Now, we can, if we prefer, allow our reads to go to our secondaries. But if we do that, then we may read stale data from the secondaries, relative to what we wrote, or somebody else wrote, to the primary. And the lag between any 2 nodes is not guaranteed, because the replication is asynchronous. We'll go through the different read preferences that we can set in the drivers to decide whether or not we're willing to accept reads from the secondaries. And the reason why we might want to read from the secondaries, the traditional reason people do it, is that they want to scale their reads through the replica set. They feel that if they can send their reads to all the nodes, they'll get some read scaling

• So, whether that's really true, and whether it's a good idea, is worth thinking about

Page 176: MongoDB for Java Developer- Notes

Write Consistency• The memory model here is strongly consistent. On the other hand, though, when failover occurs, briefly during that time when there's no primary, we can't complete a write, because there is no primary. When another primary is being elected (let's say the primary goes down, which is pretty rare, obviously), during that period we won't be able to actually perform any writes. And this is in contrast to some other systems that compete with MongoDB that have a weaker form of consistency.

• Some of them have what's called eventual consistency. And eventual consistency means that eventually, we'll be able to read what we wrote, but there's no guarantee that we'll be able to read it in any particular time frame. The problem with eventual consistency is that it's pretty hard to reason about, because most web servers and application servers these days are stateless. So it's a little disconcerting to, let's say, write the session information and other information into the database, then read it back out and get a different value, and then have to reconcile what that means

Page 177: MongoDB for Java Developer- Notes

Write Consistency• So MongoDB does not offer eventual consistency in its default configuration, where we read from and write to the primary

• If we want eventual consistency, we can read from the secondaries, which will give us eventual consistency

Page 178: MongoDB for Java Developer- Notes

Write Consistency

Page 179: MongoDB for Java Developer- Notes

Write Consistency

Page 180: MongoDB for Java Developer- Notes

Creating a Replica Set
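• A minimal sketch of creating a replica set with a 3.x Java driver, assuming three mongods were already started locally with --replSet rs0 on ports 27017-27019 (hosts, ports, and the set name are illustrative). replSetInitiate is the admin command behind the shell's rs.initiate(), and the priority, hidden, and arbiterOnly member options correspond to the node types described on the previous slides:

import com.mongodb.MongoClient;
import com.mongodb.ServerAddress;
import org.bson.Document;

import java.util.Arrays;

public class CreateReplicaSet {
    public static void main(String[] args) {
        // Connect directly to one of the mongods (not yet an initiated member)
        try (MongoClient client = new MongoClient(new ServerAddress("localhost", 27017))) {
            // Replica set configuration: a regular node, a hidden priority-0 node,
            // and an arbiter, illustrating the node types discussed above
            Document config = new Document("_id", "rs0")
                    .append("members", Arrays.asList(
                            new Document("_id", 0).append("host", "localhost:27017"),
                            new Document("_id", 1).append("host", "localhost:27018")
                                    .append("priority", 0).append("hidden", true),
                            new Document("_id", 2).append("host", "localhost:27019")
                                    .append("arbiterOnly", true)));
            // replSetInitiate is the admin command wrapped by rs.initiate()
            Document result = client.getDatabase("admin")
                    .runCommand(new Document("replSetInitiate", config));
            System.out.println(result.toJson());
        }
    }
}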

Page 181: MongoDB for Java Developer- Notes

Replication Internals• Let's say we have a 3-node replica set, and each of these mongods has an oplog within it. And the oplog is going to be kept in sync by Mongo. So what happens is one of these is a primary. And of course, our writes have to go to our primary. And our secondaries are going to be constantly reading the oplog of the primary

• When we do a write on a primary, it's going to get written to the oplog. And then the secondary is going to be reading what's new in the oplog and applying those same operations to itself

• And when an election occurs, for instance, if we decide that we want to take down the primary, then a new primary will be elected from the remaining nodes. And when the old primary comes back up, it might rejoin as a secondary.
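• To make the oplog concrete, here is a hedged sketch that prints the newest few entries of a member's oplog with the Java driver. local.oplog.rs is the capped collection replica set members replicate from, though the exact field layout varies by server version:

import com.mongodb.MongoClient;
import org.bson.Document;

public class OplogPeek {
    public static void main(String[] args) {
        try (MongoClient client = new MongoClient("localhost", 27017)) {
            // The oplog lives in the capped collection local.oplog.rs on each member
            for (Document entry : client.getDatabase("local")
                    .getCollection("oplog.rs")
                    .find()
                    .sort(new Document("$natural", -1))  // newest entries first
                    .limit(5)) {
                // Each entry records one operation (op), its namespace (ns),
                // and the operation payload (o)
                System.out.println(entry.toJson());
            }
        }
    }
}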

Page 182: MongoDB for Java Developer- Notes

Replication Internals

Page 183: MongoDB for Java Developer- Notes

Replication Internals

Page 184: MongoDB for Java Developer- Notes

Failover and Rollback• While it is true that a replica set will never roll back a write if it was performed with w=majority and that write successfully replicated to a majority of nodes, it is possible that a write performed with w=majority gets rolled back. Here is the scenario: you do a write with w=majority, and a failover occurs after the write has committed to the primary but before replication completes. You will likely see an exception at the client. An election occurs and a new primary is elected. When the original primary comes back up, it will roll back the committed write. However, from your application's standpoint, that write never completed, so that's ok
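• A hedged sketch of what that looks like from the Java side: issue the write with a majority write concern and treat an exception as "not acknowledged" (the database and collection names here are made up for illustration):

import com.mongodb.MongoClient;
import com.mongodb.MongoException;
import com.mongodb.WriteConcern;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class MajorityWrite {
    public static void main(String[] args) {
        try (MongoClient client = new MongoClient()) {
            MongoCollection<Document> orders = client.getDatabase("store")
                    .getCollection("orders")
                    // wait until a majority of the replica set acknowledges the write
                    .withWriteConcern(WriteConcern.MAJORITY);
            try {
                orders.insertOne(new Document("order_id", 10).append("total", 29.99));
            } catch (MongoException e) {
                // If a failover interrupts replication, the client sees an exception
                // here; from the application's standpoint the write never completed
                System.err.println("Write not acknowledged: " + e.getMessage());
            }
        }
    }
}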

Page 185: MongoDB for Java Developer- Notes

Failover and Rollback

Page 186: MongoDB for Java Developer- Notes

Read Preference• By default, MongoDB reads and writes both go to the primary. Now, let's say we have a 3-node replica set

• It may be the case that if you did a read from a secondary after having written to the primary, and the write had not yet propagated to that secondary, you would not read what you wrote, which makes it harder to reason about your programs

• Nevertheless, if you would like to read from secondaries in MongoDB, we do allow that. You always have to write to the primary. But you can read from the secondaries. And this is called the read preference.

Page 187: MongoDB for Java Developer- Notes

Read Preference

Page 188: MongoDB for Java Developer- Notes

Read Preference• There are several different options for that:

• Primary, the default, which means we want to read from the primary

• Primary preferred, which means we want to read from the primary, but if the primary is not available, we'll take a secondary

• Secondary, which means we want to rotate our reads to our secondaries, and only our secondaries

• Secondary preferred, which prefers the secondaries, but could also send the read to the primary if there's no secondary available

• Nearest, which tells the driver to send the read to the MongoDB node that seems to be the closest in terms of ping time. By default, anything within 15ms of the closest node is also considered nearest, so the read may go to any of those nodes. Within nearest, there's also the concept of a tag set, which is a data center awareness idea: you can mark certain nodes as being part of a certain data center
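• A hedged sketch of how those options map onto the Java driver's ReadPreference helpers (the database and collection names are illustrative):

import com.mongodb.MongoClient;
import com.mongodb.ReadPreference;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class ReadPrefExample {
    public static void main(String[] args) {
        try (MongoClient client = new MongoClient()) {
            // Route reads for this collection to a secondary when one is
            // available, falling back to the primary otherwise
            MongoCollection<Document> orders = client.getDatabase("store")
                    .getCollection("orders")
                    .withReadPreference(ReadPreference.secondaryPreferred());

            // The other options mirror the list above: primary() (the default),
            // primaryPreferred(), secondary(), and nearest()
            Document doc = orders.find().first();
            System.out.println(doc == null ? "empty" : doc.toJson());
        }
    }
}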

Page 189: MongoDB for Java Developer- Notes

Read Preference• If you decide to read from a secondary, you're not going to have a strongly consistent read. You're going to have what's called an eventually consistent read: eventually the data will show up on the secondary, but what you read won't necessarily include the data that you just wrote

Page 190: MongoDB for Java Developer- Notes

Read Preference

Page 191: MongoDB for Java Developer- Notes

Review Implications of Replication• The whole idea of using replica sets is that they're very transparent to the developer, and hence you don't really have to understand that they're there. It's supposed to just create greater availability and fault tolerance, and not get in your way

• But there's a few things you need to remember:

• Seed lists: The drivers are responsible for reconnecting you to a new node during failover, after a new primary is elected, and to do that they need to know at least one member of the replica set. So you need to understand that a seed list exists

• Write concern: The second is that now that we're in this distributed environment, you need to understand the idea of write concern, and in particular, the idea of waiting for some number of nodes to acknowledge your writes through the w parameter, and the j parameter, which lets you wait or not wait for the primary node to commit that write to its journal on disk. And also the wtimeout parameter, which is how long you're going to wait to see that your write replicated to other members of the replica set (see the connection sketch below)
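• A hedged sketch tying the seed list and write concern together in one connection string (host names and values are placeholders; w, journal, and wtimeoutMS set this client's default write concern):

import com.mongodb.MongoClient;
import com.mongodb.MongoClientURI;
import org.bson.Document;

public class SeedListConnect {
    public static void main(String[] args) {
        // Seed list: the driver only needs to reach one of these members to
        // discover the whole replica set and follow it through failovers
        MongoClientURI uri = new MongoClientURI(
                "mongodb://host1:27017,host2:27017,host3:27017/"
                        + "?replicaSet=rs0&w=majority&journal=true&wtimeoutMS=5000");
        try (MongoClient client = new MongoClient(uri)) {
            // A simple ping to confirm the connection works
            Document pong = client.getDatabase("admin")
                    .runCommand(new Document("ping", 1));
            System.out.println("Connected: " + pong.toJson());
        }
    }
}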

Page 192: MongoDB for Java Developer- Notes

Review Implications of Replication• But there's a few things you need to remember:

• Read Preferences: There are multiple nodes you could potentially read from, so you have to decide whether you want to read from your primary, which is the default and most obvious thing to do, or whether you want to take your reads from your secondaries. And if you're going to take your reads from a secondary, the application has to be ready to use data that's potentially stale with respect to what was written

• Errors can happen: And the final idea is that even though replication exists, and you have these nodes in place to deal with failures and elect a new primary if needed, errors can still always happen. These errors can happen because of transient situations like a failover occurring, or because of network errors, or maybe there are actual errors such as violating unique key constraints or other syntactic things. So generally speaking, to create a robust application, you need to be checking for exceptions when you read from and write to the DB, to make sure that if anything comes up, you know about it, so that you understand the implications for what data has been committed and what data is durable in your application

Page 193: MongoDB for Java Developer- Notes

Review Implications of Replication

Page 194: MongoDB for Java Developer- Notes

Introduction to Sharding• Sharding is an approach to horizontal scalability. It is the way we handle scaling out

• And basically, rather than just have your collection be on one database, you want to put it on, let's say, some number of databases. It could be quite a large number of databases

• And the goal is that it be transparent when you access some collection. Let's say you had some large orders collection, and there were billions and billions of these items: you wouldn't have to figure out where each one is in the system. It would just work transparently. So the high-level approach to this is that you set up these shards

• And these shards are meant to split up the data from a particular collection

Page 195: MongoDB for Java Developer- Notes

Introduction to Sharding• Now, the shards are typically, in and of themselves, replica sets. So we

talked about what a replica set is. So there might be 3 hosts within a shard.

• So you're going to make queries, and somehow these queries are going to get distributed. Now, the way this works is that there is a router that came with your installation, called MongoS. You probably saw that binary when you unpacked the MongoDB installation. And that router is what's going to take care of this distribution for you.

• So it's going to keep some sort of connection pool or knowledge of all the different hosts, and it's going to route queries properly

Page 196: MongoDB for Java Developer- Notes

Introduction to Sharding

Page 197: MongoDB for Java Developer- Notes

Introduction to Sharding• So the way we do sharding is we use a range-based approach, and there's a concept of a shard_key.

• So let's say you have the orders collection. You could imagine a trivial sharding might be on order_id. If you were sharding on order_id and you queried for a certain order number, then the MongoS would have some notion of the ranges of order numbers that are assigned to each shard. And this is done by way of mapping to chunks. So the idea is that your orders collection would be broken up into chunks, and those chunks could potentially be migrated by the balancer to make sure that things stay balanced

• So you have the orders collection, and it's broken up into these chunks of orders based on ranges of order_ids. And each of these chunks lives on a particular shard

Page 198: MongoDB for Java Developer- Notes

Introduction to Sharding

Page 199: MongoDB for Java Developer- Notes

Introduction to Sharding• And then when you do a query, the application sends that query to a MongoS, and the MongoSs are then talking to the replica sets, which of course are running MongoDs

• If the query can be satisfied by a particular shard – let's say we want order_id 10 – it says: OK, let me look at my mapping. order_id 10 maps to this chunk, this chunk maps to this shard, and it will route the query directly to that shard. And then you'll get back a response pretty quickly

• On the other hand, if the query doesn't include the shard_key – and in this case, the shard_key was order_id – then it'll have to scatter this request to all the shards, gather back the answers, and then respond to the application. So that's the way it works if it cannot utilize the shard_key
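• A hedged sketch of that difference from the application's point of view (the host, database, and field names are illustrative; the routing itself is invisible to the code):

import com.mongodb.MongoClient;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

import static com.mongodb.client.model.Filters.eq;

public class ShardRouting {
    public static void main(String[] args) {
        // Connect to a mongos router, not to the shards themselves
        try (MongoClient client = new MongoClient("mongos-host", 27017)) {
            MongoCollection<Document> orders = client.getDatabase("store")
                    .getCollection("orders");

            // Query includes the shard key (order_id): mongos routes it to
            // the one shard owning that chunk, a fast, targeted query
            Document byKey = orders.find(eq("order_id", 10)).first();

            // Query without the shard key: mongos must scatter it to every
            // shard and gather the results back
            Document byOther = orders.find(eq("customer", "alice")).first();

            System.out.println(byKey + " / " + byOther);
        }
    }
}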

Page 200: MongoDB for Java Developer- Notes

Introduction to Sharding• In addition, when you're dealing with a sharded environment (and we're going to go through this a little bit more), you have to include the shard_key on any insert, because Mongo needs to know where to put the document. So once you declare order_id, let's say, as the shard_key, it's now illegal to have a document inside the orders collection that doesn't have an order_id, because Mongo wouldn't know which shard to put it in

• Now, sharding is at a DB level: you can say whether you want to shard or not shard a DB, and even beyond that, whether you want to shard or not shard a collection. Collections that aren't sharded are going to wind up in shard 0, the left-most shard (see the sketch below for how a database and collection get sharded)

• So just to review at a high level: to get horizontal scalability, what we do is we shard, which means we break up a collection onto multiple logical hosts, and we do that according to a shard_key
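• A hedged sketch of enabling sharding for a database and then a collection, using the admin commands that the shell helpers sh.enableSharding() and sh.shardCollection() wrap, followed by an insert that carries the shard key (all names are illustrative):

import com.mongodb.MongoClient;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class ShardOrders {
    public static void main(String[] args) {
        try (MongoClient client = new MongoClient("mongos-host", 27017)) {
            // Sharding is enabled per database, then per collection
            client.getDatabase("admin")
                    .runCommand(new Document("enableSharding", "store"));
            // For an empty collection, the shard key index is created for us
            client.getDatabase("admin")
                    .runCommand(new Document("shardCollection", "store.orders")
                            .append("key", new Document("order_id", 1)));

            // Every insert must now carry the shard key (order_id)...
            MongoCollection<Document> orders = client.getDatabase("store")
                    .getCollection("orders");
            orders.insertOne(new Document("order_id", 10).append("total", 29.99));
            // ...an insert without order_id would be rejected
        }
    }
}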

Page 201: MongoDB for Java Developer- Notes

Introduction to Sharding• The shard_key is something you're going to determine. The shard_key is some part of the document itself. If it was an orders collection, it could be the order_id. You could also shard on _id, which some people do for certain collections. We'll talk about why that may or may not be a good thing.

• Now, for a blog posts collection, it could be the post id – anything you want, pretty much. And once you make that decision, Mongo will then break up that collection into chunks and decide which shard each of the chunks lives on, in a range-based way. And then any query that you make, which now has to be routed through a MongoS, will go to the appropriate shards to answer your query.

• And by the way, in case you were wondering, yes, there can be more than one MongoS. They're really stateless, and they typically run on the same server as the application. They're handled very much the same way a replica set would be handled, in that if one of them goes down, the driver will connect to a different one. And the MongoSs then talk to the MongoDs

Page 202: MongoDB for Java Developer- Notes

Introduction to Sharding• Once you're in a sharded environment like this (and the shards, again, are probably almost always replica sets), you no longer connect your application directly to a MongoD
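• A hedged sketch of the application-side connection: list several mongos routers in the connection string (host names are placeholders, and note there's no replicaSet option here), and the driver fails over between them:

import com.mongodb.MongoClient;
import com.mongodb.MongoClientURI;

public class MongosConnect {
    public static void main(String[] args) {
        // Several mongos routers: if one goes down, the driver moves to
        // another. The application never talks to the shard mongods directly
        MongoClientURI uri = new MongoClientURI(
                "mongodb://mongos1:27017,mongos2:27017");
        try (MongoClient client = new MongoClient(uri)) {
            System.out.println(client.getDatabase("store")
                    .getCollection("orders").count());
        }
    }
}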

Page 203: MongoDB for Java Developer- Notes

Introduction to Sharding

Page 204: MongoDB for Java Developer- Notes

Introduction to Sharding

Page 205: MongoDB for Java Developer- Notes

Building a Sharded Environment• Aside from these servers, you're also going to need some config servers. Typically you have 3 of them, although you can have as few as one, and each of these config servers holds the information about the way your data is distributed across the shards. So let's say you have a large collection of people that are members of your website, and there's a UserID. You could take the users collection and shard it across these different servers, so that a certain user lives on a certain shard. Now, the way it works is that there's this concept of a chunk in sharding. A chunk of data, which holds a bunch of documents, is all mapped to a particular shard. And it's these config servers that know the way the chunks are assigned to the shards. Now, the config servers themselves are not a replica set, but they do a 2-phase commit to make any changes

Page 206: MongoDB for Java Developer- Notes

Building a Sharded Environment

Page 207: MongoDB for Java Developer- Notes

Building a Sharded Environment

• There are 2 replica sets, each one with 3 nodes, and there are 3 config servers

Page 208: MongoDB for Java Developer- Notes

Implications of Sharding• Every document needs to include the shard key

• The shard key is immutable

• You need an index that starts with the shard key. So what this means is that if the shard key is student_id, then the index could be (student_id, class)

• On an update, the shard key has to be specified, or you have to use multi. If you do multi, or don't specify the shard key in the update when you do multi, then it's going to be sent out to all of the nodes

• No shard key in a query means a scatter-gather operation, which could be expensive. So you have to think about it when you're creating the shard key and sharding the collection: what is the key that I'm probably going to use in most queries? Because that's the key that I'd really want to have in my shard key

Page 209: MongoDB for Java Developer- Notes

Implications of Sharding• No unique key, unless it's also part of the shard key.

Page 210: MongoDB for Java Developer- Notes

Sharding + Replication• Sharding and replication are almost always done together.

Page 211: MongoDB for Java Developer- Notes

Sharding + Replication

Page 212: MongoDB for Java Developer- Notes

Choosing a Shard Key• The first consideration is that you need to make sure there's sufficient cardinality. By that I mean a sufficient variety of values. So, for instance, if you decide to shard on something where there are only 3 possible values, then there would be no way for Mongo to spread it across, let's say, 100 shards. So you have to think about whether or not there's sufficient cardinality for it to be a proper shard key. And you can often solve that problem by putting in a secondary part of the key, which has more cardinality than the first part, as sketched below
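• A hedged sketch of that fix, again via the shardCollection admin command (names are illustrative): a low-cardinality first field gets a high-cardinality second field appended:

import com.mongodb.MongoClient;
import org.bson.Document;

public class CompoundShardKey {
    public static void main(String[] args) {
        try (MongoClient client = new MongoClient("mongos-host", 27017)) {
            // store_id alone has too few distinct values to spread across
            // many shards; appending order_id restores cardinality while
            // keeping queries on store_id targeted
            Document result = client.getDatabase("admin").runCommand(
                    new Document("shardCollection", "store.orders")
                            .append("key", new Document("store_id", 1)
                                    .append("order_id", 1)));
            System.out.println(result.toJson());
        }
    }
}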

Page 213: MongoDB for Java Developer- Notes

Choosing a Shard Key• The second consideration is that you want to avoid hot spotting in writes, which, just given the way the implementation works today, will occur for anything that's monotonically increasing. And I can explain why that is. If you did sh.status() and looked at the configuration chunks, you may have noticed that the first one went from min key to some value, then there were a bunch of value ranges for all the other chunks, and the final one went from some value to max key, the maximum possible key. The problem arises when you insert something that has a larger value than has ever been seen before. For instance, if you decide to shard on _id, BSON ObjectIds increase pretty much monotonically: if you look at them, you'll notice they just keep increasing, because the high part of them is actually a timestamp. And so what's going to happen is that every single document, as it gets inserted, is going to be larger than the maximum value ever seen by the collection before. So it's always going to be assigned to the highest chunk.

Page 214: MongoDB for Java Developer- Notes

Choosing a Shard Key• And what this means is that if you've got these 10 nodes, or whatever number of shards you have, and you start doing inserts, the inserts are just going to continue to hammer the one shard holding the highest chunk. Eventually maybe it'll re-balance, but then the inserts will all go to some other single shard. It doesn't matter: they're always going to hit one shard. Now, if the writes are of low enough frequency, that may be acceptable – and again, it's always a question of what your access patterns are. So you ideally want to think about a shard key that isn't monotonically increasing but has sufficient cardinality. Those are 2 good basic criteria to think about for your shard key

Page 215: MongoDB for Java Developer- Notes

Choosing a Shard Key

Page 216: MongoDB for Java Developer- Notes

References• https://university.mongodb.com/courses/10gen/M101J/2015_May/

Page 217: MongoDB for Java Developer- Notes

Thanks!

