Page 1: Aggregation Indexing

Aggregation and Indexing

Page 2: Aggregation Indexing

Aggregation

Aggregation operations process data records and return computed results. They group values from multiple documents together and can perform a variety of operations on the grouped data to return a single result.

Like queries, aggregation operations in MongoDB use a collection of documents as input and return results in the form of one or more documents.

MongoDB provides three ways to perform aggregation:
– Aggregation pipeline
– Map-reduce function
– Single purpose aggregation methods and commands

Page 3: Aggregation Indexing

1. Aggregation Pipeline

The aggregation pipeline is a framework for performing aggregation tasks, modeled on the concept of data processing pipelines.

Using this framework, MongoDB passes the documents of a single collection through a pipeline.

The pipeline transforms the documents into aggregated results, and is accessed through the aggregate database command.
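As a minimal sketch, a pipeline run from the mongo shell might look like the following, assuming a hypothetical orders collection with status, cust_id, and amount fields:

// Hypothetical collection and field names, for illustration only.
db.orders.aggregate( [
    { $match: { status: "A" } },                                   // keep only matching documents
    { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }    // group by customer and sum amounts
] )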

Page 4: Aggregation Indexing

1. Aggregation Pipeline Contd…

Pipeline: The MongoDB aggregation pipeline starts with the documents of a collection and streams the documents from one pipeline operator to the next to process them.

Each operator in the pipeline transforms the documents as they pass through the pipeline.

Pipeline operators do not need to produce one output document for every input document: operators may generate new documents or filter out documents. Pipeline operators can be repeated in the pipeline.

The db.collection.aggregate() method returns a cursor and can return result sets of any size.

Page 5: Aggregation Indexing

1. Aggregation Pipeline Contd…

Page 6: Aggregation Indexing

1. Aggregation Pipeline Contd…

Pipeline Expressions: Each pipeline operator takes a pipeline expression as its operand. Pipeline expressions specify the transformation to apply to the input documents.

Expressions have a document structure and can contain fields, values, and operators.

Pipeline expressions can only operate on the current document in the pipeline and cannot refer to data from other documents: expression operations provide in-memory transformation of documents.
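For example, a sketch of an expression document used inside a $project stage, assuming a hypothetical sales collection with price and quantity fields:

// $multiply is an expression operator; it only reads fields of the current document.
db.sales.aggregate( [
    { $project: { item: 1, total: { $multiply: [ "$price", "$quantity" ] } } }
] )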

Page 7: Aggregation Indexing

1. Aggregation Pipeline Contd…

Pipeline Operators: Pipeline operators appear in an array. Documents pass through the operators in sequence.

Page 8: Aggregation Indexing

1. Aggregation Pipeline Contd…

Expression Operators

Page 9: Aggregation Indexing

Limitations of Aggregation Pipeline

Type Restrictions
The aggregation pipeline cannot operate on values of the Symbol, MinKey, MaxKey, DBRef, Code, and CodeWScope types.

Result Size Restrictions
The aggregate command will produce an error if the result set exceeds the BSON Document Size limit, which is currently 16 megabytes. The aggregate command can instead return results as a cursor or store the results in a collection, neither of which is subject to the size limit. The db.collection.aggregate() method returns a cursor and can return result sets of any size.
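As a sketch, a pipeline can store its output in a collection with a final $out stage (the collection names here are hypothetical):

db.orders.aggregate( [
    { $group: { _id: "$cust_id", total: { $sum: "$amount" } } },
    { $out: "agg_results" }    // writes the result documents to the agg_results collection
] )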

Memory Restrictions
Pipeline stages have a limit of 100 megabytes of RAM. If a stage exceeds this limit, MongoDB will produce an error. To allow for the handling of large datasets, use the allowDiskUse option to enable aggregation pipeline stages to write data to temporary files.
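A minimal sketch of passing the allowDiskUse option, again on a hypothetical orders collection:

db.orders.aggregate(
    [ { $sort: { amount: -1 } }, { $group: { _id: "$cust_id", total: { $sum: "$amount" } } } ],
    { allowDiskUse: true }    // lets memory-heavy stages spill to temporary files on disk
)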

Page 10: Aggregation Indexing

2. Map Reduce

Map-reduce is a data processing paradigm for condensing large volumes of data into useful aggregated results.

For map-reduce operations, MongoDB provides the mapReduce database command.

All map-reduce functions in MongoDB are JavaScript and run within the mongod process.

Map-reduce operations take the documents of a single collection as the input and can perform any arbitrary sorting and limiting before beginning the map stage.

Page 11: Aggregation Indexing

2. Map Reduce contd…

In a map-reduce operation, MongoDB applies the map phase to each input document (i.e. the documents in the collection that match the query condition). The map function emits key-value pairs.

For those keys that have multiple values, MongoDB applies the reduce phase, which collects and condenses the aggregated data.

MongoDB then stores the results in a collection. Optionally, the output of the reduce function may pass through a finalize function to further condense or process the results of the aggregation.
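A sketch of such an operation in the mongo shell, assuming a hypothetical orders collection with cust_id, amount, and status fields (the finalize function is optional):

var mapFn = function() { emit( this.cust_id, this.amount ); };             // emit key-value pairs
var reduceFn = function( key, values ) { return Array.sum( values ); };    // condense the values for each key
var finalizeFn = function( key, reducedValue ) { return { total: reducedValue }; };

db.orders.mapReduce(
    mapFn,
    reduceFn,
    { query: { status: "A" }, finalize: finalizeFn, out: "order_totals" }
)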

Page 12: Aggregation Indexing

2. Map Reduce contd…

Page 13: Aggregation Indexing

2. Map Reduce contd…

Map-Reduce JavaScript Functions
In MongoDB, map-reduce operations use custom JavaScript functions to map, or associate, values to a key. If a key has multiple values mapped to it, the operation reduces the values for the key to a single object.

The use of custom JavaScript functions provides flexibility to map-reduce operations: a custom function can also make final modifications to the results at the end of the map and reduce operation, such as performing additional calculations.

Page 14: Aggregation Indexing

2. Map Reduce contd…

Map Reduce Concurrency
The map-reduce operation is composed of many tasks, including reads from the input collection, executions of the map function, executions of the reduce function, writes to a temporary collection during processing, and writes to the output collection.

During the operation, map-reduce takes the following locks:
The read phase takes a read lock. It yields every 100 documents.
The insert into the temporary collection takes a write lock for a single write.
If the output collection does not exist, the creation of the output collection takes a write lock.
If the output collection exists, then the output actions (i.e. merge, replace, reduce) take a write lock. This write lock is global, and blocks all operations on the mongod instance.

Page 15: Aggregation Indexing

3. Single Purpose Aggregation Operations

For a number of common single purpose aggregation operations, MongoDB provides special purpose database commands.

These common aggregation operations are: returning a count of matching documents, returning the distinct values for a field, and grouping data based on the values of a field.

All of these operations aggregate documents from a single collection.

While these operations provide simple access to common aggregation processes, they lack the flexibility and capabilities of the aggregation pipeline and map-reduce.

Page 16: Aggregation Indexing

3. Single Purpose Aggregation Operations

Page 17: Aggregation Indexing

3. Single Purpose Aggregation Operations: Count

MongoDB can return a count of the number of documents that match a query.

The count command as well as the count() and cursor.count() methods provide access to counts in the mongo shell.

Page 18: Aggregation Indexing

3. Single Purpose Aggregation Operations: Example of Count

Given a collection named records with only the following documents:
{ a: 1, b: 0 }
{ a: 1, b: 1 }
{ a: 1, b: 4 }
{ a: 2, b: 2 }

db.records.count()
This operation counts all documents in the collection and returns the number 4.

db.records.count( { a: 1 } )
This operation counts only the documents where the value of the field a is 1 and returns 3.

Page 19: Aggregation Indexing

3. Single Purpose Aggregation Operations: Distinct

The distinct operation takes a number of documents that match a query and returns all of the unique values for a field in the matching documents.

The distinct command and db.collection.distinct() method provide this operation in the mongo shell.

Page 20: Aggregation Indexing

3. Single Purpose Aggregation Operations: Example of Distinct

Given a collection named records with only the following documents:
{ a: 1, b: 0 }
{ a: 1, b: 1 }
{ a: 1, b: 1 }
{ a: 1, b: 4 }
{ a: 2, b: 2 }
{ a: 2, b: 2 }

The following db.collection.distinct() operation returns the distinct values of the field b:

db.records.distinct( "b" )

The results of this operation would resemble: [ 0, 1, 4, 2 ]

Page 21: Aggregation Indexing

3. Single Purpose Aggregation Operations: Group

The group operation takes a number of documents that match a query, and then collects groups of documents based on the value of a field or fields.

It returns an array of documents with computed results for each group of documents.

Access the grouping functionality via the group command or the db.collection.group() method in the mongo shell.

Group does not support data in sharded collections. In addition, the results of the group operation must be no larger than 16 megabytes.

Page 22: Aggregation Indexing

3. Single Purpose Aggregation Operations: Example of Group

Given a collection named records with the following documents:

{ a: 1, count: 4 }
{ a: 1, count: 2 }
{ a: 1, count: 4 }
{ a: 2, count: 3 }
{ a: 2, count: 1 }
{ a: 1, count: 5 }
{ a: 4, count: 4 }

Page 23: Aggregation Indexing

3. Single Purpose Aggregation Operations: Example of Group

The following group operation groups documents by the field a, where a is less than 3, and sums the field count for each group:

db.records.group( {
    key: { a: 1 },
    cond: { a: { $lt: 3 } },
    reduce: function( cur, result ) { result.count += cur.count },
    initial: { count: 0 }
} )

The results of this group operation would resemble the following:
[ { a: 1, count: 15 }, { a: 2, count: 4 } ]

Page 24: Aggregation Indexing

Indexing

Indexes provide high performance read operations for frequently used queries. Indexes support the efficient execution of queries in MongoDB.

Without indexes, MongoDB must scan every document in a collection to select those documents that match the query statement.

These collection scans are inefficient because they require mongod to process a larger volume of data than an index would for each operation.

MongoDB defines indexes at the collection level and supports indexes on any field or sub-field of the documents in a MongoDB collection.

Page 25: Aggregation Indexing

Indexing

All MongoDB collections have an index on the _id field that exists by default. If applications do not specify a value for _id, the driver or the mongod will create an _id field with an ObjectId value.

The _id index is unique, and prevents clients from inserting two documents with the same value for the _id field.

Page 26: Aggregation Indexing

Types of Indexes

1. Single Field Indexes

2. Compound Indexes

3. Multikey Indexes

4. Geospatial Indexes

5. Text Indexes

6. Hashed Indexes

Page 27: Aggregation Indexing

1. Single Field Indexes

A single field index only includes data from a single field of the documents in a collection.

Example collection: user
{
    "_id": ObjectId(...),
    "name": "John Doe",
    "address": { "street": "Main", "zipcode": "53511", "state": "WI" }
}

A. Index on a Single Field:
db.collection_name.ensureIndex( { "field": 1 } )

B. Index on an Embedded Field:
db.user.ensureIndex( { "address.zipcode": 1 } )

Page 28: Aggregation Indexing

2. Compound Indexes

A compound index includes more than one field of the documents in a collection. A single index structure holds references to multiple fields within a collection’s documents. The order of the fields in a compound index is very important.

Example:
{
    "_id": ObjectId(...),
    "item": "Banana",
    "category": ["food", "produce", "grocery"],
    "location": "4th Street Store",
    "stock": 4,
    "type": "cases"
}

A compound index on item and stock is:
db.products.ensureIndex( { "item": 1, "stock": 1 } )

Page 29: Aggregation Indexing

2. Compound Indexes contd…

Sort Order:
The order of the fields in a compound index is very important. Indexes store references to fields in either ascending (1) or descending (-1) sort order.

For single-field indexes, the sort order of keys doesn’t matter because MongoDB can traverse the index in either direction. However, for compound indexes sort order can matter in determining whether the index can support a sort operation.

db.events.find().sort( { username: 1, date: -1 } )
db.events.find().sort( { username: -1, date: 1 } )

The following index can support both these sort operations:
db.events.ensureIndex( { "username": 1, "date": -1 } )

Page 30: Aggregation Indexing

3. Multikey Indexes

MongoDB uses multikey indexes to index the content stored in arrays. If you index a field that holds an array value, MongoDB creates separate index entries for every element of the array.

These multikey indexes allow queries to select documents that contain arrays by matching on an element or elements of the arrays.

MongoDB automatically determines whether to create a multikey index if the indexed field contains an array value; you do not need to explicitly specify the multikey type.

db.user.ensureIndex( { "subject.dmsa": 1 } )
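As a sketch, assuming a hypothetical survey collection in which the ratings field holds an array of numbers:

// Indexing the array field creates a multikey index automatically.
db.survey.ensureIndex( { ratings: 1 } )

// The index supports queries that match on individual array elements.
db.survey.find( { ratings: 5 } )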

Page 31: Aggregation Indexing

4. Text Indexes

Text indexes support search of string content in documents. Text indexes can include any field whose value is a string or an array of string elements. To perform queries that access the text index, use the $text query operator.

To create a text index, use the db.collection.ensureIndex() method. To index a field that contains a string or an array of string elements, include the field and specify the string literal "text" in the index document.

db.reviews.ensureIndex( { comments: "text" } )
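Once the text index exists, a $text query might look like the following sketch (the search term is only an illustration):

db.reviews.find( { $text: { $search: "coffee" } } )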

Page 32: Aggregation Indexing

Index Properties

TTL (Time To Live) Indexes:
The TTL index is used for TTL collections, which expire data after a period of time.

TTL indexes are special indexes that MongoDB can use to automatically remove documents from a collection after a certain amount of time.

This is ideal for some types of information, like machine generated event data, logs, and session information, that only need to persist in a database for a limited amount of time.
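A sketch of creating a TTL index, assuming a hypothetical log_events collection with a createdAt date field:

// Documents expire 3600 seconds (one hour) after the time stored in createdAt.
db.log_events.ensureIndex( { "createdAt": 1 }, { expireAfterSeconds: 3600 } )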

Page 33: Aggregation Indexing

Index Properties

Unique Indexes:
A unique index causes MongoDB to reject all documents that contain a duplicate value for the indexed field.

To create a unique index, use the db.collection.ensureIndex() method with the unique option set to true. By default, unique is false on MongoDB indexes.

db.members.ensureIndex( { "user_id": 1 }, { unique: true } )

Drop Duplicates:
Force MongoDB to create a unique index by deleting documents with duplicate values when building the index.

db.collection.ensureIndex( { a: 1 }, { unique: true, dropDups: true } )

Page 34: Aggregation Indexing

Index Properties

Sparse Indexes:
A sparse index does not index documents that do not have the indexed field, i.e. sparse indexes only contain entries for documents that have the indexed field, even if the indexed field contains a null value.

The index is “sparse” because it does not include all documents of a collection.

To create a sparse index, use the db.collection.ensureIndex() method with the sparse option set to true.

db.addresses.ensureIndex( { "user_id": 1 }, { sparse: true } )

Page 35: Aggregation Indexing

Remove Index

To remove an index from a collection, use the db.collection.dropIndex() method.

Remove a Specific Index:
db.accounts.dropIndex( { "user_id": 1 } )

Remove All Indexes:
You can also use db.collection.dropIndexes() to remove all indexes, except for the _id index, from a collection.
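For example, on a hypothetical accounts collection:

// Drops every index on the collection except the default _id index.
db.accounts.dropIndexes()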

Page 36: Aggregation Indexing

Rebuild Indexes

If you need to rebuild the indexes for a collection, you can use the db.collection.reIndex() method to rebuild all indexes on the collection in a single operation.

This operation drops all indexes, including the _id index, and then rebuilds all indexes.
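A one-line sketch, again on a hypothetical accounts collection:

// Drops and rebuilds every index on the collection, including _id.
db.accounts.reIndex()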

Page 37: Aggregation Indexing

Thank You

