
MongoDB at a Glance

Date post: 16-Dec-2015
Upload: ssailendrakumar2786

An introductory study on what and why MongoDB
GE Confidential

Relational Database
- Until the 1970s, business organizations generally maintained their own ad hoc database structures.
- Edgar F. Codd proposed the relational model of databases in 1970 and later, in 1985, published his 12 rules for relational systems.
- Relational databases rely on normalization and maintain ACID properties.
- With an RDBMS we often think in terms of fact-dimension modelling, which involves many joins.
GE MSAT Internal

A little about Big Data

The beginning
- Google began as a search engine project in 1996.
- It set out to index websites worldwide.
- In doing so it ran into the three Vs:
  - Volume: too large
  - Velocity: too quick
  - Variety: too different
- To address these problems, along with some security concerns, Google came up with:
  - Google File System (GFS)
  - Bigtable
  - MapReduce

How Hadoop came into the picture
- Google published research papers describing these systems.
- Doug Cutting, working with Mike Cafarella, built a similar open-source framework based on the papers and named it Hadoop; Cutting continued the work at Yahoo.
- Hadoop and its ecosystem are developed as open source under the Apache Software Foundation, in the Apache Hadoop project.

Google        Apache Hadoop
GFS           HDFS
Bigtable      HBase
MapReduce     MapReduce

Hadoop Framework
- The Hadoop ecosystem works with a cluster of nodes instead of a single processor and presents the cluster as a single system.
- Data is replicated across different nodes to reduce the risk of loss.
- To get results out of Hadoop, Mapper and Reducer tasks are assigned to the individual Task Trackers on the slave nodes.
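The Mapper and Reducer roles mentioned on the Hadoop Framework slide can be illustrated with a minimal single-process sketch. This is not Hadoop code: the function names (`mapper`, `shuffle`, `reducer`, `map_reduce`) and the word-count task are assumptions chosen for illustration, and a real cluster would run the map and reduce steps on many nodes in parallel.

```python
from collections import defaultdict


def mapper(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key; a framework does this between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce step: combine all values emitted for one key into a single result."""
    return key, sum(values)


def map_reduce(lines):
    """Run map, shuffle and reduce in sequence over an iterable of text lines."""
    pairs = (pair for line in lines for pair in mapper(line))
    return dict(reducer(key, values) for key, values in shuffle(pairs).items())


counts = map_reduce(["big data big table", "big cluster"])
print(counts)
```

Each mapper call only sees one line and each reducer call only sees one key, which is what lets a framework spread the work across Task Trackers.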

MongoDB
- open source
- document oriented
- high performance
- scalable
- NoSQL

MongoDB: a NoSQL, Big Data database

What's NoSQL
NoSQL, also read as "Not only SQL", is an approach to data management and database design for accessing and analyzing very large sets of distributed data.

Types of NoSQL store:
- Tabular
- Key-Value Store
- Document-Oriented

Characteristics:
- No joins
- No complex queries
- No constraints

Mongo basics
RDBMS             MongoDB
Database          Database
Table             Collection
Rows / Records    Documents
Value             Field : Value pair

Data Modeling
Modeling in RDBMS:
- One to One
- One to Many
- Many to Many

Modeling in MongoDB, to address the relationships used in relational database modeling:
- References
- Embedded Data
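The two modeling styles can be contrasted with plain Python dictionaries standing in for documents. This is a sketch under assumptions: the `_id` values, the `car_ids` field and the `resolve_cars` helper are hypothetical, and no MongoDB driver is involved.

```python
# Embedded data: the related cars live inside the owner document,
# so one read returns the owner together with the cars.
owner_embedded = {
    "_id": 1,
    "first_name": "Paul",
    "cars": [
        {"model": "Bentley", "year": 1973},
        {"model": "Rolls Royce", "year": 1965},
    ],
}

# References: the owner stores only the identifiers of documents
# kept in a separate collection (modeled here as dicts keyed by _id).
owners = {1: {"_id": 1, "first_name": "Paul", "car_ids": [10, 11]}}
cars = {
    10: {"_id": 10, "model": "Bentley", "year": 1973},
    11: {"_id": 11, "model": "Rolls Royce", "year": 1965},
}


def resolve_cars(owner, cars_by_id):
    """Follow the references by hand; the application does the 'join'."""
    return [cars_by_id[car_id] for car_id in owner["car_ids"]]


print([car["model"] for car in resolve_cars(owners[1], cars)])
```

Embedding suits data that is read together with its parent, while references avoid duplicating a document that many parents share.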

Example of a document with embedded data:

{
  first_name: "Paul",
  surname: "Miller",
  city: "London",
  location: [45.123, 47.232],
  cars: [
    { model: "Bentley", year: 1973, value: 100000, ... },
    { model: "Rolls Royce", year: 1965, value: 330000, ... }
  ]
}

Added features of MongoDB
- Ad hoc queries (field queries, range queries, RegEx search)
- Indexing (any field in a document can be indexed)
- Replication (the primary, formerly called master, handles reads and writes; secondaries, formerly called slaves, copy its data and serve as backup)
- Duplication of data (copies are kept on multiple servers)
- Load balancing (scales horizontally)
- Journaling (crash recovery mechanism)
- Schemaless structure (a collection can hold documents of different shape and size)
- Capped collections (maintain insertion order; once the specified size is reached, the oldest documents are overwritten, like a circular queue)
- File storage (GridFS stores files, so the database can be used as a file system)
- Aggregation (aggregation pipeline, MapReduce)
- Server-side JavaScript execution (JavaScript in queries)
- Location support (longitude and latitude are understood natively)

Sharding
Sharding is MongoDB's method for storing data across multiple machines. It provides horizontal scalability, and because clients connect through a router that hides how the data is distributed, it usually requires little or no change to application code.
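The idea of spreading documents across machines by a key can be sketched as follows. This is a simplification under assumptions: the function names and the `user_id` shard key are hypothetical, and the hash-modulo rule is a stand-in. MongoDB itself assigns ranges of shard-key values (or of their hashes) to shards rather than taking a modulo.

```python
import hashlib


def shard_for(shard_key_value, shard_count):
    """Pick a shard index deterministically from the shard key value."""
    digest = hashlib.md5(str(shard_key_value).encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def distribute(documents, shard_key, shard_count):
    """Place every document on the shard chosen by its shard key."""
    shards = [[] for _ in range(shard_count)]
    for doc in documents:
        shards[shard_for(doc[shard_key], shard_count)].append(doc)
    return shards


docs = [{"user_id": i, "city": "London"} for i in range(100)]
shards = distribute(docs, "user_id", 3)
print([len(shard) for shard in shards])
```

A query that includes the shard key can be sent to one shard only, while a query without it has to be sent to every shard.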

GridFS
- GridFS is a specification for storing and retrieving files that exceed the BSON document size limit of 16 MB.
- GridFS stores files in two collections:
  - chunks stores the binary chunks.
  - files stores the file's metadata.
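The split into a metadata document and chunk documents can be sketched without a database. The field names (`files_id`, `n`, `data`, `length`, `chunkSize`, `filename`) follow the GridFS specification and 255 KiB is its default chunk size; the helper names and the in-memory lists are assumptions for illustration, and real files documents carry further fields such as an upload date.

```python
CHUNK_SIZE = 255 * 1024  # default GridFS chunk size in bytes


def split_into_gridfs_docs(file_id, filename, data, chunk_size=CHUNK_SIZE):
    """Build one 'files' document and the matching list of 'chunks' documents."""
    files_doc = {
        "_id": file_id,
        "filename": filename,
        "length": len(data),
        "chunkSize": chunk_size,
    }
    chunk_docs = [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
    return files_doc, chunk_docs


def reassemble(chunk_docs):
    """Concatenate the chunks in sequence-number order to rebuild the file."""
    ordered = sorted(chunk_docs, key=lambda chunk: chunk["n"])
    return b"".join(chunk["data"] for chunk in ordered)


payload = bytes(range(256)) * 2400  # 600 KiB of sample data
files_doc, chunk_docs = split_into_gridfs_docs(1, "sample.bin", payload)
print(files_doc["length"], len(chunk_docs))
```

Because each chunk is an ordinary document well under the 16 MB limit, a file of any size can be stored and read back piece by piece.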

Index
- Indexes allow efficient execution of queries.
- An index stores a small portion of the collection's data set in an easy-to-traverse form.
- Indexes are defined at the collection level.
- Any field or sub-field of a document can be indexed.
- Indexes use a B-tree structure.
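Why an ordered index speeds up queries can be shown with a sorted list and binary search. This is a stand-in, not a B-tree: a sorted Python list gives the same ordered lookup but not a B-tree's cheap inserts. The helper names are hypothetical.

```python
import bisect


def build_index(documents, field):
    """Index one field: sorted key values plus the position of each document."""
    entries = sorted(
        (doc[field], pos) for pos, doc in enumerate(documents) if field in doc
    )
    keys = [key for key, _ in entries]
    positions = [pos for _, pos in entries]
    return keys, positions


def find_range(documents, index, low, high):
    """Return documents whose indexed value lies in [low, high], in key order."""
    keys, positions = index
    start = bisect.bisect_left(keys, low)
    end = bisect.bisect_right(keys, high)
    return [documents[pos] for pos in positions[start:end]]


cars = [
    {"model": "Bentley", "year": 1973},
    {"model": "Rolls Royce", "year": 1965},
]
year_index = build_index(cars, "year")
print(find_range(cars, year_index, 1970, 1980))
```

The index holds only the `year` values and document positions, which matches the slide's point that an index stores a small portion of the data in an easy-to-traverse form.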

Replication
- Primary (client drivers send write operations, and by default reads, to this member)
- Secondary (a secondary's data set reflects the primary's data set)
- Arbiter (arbiters hold no data and exist only to vote in elections)

Failure Recovery
- When a primary does not communicate with the other members of the set for more than 10 seconds, the replica set attempts to elect another member as the new primary.
- The first secondary that receives a majority of the votes becomes primary.
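The two rules above, the 10-second timeout and the majority vote, can be sketched as plain functions. This is a toy model under assumptions: the function names and the vote mapping are hypothetical, and the real election protocol also considers member priority and how up to date each secondary is.

```python
from collections import Counter


def primary_is_unreachable(seconds_since_last_contact, timeout_seconds=10):
    """True once the primary has been silent for longer than the timeout."""
    return seconds_since_last_contact > timeout_seconds


def elect_primary(votes, voting_member_count):
    """Return the candidate holding a strict majority of all voting members.

    votes maps each voter to the candidate it voted for; members that are
    down simply do not appear. Returns None when nobody reaches a majority.
    """
    needed = voting_member_count // 2 + 1
    for candidate, count in Counter(votes.values()).most_common():
        if count >= needed:
            return candidate
    return None


# Three voting members remain after the old primary went silent:
# two secondaries and one arbiter.
if primary_is_unreachable(12):
    winner = elect_primary({"secondary_a": "secondary_a", "arbiter": "secondary_a"}, 3)
    print(winner)
```

The majority is counted against all voting members, not only those that responded, which is why an arbiter can break a tie between two data-bearing members.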

MapReduce

Thank you

