SDJTeaser_5_12

Editor’s Note

4/2012 en.sdjournal.org

For developers by developers

Dear Readers!

In this issue you’ll find loads of information and inspiration, because we write about a very interesting topic: MongoDB.

MongoDB (from “humongous”) is an open source, document-oriented NoSQL database system written in C++. It is characterized by high scalability, high performance and the lack of a rigidly defined schema for the stored data. In this issue you’ll find six fantastic articles:

After reading the first article, written by Krishnachytanya Ayyagari, you will be able to say No to SQL.

NoSQL databases won’t replace relational databases, but instead will become a better option for certain types of projects. People will learn to look at their data and be able to choose from many databases for many needs. There will be a growing realization that the relational databases in use today are often good tools but that other tools have their place as well.

In the article “MongoDB Emerges as a NoSQL Leader”, Ric Johnson discusses core MongoDB topics from a technical and administrative point of view. The basic objective of this article is to impart knowledge about storing huge amounts of data that can easily scale, with support for replication.

From the third article, “MongoDB for an Open-Data Portal”, written by Stefan Edlich, Marc Boeker and Sonam Singh, you’ll learn, among other things: why MongoDB is the leading NoSQL database, the diversity of features and APIs this database offers, code examples that show how to interact with the database and, finally, why MongoDB has a lot of tools to ensure production use without much effort.

Shane R. Spencer wrote the article “Advanced Atomic Batch Information Processing”. From it you will learn about unique MongoDB batch processing techniques, traditional batch processing techniques, MongoDB atomic modifications and RDBMS atomic gotchas.

In the next MongoDB article, written by Muhammad Idrees, you can read and learn about the basic idea of what MongoDB is, some cool features of this NoSQL database, an introduction to the client shell, and getting used to basic database operations and the data types supported by MongoDB.

Dileepa Jayathiloha, Ashan Fernando and Charith Sooriyaarachchi wrote the article “Thinking Big to Deal with Big Data: A Practical Insight into MongoDB”. Document databases in general, and MongoDB in particular, come in very handy when attacking problems where organized data with little or no schema needs to be dealt with.

I would like to thank our great experts and specialists in the MongoDB field; thanks to them we can publish this MongoDB issue today.

Angelika Gucwa and the SDJ team

Managing: Angelika [email protected]

Senior Consultant/Publisher: Paweł Marciniak

Editor in Chief: Grzegorz [email protected]

Art Director: Patrycja Przybył[email protected]

DTP: Patrycja Przybyłowicz

Production Director: Andrzej [email protected]

Marketing Director: Angelika [email protected]

Proofreaders: Michael Munt, Nick Baronian, Dan Dieterle, Patrik Gange, Aby Rao, Jeffrey Smith

Betatesters: Paweł Brzęk, Francesco Consiglio, Keith DeBus, Demazy Mbella, Matteo Massaro, Arthur Tumanyan

Publisher: Hakin9 Media Sp. z o.o. SK, 02-682 Warszawa, ul. Bokserska 1, www.en.sdjournal.org

Whilst every effort has been made to ensure the high quality of the magazine, the editors make no warranty, express or implied, concerning the results of content usage.

All trade marks presented in the magazine were used only for informative purposes. All rights to trade marks presented in the magazine are reserved by the companies which own them.

To create graphs and diagrams we used program by
Mathematical formulas created by Design Science MathType™

DISCLAIMER! The techniques described in our articles may only be used in private, local networks. The editors hold no responsibility for misuse of the presented techniques or consequent data loss.


Table of Contents


MongoDB: Say No to SQL
by Krishnachytanya Ayyagari

As the title says, let’s say No to SQL. OK, this simple statement raises many questions in the development community. To quote a few of them: Why do we need to say that? What motivates us to say so? What are the reasons? What benefits do we get if we say this? What is the relation between this agenda and MongoDB? … and a bunch more. In this article we will discuss exactly these questions, with a detailed survey of databases that use NoSQL along with an overview of NoSQL databases.

MongoDB Emerges as a NoSQL Leader
by Ric Johnson

In 2007, Eliot Horowitz and the 10gen team started with a concept. They wanted to engineer a tool that would combine the best features of traditional, relational databases and make them work in a distributed platform designed to combine elasticity, scalability and easy administration in a way tailored for modern web applications. The concept evolved into MongoDB.

MongoDB for an Open-Data Portal
by Stefan Edlich, Marc Boeker, Sonam Singh

Besides Hadoop, MongoDB is the leading NoSQL database because it is feature-rich and responds quickly to the community. We chose MongoDB to build an Open-Data Platform / a Market-Place for Data. In this article we introduce MongoDB with all its features and investigate how these features are useful for our needs. Practical experiences in creating and running such a platform will be presented along with outstanding new features MongoDB recently introduced.

Advanced Atomic Batch Information Processing
by Shane R. Spencer

Databases can be seen as reliable work queues when information that is inserted into them needs to be processed again in some way, regardless of how quickly or how often. The most common form of post-processing is when information needs to be migrated from one database to another as a very simple synchronization to help with distributing load, performing a backup, or separating the information into more focused sets.

MongoDB
by Muhammad Idrees

MongoDB is part of the “new” NoSQL family of database systems. Instead of storing data in tables, as is done in a “classical” relational database, MongoDB stores structured data as JSON-like documents with dynamic schemas (MongoDB calls the format BSON), making the integration of data in certain types of applications easier and faster. MongoDB is an open source, non-relational database system designed to meet the needs of modern Web 2.0 applications. It offers extensive built-in support for MapReduce-style aggregation and geospatial indexes to aggregate and query data more easily. MongoDB has a developer-friendly data model, administrator-friendly configuration options, and natural-feeling language APIs provided by drivers and the database shell.

Thinking Big to Deal with Big Data: A Practical Insight into MongoDB
by Dileepa Jayathiloha, Ashan Fernando, Charith Sooriyaarachchi

NoSQL databases have become a popular topic in enterprise data architecture for the web and the cloud. MongoDB is one of the most popular open source pillars in this NoSQL family. NoSQL databases can be categorized into four classes: key-value, BigTable, document-oriented and graph; MongoDB falls under document-oriented databases. This article presents a practical insight into MongoDB while focusing on a case study where we detail the technical solution we implemented using MongoDB for a commercial problem. We explain how the problem was attacked using MongoDB’s strengths and compare the solution with RDBMS and other NoSQL models. We also provide a pragmatic guide on when and where to use MongoDB.

MongoDB


Say No to SQL

As the title says, let’s say No to SQL. OK, this simple statement raises many questions in the development community. To quote a few of them: Why do we need to say that? What motivates us to say so? What are the reasons? What benefits do we get if we say this? What is the relation between this agenda and MongoDB? … and a bunch more.

In this article we will discuss exactly these questions, with a detailed survey of databases that use NoSQL along with an overview of NoSQL databases.

What Is the Concept of NoSQL?

The concept described by the term NoSQL means a database system which is distributed, may not require fixed table schemas, usually avoids join operations, typically scales horizontally, does not expose an SQL interface and may be open source.

Now, before we kick-start into the topic, let us see what our two geeks, Mr. SQL and Mr. NoSQL, are discussing in Table 1 (read left to right).

Table 1: Conversation of our SQL geeks (read left to right)

Mr. SQL: I have a fixed layout. The structure of data in a relational database is predefined by the layout of the tables and the fixed names and types of the columns, which makes me more organized.
Mr. NoSQL: That’s what makes you difficult to maintain, buddy.

Mr. SQL: OK, but users can scale a relational database by running it on more powerful computers.
Mr. NoSQL: Exactly… more powerful and more expensive. And to scale beyond a certain point, you must be distributed across multiple servers. You don’t work easily in a distributed manner, because joining your tables across a distributed system is difficult, as my friend Jeremy Zawodny, a software engineer at Craigslist, says.

Mr. SQL: Then why can’t users distribute me across different processors and get the work done?
Mr. NoSQL: You aren’t designed to function with data partitioning, so distributing your functionality is a chore. Just ask Stephen O’Grady, an analyst with the market research firm RedMonk.

Mr. SQL: Fine, then what about the fact that I can handle complex data and give users the flexibility to interact with it?
Mr. NoSQL: With you, users must convert all data into tables. When the data doesn’t fit easily into a table, your structure can be complex, difficult, and slow to work with.

Mr. SQL: Then use SQL. Using SQL is convenient with structured data.
Mr. NoSQL: As you said, using the SQL language with other types of information is difficult, because it’s designed to work with structured, relationally organized databases with fixed table information, explained Stefan Edlich, professor at the Beuth University of Applied Sciences in Berlin. Moreover, SQL can entail large amounts of complex code and doesn’t work well with modern, agile development, he said.

Mr. SQL: I offer a big feature set and data integrity.
Mr. NoSQL: Yes, I agree, but the problem here is that database users often don’t need all the features, nor the cost and complexity they add.
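Mr. NoSQL’s point about fixed layouts can be illustrated without a database server. The following is a minimal in-memory sketch, not MongoDB itself. A plain JavaScript array stands in for a collection, and the hypothetical helper `findWhere` mimics a simple equality query. Documents of different shapes live side by side, and a query simply skips documents that lack the field.

```javascript
// A plain array standing in for one MongoDB collection (illustration only).
const people = [
  { name: "Ada", email: "ada@example.com" },
  { name: "Linus", email: "linus@example.com", languages: ["C", "Shell"] },
  { name: "Grace", rank: "Rear Admiral" } // no email field at all
];

// Hypothetical helper: return documents whose fields equal every value in `query`.
// Documents missing a queried field simply do not match; no schema change is needed.
function findWhere(collection, query) {
  return collection.filter(doc =>
    Object.entries(query).every(([key, value]) => doc[key] === value)
  );
}

// Adding a new field to one document requires no ALTER TABLE equivalent.
people.push({ name: "Alan", email: "alan@example.com", born: 1912 });

const withEmail = people.filter(doc => "email" in doc).map(doc => doc.name);
console.log(withEmail); // [ 'Ada', 'Linus', 'Alan' ]
console.log(findWhere(people, { rank: "Rear Admiral" }).length); // 1
```

In a relational table, the `rank`, `languages` and `born` columns would have to exist for every row, mostly holding NULL.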





MongoDB Emerges as a NoSQL Leader

In 2007, Eliot Horowitz and the 10gen team started with a concept. They wanted to engineer a tool that would combine the best features of traditional, relational databases and make them work in a distributed platform designed to combine elasticity, scalability and easy administration in a way tailored for modern web applications. The concept evolved into MongoDB.

Unique in a field of new NoSQL databases, MongoDB is rooted in Binary JSON, a lightweight JavaScript-based data exchange format designed to be easily traversable and efficient in encoding and decoding. MongoDB is well suited to cloud applications because of its document-oriented data model. It achieves speed and manageability through the use of embedded documents, and it allows for easy horizontal scalability because of its reduced reliance on joins. Its schema-free design also increases development agility. These unique features, combined with recent partnerships with high-profile, large-volume users like Craigslist, MTV and Disney, have catapulted MongoDB to the forefront of NoSQL technology. Featuring index performance enhancements, new querying and shell features, and a host of other upgrades in its March 2012 release, MongoDB is a robust, open source database platform characterized by continuous improvements and cutting-edge technological advances.
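To make “binary JSON” concrete, here is a minimal encoder for a tiny subset of the BSON format, covering only UTF-8 string fields (type 0x02) and 32-bit integer fields (type 0x10). It is a sketch written against the public BSON specification as we understand it, not the encoder used by MongoDB or its drivers; the function name `encodeBson` is our own, and it assumes Node.js for `Buffer`. Each document starts with its total length as a little-endian int32 and ends with a zero byte, which is what lets a reader skip over a whole document without parsing it.

```javascript
// Minimal BSON encoder for flat documents with string and int32 values only.
function encodeBson(doc) {
  const parts = [];
  for (const [key, value] of Object.entries(doc)) {
    const name = Buffer.from(key + "\0", "utf8");      // element name as a C string
    if (typeof value === "string") {
      const str = Buffer.from(value + "\0", "utf8");   // string bytes plus terminator
      const len = Buffer.alloc(4);
      len.writeInt32LE(str.length, 0);                 // length includes the terminator
      parts.push(Buffer.from([0x02]), name, len, str);
    } else if (Number.isInteger(value) && value >= -(2 ** 31) && value < 2 ** 31) {
      const num = Buffer.alloc(4);
      num.writeInt32LE(value, 0);                      // int32, little-endian
      parts.push(Buffer.from([0x10]), name, num);
    } else {
      throw new TypeError(`unsupported value for key "${key}"`);
    }
  }
  const body = Buffer.concat(parts);
  const header = Buffer.alloc(4);
  header.writeInt32LE(4 + body.length + 1, 0);         // total size: header + elements + trailing 0x00
  return Buffer.concat([header, body, Buffer.from([0x00])]);
}

const bytes = encodeBson({ hello: "world" });
console.log(bytes.length); // 22
console.log(bytes.toString("hex"));
```

Real BSON has many more types (doubles, embedded documents, arrays, ObjectId, dates); the length prefixes shown here are the part that makes traversal cheap.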

If you want to work with a highly optimized database that fully supports agile and scalable development in an open source environment, you need to delve into MongoDB, a high-performance, document-based NoSQL database that lets users store structured data as JSON-like documents with dynamic schemas. The ease of integrating its data with other applications distinguishes it in terms of functionality and support. The goal of MongoDB is to bridge the gap between key-value stores and relational databases.

MongoDB development was started by 10gen in 2007, and in 2009 it emerged as an open source NoSQL product under the AGPL license. It was created by former DoubleClick founder and CTO Dwight Merriman and former DoubleClick engineer and ShopWiki founder and CTO Eliot Horowitz. They combined their vision and experience in developing large-scale, highly robust systems to create an innovative kind of database which inherits various features of relational databases, such as indexes and dynamic queries. The model changes from relational to document-based, which adds improved agility through flexible schemas.

The prominent feature of the MongoDB data model is a simplified coding structure that improves the performance of grouping data and also helps developers map objects from object-oriented languages without an ORM layer. The flexible document model increases productivity. MongoDB is specifically designed to work with commodity servers in an elastic, virtualized environment, saving cost while keeping data reliable.

The Reason for Using MongoDB

In an era when information flows so rapidly, organizations need a sustainable and durable database that can grow over time, speed up development and enable flexible deployment. MongoDB is a highly optimized document-based database that provides built-in support for horizontal scalability and lets users manage their applications with little effort. MongoDB has been designed to cater to Big Data: if your database runs on a single server, you will eventually reach a scaling limit, whereas MongoDB scales by adding more servers and can add capacity whenever you need it. It is robust technology, supporting consistency and atomic updates of individual documents. Data integrity is protected through journaling and replication. Auto-sharding is also one of its most recommended options, allowing users to distribute data across multiple nodes. Replica sets provide high availability with automatic failover and recovery of database nodes within or across data centers.
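Auto-sharding in MongoDB of this era splits a collection into chunks by ranges of a chosen shard key and assigns each chunk to a shard. The sketch below shows only the routing idea: given a hypothetical chunk table, find which shard owns a key. The names `chunks`, `shardFor` and `routeInserts` and the numeric `user_id` shard key are illustrative assumptions. In a real cluster, routing is done by `mongos` using metadata held on the config servers, and chunk splitting and balancing are not modelled here.

```javascript
// Hypothetical chunk table for a numeric shard key (e.g. user_id), sorted by `min`.
// Each chunk owns the half-open range [min, max); -Infinity/Infinity stand in
// for MongoDB's MinKey/MaxKey bounds.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shard0000" },
  { min: 1000, max: 5000, shard: "shard0001" },
  { min: 5000, max: Infinity, shard: "shard0002" }
];

// Binary search over the chunk table to find the shard that owns `key`.
function shardFor(key, table) {
  let lo = 0;
  let hi = table.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const chunk = table[mid];
    if (key < chunk.min) hi = mid - 1;
    else if (key >= chunk.max) lo = mid + 1;
    else return chunk.shard;
  }
  throw new RangeError(`no chunk covers key ${key}`);
}

// Route a batch of inserts; each shard receives only documents in its key ranges.
function routeInserts(docs, table) {
  const byShard = new Map();
  for (const doc of docs) {
    const shard = shardFor(doc.user_id, table);
    if (!byShard.has(shard)) byShard.set(shard, []);
    byShard.get(shard).push(doc);
  }
  return byShard;
}

const routed = routeInserts(
  [{ user_id: 7 }, { user_id: 1000 }, { user_id: 4999 }, { user_id: 90000 }],
  chunks
);
console.log([...routed.entries()].map(([s, d]) => `${s}:${d.length}`).join(" "));
// shard0000:1 shard0001:2 shard0002:1
```

Adding capacity then amounts to splitting a chunk and pointing one half at a new shard; only the table changes, not the application code.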



MongoDB for an Open-Data Portal

Besides Hadoop, MongoDB is the leading NoSQL database because it is feature-rich and responds quickly to the community. We chose MongoDB to build an Open-Data Platform / a Market-Place for Data. In this article we introduce MongoDB with all its features and investigate how these features are useful for our needs. Practical experiences in creating and running such a platform will be presented along with outstanding new features MongoDB recently introduced.

MongoDB on Its Way to the Top

In a current research project at Beuth University of Technology (University of Applied Sciences) Berlin, we had to develop a new and innovative Open-Data Platform / a Marketplace for Data. We therefore evaluated the database solutions on the market and chose MongoDB because of its unmatched set of innovative features. In the following text we outline these features and how they meet the requirements of an Open-Data Platform.

Look at any statistic about NoSQL and you will see MongoDB near the top, perhaps together with its strongest competitors Hadoop, Redis or Cassandra. We are aware that earlier versions of MongoDB had some issues concerning durability and the scaling architecture, which is not based on consistent hashing. But MongoDB has an incredibly open development process with a public Jira instance, and it listens carefully to its customers.

But back to the roots. Created in C++ by Dwight Merriman and Eliot Horowitz for web shops like ShopWiki.com, MongoDB now has one of the largest installation bases in the world, with well over 1,000 notable sites such as SourceForge, Craigslist, SAP, Eventbrite, Springer, CERN, GitHub, Grooveshark, The New York Times and many more. More important still, there are over 100 MongoDB hosters, and MongoDB is on its way to becoming a standard for PaaS platforms together with Redis. There must be a reason for this. One is certainly that MongoDB is moving fast in its versions and maturity; another is that there are already at least eight books available from O’Reilly, Manning, Apress and others.

Another important point is that MongoDB [1] does not feel completely different to developers with MySQL experience:

• Basic, unique and compound indexes
• Transactions in terms of atomic updates
• Stored procedures in terms of server-side JavaScript execution
• Cursors
• Views in terms of stored MapReduce collections
• Replication
• Lots of web front ends
• Distributed binary storage in terms of GridFS

Even some kind of triggers is easily possible if you trace the MongoDB logs. So few features will be missed; real ACID transactions are one example.

Up and Running

The MongoDB installation is a matter of minutes and couldn’t be easier. Data is organized in the following hierarchy: Mongo instance > database > collection > document.

The documents in MongoDB are stored using JSON [1]; internally they are transferred via BSON [2]. JSON documents are nested key-value pairs, with the possibility of using nested objects and arrays. MongoDB comes with a shell (mongo), and there you have to take care not to mix up the database and the collection, because you simply will.

You start the server with some typical options, for example: mongod --dbpath F:\DATABASES\open-data -v --rest



Advanced Atomic Batch Information Processing

MongoDB has been a major interest of his over the past year and will continue to be part of several professional and personal projects. Recently an obsession has formed with how to effectively and efficiently allocate documents for parallel batch processing using atomic operations within MongoDB documents.

Typical Cases for Post-Processing Information

Databases can be seen as reliable work queues when information that is inserted into them needs to be processed again in some way, regardless of how quickly or how often.

The most common form of post processing is when information needs to be migrated from one database to another as a very simple synchronization to help with distributing load, performing a backup, or separating the information into more focused sets.

For just about any long-term database project, archiving information becomes a necessity as well. Moving information away from active databases into archive databases often involves a lot of verification checks to ensure the information was copied properly before it is removed from the active database. This is a form of post-processing that involves several steps and potentially requires extra fields on both the active and the archive database, to mark the information as having passed verification and to record when it was migrated.
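The archiving workflow just described (copy, verify, then remove) can be sketched with two in-memory arrays standing in for the active and archive databases. The field names `archive_verified` and `archived_at` are hypothetical examples of the “extra fields” the text mentions, and `archiveOlderThan` is an illustrative helper. A real system would perform each step as a database operation and would have to handle failures between them.

```javascript
// In-memory stand-ins for an active and an archive database (illustration only).
const active = [
  { _id: 1, text: "old call record", created: 2010 },
  { _id: 2, text: "recent call record", created: 2012 }
];
const archive = [];

// Copy old documents, verify each copy, and only then remove the original.
function archiveOlderThan(year, now) {
  const candidates = active.filter(doc => doc.created < year);
  for (const doc of candidates) {
    archive.push({ ...doc });                       // step 1: copy
    // Step 2: verify that the archived copy matches the original field by field.
    const stored = archive.find(d => d._id === doc._id);
    const ok = stored !== undefined &&
      Object.keys(doc).every(k => stored[k] === doc[k]);
    if (!ok) continue;                              // keep the original for a later retry
    stored.archive_verified = true;                 // hypothetical marker field
    stored.archived_at = now;                       // hypothetical migration timestamp
    active.splice(active.indexOf(doc), 1);          // step 3: remove only after verification
  }
}

archiveOlderThan(2011, "2012-04-01T00:00:00Z");
console.log(active.map(d => d._id), archive.map(d => d._id)); // [ 2 ] [ 1 ]
```

The ordering is the point of the design: a crash after step 1 leaves a duplicate that can be re-verified, never a lost record.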

Software developers interested in full-text indexing find that effective post-processing allows them to check documents before submitting them to indexing servers. This is similar to migrating information from one database to another.

Call centers that require analytics on call volume data often separate information into multiple databases in order to create secure work environments for a specific information set. When recording of telephone conversations is required, it is important to know whether a recording exists on disk that matches one of the fields in the call record. This is an example of post-processing that looks for extra information outside of the database and possibly fills in a few fields before marking the information as processed. Recordings are typically compressed further as well, which may be part of the post-processing for each call record before the call can be submitted to another database or even shown in search results.
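The call-center example combines a lookup outside the database with a few field updates before the record is marked as processed. The sketch below simulates the recordings directory with a `Set` of file names. The field names (`call_id`, `has_recording`, `recording_file`, `processed_at`), the file naming scheme and the `postProcess` helper are all hypothetical, and compression is left out.

```javascript
// Simulated recordings directory; the "call-<id>.wav" naming scheme is an assumption.
const recordingsOnDisk = new Set(["call-1001.wav", "call-1003.wav"]);

const callRecords = [
  { call_id: 1001, processed_at: null },
  { call_id: 1002, processed_at: null },
  { call_id: 1003, processed_at: null }
];

// Post-process one record: look outside the database, fill fields, then mark it.
function postProcess(record, now) {
  const file = `call-${record.call_id}.wav`;
  record.has_recording = recordingsOnDisk.has(file);
  record.recording_file = record.has_recording ? file : null;
  record.processed_at = now; // setting this marks the record as processed
  return record;
}

callRecords
  .filter(r => r.processed_at === null) // select only unprocessed records
  .forEach(r => postProcess(r, Date.now()));
console.log(callRecords.map(r => r.has_recording)); // [ true, false, true ]
```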

Batch vs. Individual Information Processing

When a database requires any level of post-processing, an extra field is used to denote if or when specific information has been processed. This field is typically indexed to speed up the selection process when looking for unprocessed information within a larger set. Both approaches, processing information in batches or as individual rows or documents, take advantage of this processing field and the inherent atomicity that comes with most databases to create a lock that keeps information from being processed more than once in both serial and parallel post-processing environments.

Individual information processing is simple and straightforward and is often the first step when database administrators and software developers start attempting to add post-processing to databases. What they soon find out is that individual queries for a single unlocked piece of information, as well as individual requests to handle locking, are highly inefficient when attempting to process information as fast as possible. It is important to understand that each individual database request forces the database server to parse the request, queue it up for processing, and then start at the beginning of the index when finding the information to process.

By choosing to lock a set of information rather than a single piece of information, the number of database requests and subsequent index scans is reduced substantially.
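The contrast between locking one document at a time and locking a set can be sketched in memory by counting simulated round trips. The helpers `claimOne` and `claimBatch`, the `lock` field and the worker tokens are hypothetical. In MongoDB, single-document updates are atomic, so a conditional update that sets the lock only where it is still unset can succeed for at most one worker. This sketch talks to no database and instead relies on JavaScript’s single-threaded execution for atomicity.

```javascript
// In-memory queue of documents awaiting post-processing (illustration only).
const queue = Array.from({ length: 10 }, (_, i) => ({ _id: i + 1, lock: null, processed: false }));
let requests = 0; // counts simulated database round trips

// Individual approach: one request to find an unlocked document, one to lock it.
function claimOne(token) {
  requests += 1; // query for a single unlocked document
  const doc = queue.find(d => d.lock === null && !d.processed);
  if (!doc) return null;
  requests += 1; // separate request: set the lock where it is still null
  doc.lock = token;
  return doc;
}

// Batch approach: lock up to `limit` documents in one request, then fetch them by token.
function claimBatch(token, limit) {
  requests += 1; // one claiming update over the unlocked set
  let claimed = 0;
  for (const d of queue) {
    if (claimed === limit) break;
    if (d.lock === null && !d.processed) {
      d.lock = token;
      claimed += 1;
    }
  }
  requests += 1; // one query for everything carrying this worker's token
  return queue.filter(d => d.lock === token && !d.processed);
}

requests = 0;
for (let i = 0; i < 5; i++) claimOne("worker-a");
const individualRequests = requests; // 10: a find plus a lock per document

requests = 0;
const batch = claimBatch("worker-b", 5);
const batchRequests = requests;      // 2: one claiming update plus one fetch
console.log(individualRequests, batchRequests, batch.map(d => d._id));
```

Because worker-a already holds documents 1 to 5, worker-b’s batch contains only documents 6 to 10, so no document is handed to two workers.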

Simplified Batch Information Processing

A common practice of batch processing involves a batch processing program that simply selects a limited set of information that doesn’t have a boolean field set to `true` which marks it as batch processed. (Listing 1)

Once the batch processor is done with each selected piece of information it updates the database again to change the `batch_processed` mark. (Listing 2)
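Listings 1 and 2 are not reproduced in this excerpt. As a stand-in, here is an in-memory sketch of the same two steps: select a limited set of documents whose `batch_processed` field is not `true`, then set the mark once they are handled. It also shows the scalability problem raised next. Two workers that both select before either has marked anything receive exactly the same documents. The helpers `selectUnprocessed` and `markProcessed` are hypothetical.

```javascript
// Documents carrying the boolean marker used by the simple technique.
const docs = Array.from({ length: 6 }, (_, i) => ({ _id: i + 1, batch_processed: false }));

// Step 1 (the selection described for Listing 1): a limited set of unprocessed documents.
function selectUnprocessed(limit) {
  return docs.filter(d => d.batch_processed !== true).slice(0, limit);
}

// Step 2 (the update described for Listing 2): mark the handled documents.
function markProcessed(batch) {
  for (const d of batch) d.batch_processed = true;
}

// A single worker is fine...
const first = selectUnprocessed(3);
markProcessed(first);

// ...but two parallel workers that both select before marking overlap completely.
const workerA = selectUnprocessed(2);
const workerB = selectUnprocessed(2);
const overlap = workerA.filter(d => workerB.includes(d)).map(d => d._id);
console.log(first.map(d => d._id), overlap); // [ 1, 2, 3 ] [ 4, 5 ]
```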

This technique is very simple and very fast but it lacks scalability. Another reason to stay away from this is that



MongoDB

MongoDB is part of the “new” NoSQL family of database systems. Instead of storing data in tables, as is done in a “classical” relational database, MongoDB stores structured data as JSON-like documents with dynamic schemas (MongoDB calls the format BSON), making the integration of data in certain types of applications easier and faster.

MongoDB is an open source, non-relational database system designed to meet the needs of modern Web 2.0 applications. It offers extensive built-in support for MapReduce-style aggregation and geospatial indexes to aggregate and query data more easily. MongoDB has a developer-friendly data model, administrator-friendly configuration options, and natural-feeling language APIs provided by drivers and the database shell. Not only that, MongoDB has several unique features, such as atomic updates and indexed array keys, that greatly influence the kinds of schemas that make sense. A single MongoDB node can comfortably serve thousands of requests per second on cheap hardware. When you need to scale beyond that, you can use either replication (keeping several copies of the data on different servers) or sharding (partitioning the data across servers). MongoDB even includes logic to automatically load-balance your shards as your database and load increase.
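Indexed array keys (MongoDB calls these multikey indexes) store one index entry per array element, so a query on a single tag can find every document whose array contains it. The following is a simplified in-memory sketch of that idea using a `Map` from element value to document ids. It is not MongoDB’s B-tree implementation, and `buildMultikeyIndex` is a hypothetical name.

```javascript
const posts = [
  { _id: 1, title: "Intro to BSON", tags: ["mongodb", "bson"] },
  { _id: 2, title: "Sharding basics", tags: ["mongodb", "sharding"] },
  { _id: 3, title: "Cooking pasta", tags: ["food"] }
];

// One index entry per array element, each pointing back at the owning document.
function buildMultikeyIndex(collection, field) {
  const index = new Map();
  for (const doc of collection) {
    if (doc[field] === undefined) continue; // this sketch skips documents without the field
    const values = Array.isArray(doc[field]) ? doc[field] : [doc[field]];
    for (const value of new Set(values)) {  // a repeated element is indexed once per document
      if (!index.has(value)) index.set(value, []);
      index.get(value).push(doc._id);
    }
  }
  return index;
}

const tagIndex = buildMultikeyIndex(posts, "tags");
console.log(tagIndex.get("mongodb")); // [ 1, 2 ]
console.log(tagIndex.get("food"));    // [ 3 ]
```

A lookup on `"mongodb"` touches one index entry instead of scanning every post’s array, which is why such indexes shape the schemas that make sense.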

Let’s take a look at some of the main features that make it a good choice:

• MongoDB is well suited to handling large volumes of data. Situations arise where a traditional relational database system becomes too expensive in terms of system resources (for example time and space, since large volumes of data need more processing); in such cases MongoDB could become a better alternative.
• MongoDB supports asynchronous insert operations. The application code asks MongoDB to insert a document and moves on to the next task without waiting for the server to respond. This frees the application from being stuck on one long database operation and so enhances user responsiveness. It also makes MongoDB an excellent tool for logging. For example, your website can process an HTTP request, log various details of the request (for example the time, user agent and cookie information) in the database, and then generate the output. Since the insertion is asynchronous, output generation continues without delay.
• MapReduce is an approach to data processing which has two significant benefits over other traditional solutions. The first, and main, reason it was developed is performance. The second benefit of MapReduce is that you can write real code to do your processing. MapReduce code is substantially richer and lets you validate a technique before you go for a more specialized solution. This is a powerful and flexible tool for data processing and may be considered another useful feature with asynchronous behavior.
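The two phases of MapReduce can be illustrated with a small in-memory imitation. Here `map` emits key/value pairs for each document and `reduce` folds all values for one key. In MongoDB’s `mapReduce` command, the functions are written in JavaScript and `reduce` may be called again on its own partial results, so it must return a value of the same shape as the emitted values. The runner below (`runMapReduce`, a hypothetical helper) respects that contract, but it is not MongoDB’s engine, and the page-view data is made up.

```javascript
// Page-view log documents (illustrative data).
const views = [
  { page: "/home", ms: 120 },
  { page: "/docs", ms: 300 },
  { page: "/home", ms: 80 },
  { page: "/home", ms: 100 }
];

// Map: emit one value per document, already shaped like the final result.
const map = (doc, emit) => emit(doc.page, { count: 1, totalMs: doc.ms });

// Reduce: combine values for one key; the output has the same shape as the input,
// so it can safely be re-applied to partial results.
const reduce = (key, values) => values.reduce(
  (acc, v) => ({ count: acc.count + v.count, totalMs: acc.totalMs + v.totalMs }),
  { count: 0, totalMs: 0 }
);

// Hypothetical runner: group emitted values by key, then reduce each group.
function runMapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) mapFn(doc, emit);
  const out = {};
  for (const [key, values] of groups) {
    // Like MongoDB, a key with a single emitted value is not passed to reduce.
    out[key] = values.length === 1 ? values[0] : reduceFn(key, values);
  }
  return out;
}

const stats = runMapReduce(views, map, reduce);
console.log(stats["/home"]); // { count: 3, totalMs: 300 }
```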

Extensive Data Model

MongoDB is a document-oriented database, as opposed to a relational one. The primary reason for moving away from the relational model is to make scaling out easier, so that your application can scale with little effort. Apart from scaling, there are many other advantages as well.

The basic idea is to replace the concept of a “row” with a more flexible model, the “document”. Each collection (a table in a relational database) holds a set of documents (think of documents as rows in a relational database). By allowing embedded documents and arrays, the document-oriented approach makes it possible to represent complex hierarchical relationships with a single record. This fits very naturally into the way developers in modern object-oriented languages think about their data objects. Developers can map the object concepts of their programming language directly to the database level and have to think less about how to save an object’s state in the data store and how to retrieve it back into object state. A developer-friendly, rich data model therefore speeds up development and reduces design complexity, making designs easier to communicate and implement.
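The difference between rows and documents shows up most clearly with one-to-many data. Below, the same blog post is represented twice. First it appears as normalized “rows” that must be joined in application code, then as one document with an embedded array of comments. Plain JavaScript objects stand in for both stores; the field names and the `loadPostRelational` helper are illustrative.

```javascript
// Relational style: two "tables" linked by a foreign key.
const postRows = [{ id: 10, title: "Hello MongoDB" }];
const commentRows = [
  { id: 1, post_id: 10, author: "ann", text: "Nice" },
  { id: 2, post_id: 10, author: "bob", text: "+1" }
];

// Reading a post together with its comments needs a join step.
function loadPostRelational(postId) {
  const post = postRows.find(p => p.id === postId);
  const comments = commentRows
    .filter(c => c.post_id === postId)
    .map(({ author, text }) => ({ author, text }));
  return { ...post, comments };
}

// Document style: comments are embedded in the post itself,
// so the whole object graph is read or written as one record.
const postDocument = {
  _id: 10,
  title: "Hello MongoDB",
  comments: [
    { author: "ann", text: "Nice" },
    { author: "bob", text: "+1" }
  ]
};

const joined = loadPostRelational(10);
console.log(JSON.stringify(joined.comments) === JSON.stringify(postDocument.comments)); // true
```

The embedded form mirrors the object the application already works with, which is the mapping advantage described above.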



This article presents a practical insight into MongoDB while focusing on a case study where we detail the technical solution we implemented using MongoDB for a commercial problem. We explain how the problem was attacked using MongoDB’s strengths and compare the solution with RDBMS and other NoSQL models. We also provide a pragmatic guide on when and where to use MongoDB.

Historical Background

MongoDB was developed by 10gen as part of a PaaS product similar to Google App Engine. In 2007 10gen started development of MongoDB inside the 10gen app engine, but in 2008 they decided to separate the database part from the app engine and make it open source. This was a milestone for MongoDB, because it started to attract users and proved to be a successful product.

Why MongoDB?

Schema Free

Schema in MongoDB is very different from schema in an RDBMS. MongoDB can be considered a schema-free database, which means different data structures can be stored in the same collection.

Agile Development

Agile development is used by many software projects today. The agile process promotes short, iterative development life cycles. Using an RDBMS in an agile project is not always practical, because the agile process often introduces changes. As discussed above, MongoDB is schema free. This is ideal for agile development, because requirement changes lead to schema changes. Rapid database schema changes are no longer a problem with a schema-free database like MongoDB.
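One practical consequence of a schema-free collection during agile iterations is that old and new document shapes coexist. The application reconciles them when reading instead of running a migration first. The sketch below assumes a hypothetical requirement change in which a single `name` string was later split into `firstName` and `lastName`. The `normalizeUser` helper and the sample names are illustrative.

```javascript
// Documents written before and after a requirement change (illustrative).
const users = [
  { _id: 1, name: "Jane Doe" },                     // shape before the change
  { _id: 2, firstName: "John", lastName: "Smith" }  // shape after the change
];

// Read-time normalization: accept both shapes, always return the new one.
function normalizeUser(doc) {
  if (doc.firstName !== undefined) return doc; // already in the new shape
  const { name, ...others } = doc;
  const [firstName, ...rest] = (name || "").split(" ");
  return { ...others, firstName, lastName: rest.join(" ") };
}

const normalized = users.map(normalizeUser);
console.log(normalized.map(u => `${u.firstName} ${u.lastName}`)); // [ 'Jane Doe', 'John Smith' ]
```

A background job could later rewrite the old documents into the new shape; the sketch only covers the read path, which is what lets the schema change ship without downtime.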

Flexible Documents

Unlike other databases, especially RDBMS, MongoDB stores data in documents. Data entities in an RDBMS are ‘flat’, but MongoDB documents can contain composite fields such as arrays and hashes. MongoDB documents are stored as JSON objects; more precisely, the storage form is binary JSON, which the MongoDB community calls BSON. The size of a single document is limited to 16 MB in the current release and will be increased in future.

```javascript
// A simple MongoDB document, saved from the mongo shell
var data = { name: "charith", company: "99X Technology" };
db.employers.save(data);
```

Cloud Ready

MongoDB is ready to run on commodity hardware, virtualized environments and the cloud. The database can expand onto whatever hardware is present.

High Performance

By default, MongoDB does not wait for an acknowledgement of data writes. This is very important when writing big data into a server. Rather than costly joins, it uses embedding, which makes reads and writes fast. Indexing enhances query performance: MongoDB supports indexing, even indexing of keys from embedded documents and arrays.

Horizontally Scalable

When data size keeps growing, new types of complexity emerge. With most technologies, such as RDBMS, the solution is vertical scaling: buying bigger servers. MongoDB is horizontally scalable, which means data can scale by adding more servers. The advantage is that servers do not need to be upgraded when the data set gets bigger. The problem can be dealt with by incrementally introducing suitable computing platforms.

Thinking Big to Deal with Big Data: A Practical Insight into MongoDB

NoSQL databases have become a popular topic in enterprise data architecture for the web and the cloud. MongoDB is one of the most popular open source pillars in this NoSQL family. NoSQL databases can be categorized into four classes: key-value, BigTable, document-oriented and graph; MongoDB falls under document-oriented databases.

