Mongo db


NoSQL Database

Akshay Mathur, Sarang Shravagi

@akshaymathu, @_sarangs

{name: ‘mongo’, type: ‘db’}


Who uses MongoDB

Let’s Know Each Other

• Do you code?
• OS?
• Programming Language?
• Why are you attending?

Akshay Mathur

• Managed development, testing and release teams over the last 14+ years
  – Currently Principal Architect at ShopSocially
• Founding Team Member of
  – ShopSocially (Enabling “social” for retailers)
  – AirTight Networks (Global leader of WIPS)

Sarang Shravagi

• 10gen Certified Developer and DBA
• CS graduate from PICT Pune
• 3+ years in Software Product industry
• Currently Senior Full-stack Developer at ShopSocially

How we use MongoDB

Python MongoDB

MongoEngine

Where MongoDB Fits

Program Outline: Understanding NoSQL

• Data Landscape
• Different Storage Needs
• Design Paradigm Shift from SQL to NoSQL
• Different Datastores
• Closer look at Document Storage
• Drawing parallels from RDBMS

Program Outline: Hands on Lab

• Installation and basic configuration
• Mongo Shell
• Creating and Changing Schema
• Create, Read, Update and Delete of Data
• Analyzing Performance
• Improving performance by creating Indices
• Assignment
• Problem solving for the assignment

Program Outline: Advanced Topics

• Handling Big Data
  – Introduction to Map/Reduce
  – Introduction to Data Partitioning (Sharding)
• Disaster Recovery
  – Introduction to Replica Sets and High Availability

Ground Rules

• Disturb Everyone
  – Not by phone rings
  – Not by local talks
  – By more information and questions

Data Patterns & Storage Needs

Data at an Online Store

• Product Information
• User Information
• Purchase Information
• Product Reviews
• Site Interactions
• Social Graph
• Search Index

SQL to NoSQL

Design Paradigm Shift

SQL Storage

• Was designed when
  – Storage and data transfer were costly
  – Processing was slow
  – Applications were oriented more towards data collection
• Initial adopters were financial institutions

SQL Storage

• Structured
  – schema
• Relational
  – foreign keys, constraints
• Transactional
  – Atomicity, Consistency, Isolation, Durability
• High Availability through robustness
  – Minimize failures
• Optimized for Writes
• Typically Scale Up

NoSQL Storage

• Is designed when
  – Storage is cheap
  – Data transfer is fast
  – Much more processing power is available
    • Clustering of machines is also possible
  – Applications are oriented towards consumption of User Generated Content
  – Better on-screen user experience is in demand

NoSQL Storage

• Semi-structured
  – Schemaless
• Consistency, Availability, Partition Tolerance
• High Availability through clustering
  – expect failures
• Optimized for Reads
• Typically Scale Out

Different Datastores

Half Level Deep

SQL: RDBMS

• MySQL, PostgreSQL, Oracle etc.
• Stores data in tables having columns
  – Basic (number, text) data types
• Strong query language
• Transparent values
  – Query language can read and filter on them
  – Relationship between tables based on values
• Suited for user info and transactions

NoSQL: Key/Value

• Redis, DynamoDB etc.
• Stores a value against a key
  – Strings
• Values are opaque
  – Cannot be part of a query
• Suited for site interactions
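
The "opaque value" point can be sketched in plain Python. This is only an illustration: the dict stands in for a key/value server, and the key names and session fields are made up.

```python
import json

store = {}  # stand-in for a key/value server such as Redis

def put(key, obj):
    """Serialize the object; the store only ever sees a string."""
    store[key] = json.dumps(obj)

def get(key):
    """Look up by exact key and decode the value on the client side."""
    return json.loads(store[key])

put("session:42", {"user": "asha", "page": "/cart"})
put("session:43", {"user": "ravi", "page": "/home"})

# The store cannot filter on "page"; the client has to decode every value.
on_cart = [k for k in store if get(k)["page"] == "/cart"]
```

Lookups by key are cheap, but any question about what is inside a value turns into a full scan done by the application.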

NoSQL: Key/Value

NoSQL: Document

• MongoDB, CouchDB etc.
• Object Oriented data models
  – Stores data in document objects having fields
  – Basic and compound (list, dict) data types
• SQL like queries
• Transparent values
  – Can be part of query
• Suited for product info and its reviews
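
As a rough picture of such a document, here is a product with its reviews as a plain Python dict. The field names and values are invented for illustration, and `get_path` is a hypothetical helper, not a MongoDB API.

```python
# One product stored as a single document: basic fields plus
# compound (list, dict) fields. All names and values are made up.
product = {
    "name": "Trail Shoe",
    "price": 79.0,
    "tags": ["outdoor", "running"],               # list field
    "specs": {"weight_g": 310, "color": "red"},   # embedded sub-document
    "reviews": [                                  # list of sub-documents
        {"user": "asha", "rating": 5},
        {"user": "ravi", "rating": 3},
    ],
}

def get_path(doc, dotted):
    """Follow a dotted path such as 'specs.color' into nested dicts."""
    value = doc
    for key in dotted.split("."):
        value = value[key]
    return value

# Values are transparent: a query can look inside the document.
color = get_path(product, "specs.color")
good_reviews = [r for r in product["reviews"] if r["rating"] >= 4]
```

Because the reviews live inside the product document, reading a product with its reviews needs no second lookup.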

NoSQL: Document

NoSQL: Column Family

• Cassandra, Bigtable etc.
• Stores data in columns
• Transparent values
  – Can be part of query
• SQL like queries
• Suited for search

NoSQL: Column Family

NoSQL: Graph

• Neo4j
• Stores data in form of nodes and relationships
• Query is in form of traversal
• In-memory
• Suited for social graph

NoSQL: Graph

Document Storage: Closer Look

MongoDB

• Document database
• Powerful query language
• Docs, sub-docs, indexes
• Map/reduce
• Replicas, shards, replicated shards
• SDKs/drivers for many languages
  – C, C++, C#, Python, Erlang, PHP, Java, JavaScript, NodeJS, Perl, Ruby, Scala

RDBMS: DB Design

RDBMS: Query

RDBMS       MongoDB
Database    Database
Table       Collection
Row         Document
Column      Field

SELECT c1, c2 FROM Table WHERE c1 = 'v1' ORDER BY c2 LIMIT n

Collection.objects(c1='v1').order_by('c2').limit(n)
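
To make the correspondence concrete without a running server, the sketch below applies the same filter, sort and limit steps to a list of dicts. It is a pure-Python stand-in, not MongoEngine itself; the sample documents and the `find` helper are hypothetical.

```python
from operator import itemgetter

# Hypothetical documents; field names c1 and c2 mirror the query above.
documents = [
    {"c1": "v1", "c2": 30},
    {"c1": "v2", "c2": 10},
    {"c1": "v1", "c2": 20},
    {"c1": "v1", "c2": 40},
]

def find(docs, c1, n):
    """WHERE c1 = ? ORDER BY c2 LIMIT n, projected onto (c1, c2)."""
    matched = [d for d in docs if d["c1"] == c1]       # WHERE / objects(c1=...)
    ordered = sorted(matched, key=itemgetter("c2"))    # ORDER BY / order_by('c2')
    return [(d["c1"], d["c2"]) for d in ordered[:n]]   # LIMIT / limit(n)

result = find(documents, "v1", 2)
```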

MongoDB: Design

MongoDB: Query

• Movies.objects()

Have you Installed?

http://www.mongodb.org/downloads

Hands-on

Dive-in with Sarang

MongoDB: Core Binaries

• mongod
  – Database server
• mongo
  – Database client shell
• mongos
  – Router for Sharding

Getting Help

• For mongo shell
  – mongo --help
    • Shows options available for running the shell
• Inside mongo shell
  – Object.help()
    • Shows commands available on the object

Import Export Tools

• For objects
  – mongodump
  – mongorestore
  – bsondump
  – mongooplog
• For data items
  – mongoimport
  – mongoexport

Database Operations

• Database creation
• Creating/changing collection
• Data insertion
• Data read
• Data update
• Creating indices
• Data deletion
• Dropping collection

Diagnostic Tools

• mongostat
• mongoperf
• mongosniff
• mongotop

Assignment

• Go to http://www.velocitainc.com/mongo/
  – Tasks
    • assignments.txt
  – Data
    • students.json

Disaster Recovery

Introduction to Replica Sets and

High Availability

Disasters

• Physical Failure
  – Hardware
  – Network
• Solution
  – Replica Sets
    • Provide redundant storage for High Availability
      – Real time data synchronization
    • Automatic failover for near-zero downtime

Replication

Multi Replication

• Data can be replicated to multiple places simultaneously

• An odd number of voting members is recommended in a replica set, so that votes cannot tie

Single Replication

• If you want only one secondary (or any odd number of secondaries), add an arbiter (a voting member that stores no data) so that the number of voting members is odd

Failover

• When the primary fails, the remaining members vote to elect a new primary
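
The voting arithmetic behind the odd-number advice can be sketched as below. It assumes only that electing a primary needs a strict majority of the voting members; the function names are hypothetical and none of this is driver code.

```python
def has_majority(voting_members, reachable):
    """A primary can be elected only by a strict majority of voting members."""
    return reachable > voting_members // 2

def survives_one_failure(voting_members):
    """True if the set can still elect a primary after losing one member."""
    return has_majority(voting_members, voting_members - 1)

def tolerated_failures(voting_members):
    """Largest number of members that can be lost while keeping a majority."""
    return (voting_members - 1) // 2
```

With two voting members (primary plus one secondary), losing either leaves one vote out of two, which is not a majority. Adding an arbiter makes it three, and the set survives one failure. Going from three to four members adds no extra tolerance, which is why odd sizes are preferred.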

Handling Big Data

Introduction to Map/Reduce and Sharding

Large Data Sets

• Problem 1
  – Performance
    • Queries go slow
• Solution
  – Map/Reduce

Map Reduce

• A way to divide large query computation into smaller chunks

• May run in multiple processes across multiple machines

• Think of it as GROUP BY of SQL

Map/Reduce Example

• Map function digs the data and returns required values

Map/Reduce Example

• Reduce function uses the output of Map function and generates aggregated value
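
A minimal pure-Python sketch of the two phases follows. It runs in one process and is not MongoDB's own map/reduce API; the purchase documents and the names `map_fn`, `reduce_fn` and `map_reduce` are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical purchase documents.
purchases = [
    {"product": "shoe", "amount": 80},
    {"product": "sock", "amount": 5},
    {"product": "shoe", "amount": 70},
    {"product": "sock", "amount": 7},
]

def map_fn(doc):
    """Emit a (key, value) pair for one document."""
    yield doc["product"], doc["amount"]

def reduce_fn(key, values):
    """Combine all values emitted for one key into a single value."""
    return sum(values)

def map_reduce(docs, mapper, reducer):
    grouped = defaultdict(list)
    for doc in docs:                        # map phase: one call per document
        for key, value in mapper(doc):
            grouped[key].append(value)
    # reduce phase: one call per distinct key
    return {key: reducer(key, values) for key, values in grouped.items()}

totals = map_reduce(purchases, map_fn, reduce_fn)
```

The result matches what `SELECT product, SUM(amount) ... GROUP BY product` would give in SQL. Since each map call sees a single document, the map phase can be spread over several processes or machines.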

Large Data Sets

• Problem 2
  – Vertical Scaling of Hardware
    • Can’t increase machine size beyond a limit
• Solution
  – Sharding

Sharding

• A method for storing data across multiple machines

• Data is partitioned using Shard Keys

Data Partitioning: Range Based

• A range of Shard Keys stays in a chunk

Data Partitioning: Hash Based

• A hash function on Shard Keys decides the chunk
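
The two partitioning schemes can be compared with a small sketch. The chunk count, the range boundaries and the use of MD5 with a modulo are simplifying assumptions for illustration, not how MongoDB computes chunks internally.

```python
import bisect
import hashlib

NUM_CHUNKS = 4              # hypothetical number of chunks
RANGE_BOUNDS = [25, 50, 75]  # chunk 0: keys < 25, chunk 1: 25..49, and so on

def range_chunk(shard_key):
    """Range based: neighbouring keys land in the same chunk."""
    return bisect.bisect_right(RANGE_BOUNDS, shard_key)

def hash_chunk(shard_key):
    """Hash based: the chunk depends on a hash of the key, not its order."""
    digest = hashlib.md5(str(shard_key).encode()).hexdigest()
    return int(digest, 16) % NUM_CHUNKS
```

Range based partitioning keeps range queries local to a few chunks but can overload one chunk when keys arrive in increasing order. Hash based partitioning tends to spread such keys across chunks, at the cost of scattering range queries.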

Sharded Cluster

Optimizing Shards: Splitting

• In a shard, when a chunk grows beyond the size limit, the chunk is divided into two

Optimizing Shards: Balancing

• When the number of chunks in a shard increases, a few chunks are migrated to other shards
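
Splitting and balancing can be pictured with the toy model below. It counts documents instead of bytes, and the limit and the "differ by at most one chunk" rule are assumptions chosen to keep the sketch short, not MongoDB's actual thresholds.

```python
MAX_CHUNK_DOCS = 4  # hypothetical limit; real limits are expressed in bytes

def split_chunk(chunk):
    """Divide an oversized chunk (a sorted list of shard keys) into two halves."""
    if len(chunk) <= MAX_CHUNK_DOCS:
        return [chunk]
    mid = len(chunk) // 2
    return [chunk[:mid], chunk[mid:]]

def balance(shards):
    """Migrate chunks from the fullest shard to the emptiest one
    until their chunk counts differ by at most one."""
    while True:
        fullest = max(shards, key=lambda name: len(shards[name]))
        emptiest = min(shards, key=lambda name: len(shards[name]))
        if len(shards[fullest]) - len(shards[emptiest]) <= 1:
            return shards
        shards[emptiest].append(shards[fullest].pop())
```

Here `shards` is a dict mapping a shard name to its list of chunks. Splitting only changes chunk boundaries inside a shard; balancing is what actually moves data between machines.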

Summary

• MongoDB is good
  – Stores objects as we use them in programming languages
  – Flexible semi-structured design
  – Scales out to store big data
  – Embedded documents eliminate the need for joins
• MongoDB is bad
  – No multi-document query
  – De-normalized storage
  – No support for transactions

Thanks

@akshaymathu @_sarangs
