Mongo db

Date post: 27-Jan-2015
Upload: akshay-mathur
Description:
MongoDB is a popular NoSQL database. This presentation was delivered during a workshop. The first part talks about NoSQL databases and the shift in their design paradigm, focuses a little more on document-based NoSQL databases, and draws some parallels with SQL databases. The second part is a hands-on session with MongoDB using the mongo shell, for which the slides alone are of limited help. Finally, it touches on advanced topics: data replication for disaster recovery, and handling big data using map-reduce and sharding.
Transcript
Page 1: Mongo db

NoSQL Database

Akshay Mathur, Sarang Shravagi

@akshaymathu, @_sarangs

{name: 'mongo', type: 'db'}

Page 2: Mongo db

Who uses MongoDB

Page 3: Mongo db

Let’s Know Each Other

• Do you code?
• OS?
• Programming language?
• Why are you attending?

Page 4: Mongo db

Akshay Mathur

• Managed development, testing and release teams over the last 14+ years
– Currently Principal Architect at ShopSocially

• Founding team member of
– ShopSocially (enabling “social” for retailers)
– AirTight Networks (global leader in WIPS)

Page 5: Mongo db

Sarang Shravagi

• 10gen Certified Developer and DBA
• CS graduate from PICT Pune
• 3+ years in the software product industry
• Currently Senior Full-stack Developer at ShopSocially

Page 6: Mongo db

How we use MongoDB

Python MongoDB

MongoEngine

Page 7: Mongo db

Where MongoDB Fits

Page 8: Mongo db

Program Outline: Understanding NoSQL

• Data Landscape
• Different Storage Needs
• Design Paradigm Shift from SQL to NoSQL
• Different Datastores
• Closer Look at Document Storage
• Drawing Parallels with RDBMS

Page 9: Mongo db

Program Outline: Hands on Lab

• Installation and basic configuration
• Mongo Shell
• Creating and Changing Schema
• Create, Read, Update and Delete of Data
• Analyzing Performance
• Improving Performance by Creating Indices
• Assignment
• Problem solving for the assignment

Page 10: Mongo db

Program Outline: Advanced Topics

• Handling Big Data
– Introduction to Map/Reduce
– Introduction to Data Partitioning (Sharding)

• Disaster Recovery
– Introduction to Replica Sets and High Availability

Page 11: Mongo db

Ground Rules

• Disturb everyone
– Not with phone rings
– Not with side conversations
– But with more information and questions

Page 12: Mongo db

Data Patterns & Storage Needs

Page 13: Mongo db

Data at an Online Store

• Product Information
• User Information
• Purchase Information
• Product Reviews
• Site Interactions
• Social Graph
• Search Index

Page 14: Mongo db

SQL to NoSQL

Design Paradigm Shift

Page 15: Mongo db

SQL Storage

• Was designed when
– Storage and data transfer were costly
– Processing was slow
– Applications were oriented more towards data collection

• Initial adopters were financial institutions

Page 16: Mongo db

SQL Storage

• Structured
– Schema

• Relational
– Foreign keys, constraints

• Transactional
– Atomicity, Consistency, Isolation, Durability (ACID)

• High availability through robustness
– Minimize failures

• Optimized for writes
• Typically scale up
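The “Transactional” bullet can be made concrete with a minimal sketch in Python (the language the speakers use with MongoDB), using the standard-library sqlite3 module as a stand-in RDBMS. The accounts table, its CHECK constraint and the transfer() helper are hypothetical. The credit is applied first on purpose: when the debit then violates the constraint, the rollback undoes the credit as well, which is atomicity at work.

```python
import sqlite3


def make_db():
    """Hypothetical accounts table; the CHECK constraint forbids negative balances."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction; return True if it committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            # Credit first, so a failing debit shows the rollback undoing this step too.
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # raised when the CHECK constraint is violated
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = make_db()
ok = transfer(conn, "alice", "bob", 30)        # succeeds
failed = transfer(conn, "bob", "alice", 500)   # would overdraw bob, so nothing changes
```

After the two calls, the successful transfer leaves alice at 70 and bob at 80, and the failed one leaves both balances untouched.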

Page 17: Mongo db

NoSQL Storage

• Is designed for a time when
– Storage is cheap
– Data transfer is fast
– Much more processing power is available, and clustering of machines is also possible
– Applications are oriented towards consumption of user-generated content
– Better on-screen user experience is in demand

Page 18: Mongo db

NoSQL Storage

• Semi-structured
– Schemaless

• CAP trade-offs
– Consistency, Availability, Partition Tolerance: a distributed store cannot guarantee all three at once

• High availability through clustering
– Expect failures

• Optimized for reads
• Typically scale out

Page 19: Mongo db

Different Datastores

Half Level Deep

Page 20: Mongo db

SQL: RDBMS

• MySQL, PostgreSQL, Oracle etc.
• Stores data in tables having columns
– Basic (number, text) data types

• Strong query language
• Transparent values
– The query language can read and filter on them
– Relationships between tables are based on values

• Suited for user info and transactions

Page 21: Mongo db

NoSQL: Key/Value

• Redis, DynamoDB etc.
• Stores a value against a key
– Strings

• Values are opaque
– Cannot be part of a query

• Suited for site interactions
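What “values are opaque” means in practice, as a toy sketch: the KeyValueStore class below is hypothetical and only mimics plain get/set semantics (it is not the Redis or DynamoDB client API), and JSON serialisation is an assumed choice. Because the store only holds bytes, any filtering on a field inside a value has to happen in the client.

```python
import json


class KeyValueStore:
    """Toy in-memory key/value store whose values are opaque byte strings."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        # Serialise on the way in: from here on the store only sees bytes.
        self._data[key] = json.dumps(value).encode("utf-8")

    def get(self, key):
        raw = self._data.get(key)
        return None if raw is None else json.loads(raw)

    def keys(self):
        return list(self._data)


def keys_with_url(store, url):
    """No server-side filter on values: the client fetches and decodes every value itself."""
    return sorted(k for k in store.keys() if store.get(k)["url"] == url)


store = KeyValueStore()
# Hypothetical key naming scheme for recording site interactions.
store.set("user:42:last_page", {"url": "/products/7", "ts": 1422316800})
store.set("user:43:last_page", {"url": "/cart", "ts": 1422316900})
```

Looking up a single key is cheap; answering “who is on /cart?” means a full client-side scan, which is why this model suits simple interaction logs better than ad-hoc queries.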

Page 22: Mongo db

NoSQL: Key/Value

Page 23: Mongo db

NoSQL: Document

• MongoDB, CouchDB etc.
• Object-oriented data models
– Stores data in document objects having fields
– Basic and compound (list, dict) data types

• SQL-like queries
• Transparent values
– Can be part of a query

• Suited for product info and its reviews
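By contrast, a document store can see inside its values. This pure-Python sketch keeps a product and its reviews in one document and answers a query on a field inside the embedded reviews, loosely mirroring how a MongoDB query such as {"reviews.user": "ravi"} matches on array elements. The products data and the get_path()/find() helpers are hypothetical; this is not a MongoDB driver.

```python
# Hypothetical "products" collection: reviews are embedded as a list of sub-documents.
products = [
    {"_id": 1, "name": "Phone", "price": 300,
     "reviews": [{"user": "asha", "stars": 5}, {"user": "ravi", "stars": 3}]},
    {"_id": 2, "name": "Cover", "price": 10,
     "reviews": [{"user": "asha", "stars": 2}]},
]


def get_path(doc, path):
    """Resolve a dotted path such as 'reviews.user'; a list fans out to its elements."""
    values = [doc]
    for part in path.split("."):
        next_values = []
        for v in values:
            if isinstance(v, list):
                next_values.extend(item[part] for item in v if isinstance(item, dict) and part in item)
            elif isinstance(v, dict) and part in v:
                next_values.append(v[part])
        values = next_values
    return values


def find(collection, query):
    """Return documents where every queried path has at least one value equal to the target."""
    return [
        doc for doc in collection
        if all(target in get_path(doc, path) for path, target in query.items())
    ]
```

For example, find(products, {"reviews.user": "ravi"}) returns only the Phone document, while {"reviews.user": "asha"} matches both products.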

Page 24: Mongo db

NoSQL: Document

Page 25: Mongo db

NoSQL: Column Family

• Cassandra, Bigtable etc.
• Stores data in columns
• Transparent values
– Can be part of a query

• SQL-like queries
• Suited for search

Page 26: Mongo db

NoSQL: Column Family

Page 27: Mongo db

NoSQL: Graph

• Neo4j
• Stores data in the form of nodes and relationships
• Queries take the form of traversals
• In-memory
• Suited for social graphs
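“Query is in form of traversal” can be illustrated with a breadth-first search over an adjacency list in plain Python. The friends graph and the within_hops()/friends_of_friends() helpers are hypothetical, and this is not Neo4j's query language; it only shows the shape of a friends-of-friends traversal on a social graph.

```python
from collections import deque

# Hypothetical social graph: each node lists the nodes it has a FRIEND relationship with.
friends = {
    "asha": ["ravi", "meera"],
    "ravi": ["asha", "karan"],
    "meera": ["asha", "karan"],
    "karan": ["ravi", "meera", "dev"],
    "dev": ["karan"],
}


def within_hops(graph, start, max_hops):
    """Breadth-first traversal: return {node: distance} for nodes reachable in <= max_hops."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def friends_of_friends(graph, person):
    """People exactly two hops away: friends of friends who are not already friends."""
    return sorted(n for n, d in within_hops(graph, person, 2).items() if d == 2)
```

friends_of_friends(friends, "asha") returns ["karan"]; dev is only reached at three hops.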

Page 28: Mongo db

NoSQL: Graph

Page 29: Mongo db
Page 30: Mongo db

Document Storage: Closer Look

Page 31: Mongo db

MongoDB

• Document database
• Powerful query language
• Docs, sub-docs, indexes
• Map/reduce
• Replicas, shards, replicated shards
• SDKs/drivers for many languages
– C, C++, C#, Python, Erlang, PHP, Java, JavaScript, Node.js, Perl, Ruby, Scala

Page 32: Mongo db

RDBMS: DB Design

Page 33: Mongo db

RDBMS: Query

Page 34: Mongo db

RDBMS       MongoDB
Database    Database
Table       Collection
Row         Document
Column      Field

Select c1, c2 from Table where c1 = 'v1' order by c2 limit n

Collection.objects(c1='v1').only('c1', 'c2').order_by('c2').limit(n)
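Each clause of the SQL statement maps onto one step of the document query. As a sketch under stated assumptions, the same rows are queried below both with sqlite3 (standing in for the RDBMS) and as a plain list of dicts (standing in for a collection); the table t, the sample rows and n = 2 are made up for illustration, and the list-of-dicts steps only mirror what the MongoEngine chain expresses.

```python
import sqlite3

rows = [("v1", 3), ("v2", 1), ("v1", 1), ("v1", 2)]  # hypothetical sample data
n = 2

# SQL side: SELECT c1, c2 FROM t WHERE c1 = 'v1' ORDER BY c2 LIMIT n
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (c1 TEXT, c2 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
sql_result = conn.execute(
    "SELECT c1, c2 FROM t WHERE c1 = ? ORDER BY c2 LIMIT ?", ("v1", n)
).fetchall()

# Document side: the same rows as documents in a "collection".
collection = [{"c1": c1, "c2": c2} for c1, c2 in rows]
matched = [doc for doc in collection if doc["c1"] == "v1"]   # objects(c1='v1')
ordered = sorted(matched, key=lambda doc: doc["c2"])          # order_by('c2')
limited = ordered[:n]                                         # limit(n)
doc_result = [(doc["c1"], doc["c2"]) for doc in limited]      # only('c1', 'c2')
```

Both sides return [("v1", 1), ("v1", 2)].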

Page 35: Mongo db

MongoDB: Design

Page 36: Mongo db

MongoDB: Query

• Movies.objects()

Page 37: Mongo db

Page 38: Mongo db

Have you Installed?

http://www.mongodb.org/downloads

Page 39: Mongo db

Hands-on

Dive-in with Sarang

Page 40: Mongo db

MongoDB: Core Binaries

• mongod – Database server

• mongo – Database client shell

• mongos – Router for sharding

Page 41: Mongo db

Getting Help

• For the mongo shell
– mongo --help
• Shows options available for running the shell

• Inside the mongo shell
– <object>.help(), e.g. db.help() or db.<collection>.help()
• Shows commands available on the object

Page 42: Mongo db

Import Export Tools

• For objects (BSON)
– mongodump
– mongorestore
– bsondump
– mongooplog

• For data items (e.g. JSON, CSV)
– mongoimport
– mongoexport

Page 43: Mongo db

Database Operations

• Database creation
• Creating/changing collection
• Data insertion
• Data read
• Data update
• Creating indices
• Data deletion
• Dropping collection

Page 44: Mongo db

Diagnostic Tools

• mongostat
• mongoperf
• mongosniff
• mongotop

Page 45: Mongo db

Page 46: Mongo db

Assignment

• Go to http://www.velocitainc.com/mongo/
– Tasks: assignments.txt
– Data: students.json

Page 47: Mongo db

Disaster Recovery

Introduction to Replica Sets and High Availability

Page 48: Mongo db

Disasters

• Physical failure
– Hardware
– Network

• Solution
– Replica sets
• Provide redundant storage for high availability
• Real-time data synchronization
• Automatic failover for minimal downtime

Page 49: Mongo db

Replication

Page 50: Mongo db

Multi Replication

• Data can be replicated to multiple places simultaneously

• A replica set should have an odd number of voting members; a new primary needs votes from a majority of them
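The arithmetic behind the odd-number rule, as a small sketch: a new primary needs a strict majority of all voting members, so the number of members that may fail is the total minus that majority. The majority()/fault_tolerance() helpers are illustrative names.

```python
def majority(voting_members):
    """Votes a candidate needs: a strict majority of all voting members."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members):
    """How many voting members may be down while a primary can still be elected."""
    return voting_members - majority(voting_members)


# voting members -> failures tolerated
table = {n: fault_tolerance(n) for n in range(1, 8)}
```

table comes out as {1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 2, 7: 3}: going from 3 to 4 voting members adds no fault tolerance, which is why an odd number is preferred and why an arbiter is added when the data-bearing members are even in number.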

Page 51: Mongo db

Single Replication

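• With only one secondary, or any odd number of secondaries (i.e. an even number of data-bearing members), set up an arbiter so that the number of voting members is odd; an arbiter votes in elections but holds no data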

Page 52: Mongo db

Failover

• When the primary fails, the remaining members vote to elect a new primary; a candidate needs the votes of a majority of the set's voting members
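A deliberately simplified simulation of the failover vote: surviving members elect the most up-to-date data-bearing member, but only if the survivors make up a majority of all voting members. The Member class, the integer optime and elect() are hypothetical; real MongoDB elections also involve priorities, election terms and heartbeats.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    up: bool = True
    arbiter: bool = False  # arbiters vote but hold no data and can never become primary
    optime: int = 0        # simplified stand-in for "how up to date this member's data is"


def elect(members):
    """Return the new primary's name, or None if no majority can be formed (simplified)."""
    needed = len(members) // 2 + 1  # majority of ALL voting members, whether up or down
    alive = [m for m in members if m.up]
    if len(alive) < needed:
        return None  # not enough voters: no primary, so no writes
    candidates = [m for m in alive if not m.arbiter]
    if not candidates:
        return None
    # Voters favour the most up-to-date data-bearing member.
    return max(candidates, key=lambda m: m.optime).name


three = [Member("A", up=False, optime=10), Member("B", optime=9), Member("C", optime=10)]
pair = [Member("A", up=False), Member("B")]
with_arbiter = [Member("A", up=False), Member("B"), Member("arb", arbiter=True)]
```

With three members, losing the primary still leaves a majority and C is elected. With two members, losing one leaves no primary at all, and adding an arbiter restores the majority even though the arbiter itself can never become primary.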

Page 53: Mongo db

Handling Big Data

Introduction to Map/Reduce and Sharding

Page 54: Mongo db

Large Data Sets

• Problem 1
– Performance
• Queries go slow

• Solution
– Map/Reduce

Page 55: Mongo db

Map Reduce

• A way to divide a large query computation into smaller chunks

• May run in multiple processes across multiple machines

• Think of it as SQL's GROUP BY

Page 56: Mongo db

Map/Reduce Example

• The map function digs through the data and emits the required values as (key, value) pairs

Page 57: Mongo db

Map/Reduce Example

• The reduce function takes the values emitted by the map function for each key and generates an aggregated value
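The map, group and reduce steps from the two example slides can be sketched in plain Python with a “total amount per customer” aggregation, i.e. SELECT cust_id, SUM(amount) FROM orders WHERE status = 'A' GROUP BY cust_id. The orders data and function names are hypothetical; MongoDB's own mapReduce command takes JavaScript map and reduce functions, and this only mirrors the data flow. The last part reduces partial results from two chunks again, which is why a reduce function should give the same answer whether it sees raw values or earlier reduced values.

```python
from collections import defaultdict

# Hypothetical orders collection.
orders = [
    {"cust_id": "A123", "amount": 500, "status": "A"},
    {"cust_id": "A123", "amount": 250, "status": "A"},
    {"cust_id": "B212", "amount": 200, "status": "A"},
    {"cust_id": "A123", "amount": 300, "status": "D"},
]


def map_fn(doc):
    """Map: dig out the values we need and emit (key, value) pairs."""
    if doc["status"] == "A":
        yield doc["cust_id"], doc["amount"]


def reduce_fn(key, values):
    """Reduce: fold all values for one key into one aggregate (sum is safe to re-apply)."""
    return sum(values)


def group_and_reduce(pairs, reduce_fn):
    groups = defaultdict(list)  # the "shuffle": collect values per key
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def map_reduce(docs, map_fn, reduce_fn):
    return group_and_reduce((kv for doc in docs for kv in map_fn(doc)), reduce_fn)


totals = map_reduce(orders, map_fn, reduce_fn)

# Process two chunks separately, then reduce their partial results again.
partials = [map_reduce(orders[:1], map_fn, reduce_fn), map_reduce(orders[1:], map_fn, reduce_fn)]
chunked = group_and_reduce((kv for p in partials for kv in p.items()), reduce_fn)
```

Both totals and chunked come out as {"A123": 750, "B212": 200}.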

Page 58: Mongo db

Large Data Sets

• Problem 2
– Vertical scaling of hardware
• Can't increase machine size beyond a limit

• Solution
– Sharding

Page 59: Mongo db

Sharding

• A method for storing data across multiple machines

• Data is partitioned using Shard Keys

Page 60: Mongo db

Data Partitioning: Range Based

• Each chunk holds a contiguous range of shard key values
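Range-based partitioning amounts to a sorted list of chunk boundaries and a binary search. In this sketch the split points, the chunk-to-shard assignment and the helper names are hypothetical. It shows the benefit of ranges (a narrow range query touches few shards); the flip side is that monotonically increasing keys all land in the last chunk.

```python
import bisect

# Hypothetical chunk boundaries on a numeric shard key such as user_id.
# 3 split points give 4 chunks: (-inf, 1000), [1000, 2000), [2000, 3000), [3000, +inf)
split_points = [1000, 2000, 3000]
chunk_owner = ["shard0", "shard0", "shard1", "shard1"]  # which shard stores each chunk


def chunk_for(key):
    """Index of the chunk whose range contains key (lower bound inclusive, upper exclusive)."""
    return bisect.bisect_right(split_points, key)


def shard_for(key):
    return chunk_owner[chunk_for(key)]


def shards_for_range(lo, hi):
    """Shards that a range query lo <= key <= hi has to visit."""
    return {chunk_owner[i] for i in range(chunk_for(lo), chunk_for(hi) + 1)}
```

A query for keys 1500 to 1800 touches only shard0, while 1500 to 2500 spans both shards.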

Page 61: Mongo db

Data Partitioning: Hash Based

• A hash function applied to the shard key value decides the chunk
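For hash-based partitioning, the chunk is chosen from a hash of the key rather than the key itself. This sketch assumes an MD5 digest from hashlib truncated to 64 bits and equal-width hash ranges; MongoDB computes its own hash for hashed shard keys, so treat the details as illustrative. NUM_CHUNKS and the helper names are hypothetical. Sequential IDs end up spread across all chunks, at the cost that a range query on the key now has to visit every chunk.

```python
import hashlib

NUM_CHUNKS = 4  # hypothetical number of chunks


def hashed_key(key):
    """Map a shard key value to a 64-bit integer: the first 8 bytes of its MD5 digest."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def hashed_chunk_for(key):
    """Split the 64-bit hash space into NUM_CHUNKS equal ranges and pick the one holding the hash."""
    return (hashed_key(key) * NUM_CHUNKS) >> 64


counts = [0] * NUM_CHUNKS
for user_id in range(1, 1001):  # monotonically increasing keys, e.g. sequential user IDs
    counts[hashed_chunk_for(user_id)] += 1
```

counts ends up roughly even (around 250 per chunk), whereas range partitioning would have put all new IDs in the last chunk.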

Page 62: Mongo db

Sharded Cluster

Page 63: Mongo db

Optimizing Shards: Splitting

• When a chunk on a shard grows beyond the maximum chunk size, it is split into two
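A sketch of the split step, assuming a hypothetical limit counted in documents (MAX_DOCS_PER_CHUNK) rather than the data size MongoDB actually uses. The split key becomes the new boundary, so all documents with the same shard key value stay in one chunk; if every key in an oversized chunk is identical, it cannot be split at all (MongoDB calls such a chunk “jumbo”).

```python
MAX_DOCS_PER_CHUNK = 4  # hypothetical limit; MongoDB splits by data size, not document count


def split_if_needed(keys):
    """keys: sorted shard-key values stored in one chunk. Returns a list of one or two chunks."""
    if len(keys) <= MAX_DOCS_PER_CHUNK:
        return [keys]
    split_key = keys[len(keys) // 2]               # becomes the new chunk boundary
    left = [k for k in keys if k < split_key]      # [chunk min, split_key)
    right = [k for k in keys if k >= split_key]    # [split_key, chunk max)
    if not left:
        return [keys]  # all keys equal split_key: the chunk cannot be split ("jumbo")
    return [left, right]
```

split_if_needed([1, 2, 3, 4, 5, 6]) gives [[1, 2, 3], [4, 5, 6]], while a chunk of six identical keys stays in one piece.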

Page 64: Mongo db

Optimizing Shards: Balancing

• When one shard holds many more chunks than the others, a few chunks are migrated to other shards
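Balancing can be sketched as repeatedly moving one chunk from the fullest shard to the emptiest until the counts are within a threshold. The balance() helper and its default threshold of 2 are hypothetical; MongoDB's balancer applies its own migration thresholds and moves chunks in the background.

```python
def balance(chunks_per_shard, threshold=2):
    """Simulate migrations: move one chunk at a time from the fullest to the emptiest shard
    until their difference is at most `threshold`. Returns (final_counts, migrations)."""
    if threshold < 1:
        # With threshold 0, a difference of 1 would make the loop move a chunk back and forth.
        raise ValueError("threshold must be at least 1")
    counts = dict(chunks_per_shard)
    migrations = []
    while True:
        most = max(counts, key=counts.get)
        least = min(counts, key=counts.get)
        if counts[most] - counts[least] <= threshold:
            return counts, migrations
        counts[most] -= 1
        counts[least] += 1
        migrations.append((most, least))
```

balance({"shard0": 8, "shard1": 2}) returns ({"shard0": 6, "shard1": 4}, [("shard0", "shard1"), ("shard0", "shard1")]).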

Page 65: Mongo db

Summary

• MongoDB is good
– Stores objects the way we use them in the programming language
– Flexible, semi-structured design
– Scales out to store big data
– Embedded documents reduce the need for joins

• MongoDB is bad
– No joins across collections
– De-normalized storage
– No multi-document transactions (single-document writes are atomic)

Page 66: Mongo db

Thanks

@akshaymathu @_sarangs

