MongoDB

Post on 27-Jan-2015

description

MongoDB is a popular NoSQL database. This presentation was delivered during a workshop. The first part introduces NoSQL databases and the shift in their design paradigm, focuses on document-based NoSQL databases, and draws parallels with SQL databases. The second part is a hands-on session on MongoDB using the mongo shell, for which the slides alone are of limited help. The last part touches on advanced topics: data replication for disaster recovery, and handling big data using map-reduce and sharding.

transcript

NoSQL Database

Akshay Mathur, Sarang Shravagi

@akshaymathu, @_sarangs

{name: 'mongo', type: 'db'}

Who uses MongoDB

Let’s Know Each Other

• Do you code?
• OS?
• Programming Language?
• Why are you attending?

Akshay Mathur

• Managed development, testing and release teams over the last 14+ years
  – Currently Principal Architect at ShopSocially
• Founding team member of
  – ShopSocially (enabling “social” for retailers)
  – AirTight Networks (global leader in WIPS)

Sarang Shravagi

• 10gen Certified Developer and DBA
• CS graduate from PICT Pune
• 3+ years in the software product industry
• Currently Senior Full-stack Developer at ShopSocially

How we use MongoDB

Python MongoDB

MongoEngine

Where MongoDB Fits

Program Outline: Understanding NoSQL

• Data Landscape
• Different Storage Needs
• Design Paradigm Shift from SQL to NoSQL
• Different Datastores
• Closer look at Document Storage
• Drawing parallels with RDBMS

Program Outline: Hands on Lab

• Installation and basic configuration
• Mongo Shell
• Creating and Changing Schema
• Create, Read, Update and Delete of Data
• Analyzing Performance
• Improving performance by creating Indices
• Assignment
• Problem solving for the assignment

Program Outline: Advanced Topics

• Handling Big Data
  – Introduction to Map/Reduce
  – Introduction to Data Partitioning (Sharding)
• Disaster Recovery
  – Introduction to Replica Sets and High Availability

Ground Rules

• Disturb Everyone
  – Not by phone rings
  – Not by local talks
  – By more information and questions

Data Patterns & Storage Needs

Data at an Online Store

• Product Information
• User Information
• Purchase Information
• Product Reviews
• Site Interactions
• Social Graph
• Search Index

SQL to NoSQL

Design Paradigm Shift

SQL Storage

• Was designed when
  – Storage and data transfer were costly
  – Processing was slow
  – Applications were oriented more towards data collection
• Initial adopters were financial institutions

SQL Storage

• Structured
  – Schema
• Relational
  – Foreign keys, constraints
• Transactional
  – Atomicity, Consistency, Isolation, Durability
• High Availability through robustness
  – Minimize failures
• Optimized for Writes
• Typically Scale Up

NoSQL Storage

• Is designed for a time when
  – Storage is cheap
  – Data transfer is fast
  – Much more processing power is available
    • Clustering of machines is also possible
  – Applications are oriented towards consumption of User Generated Content
  – Better on-screen user experience is in demand

NoSQL Storage

• Semi-structured
  – Schemaless
• Consistency, Availability, Partition Tolerance
• High Availability through clustering
  – Expect failures
• Optimized for Reads
• Typically Scale Out

Different Datastores

Half Level Deep

SQL: RDBMS

• MySQL, PostgreSQL, Oracle etc.
• Stores data in tables having columns
  – Basic (number, text) data types
• Strong query language
• Transparent values
  – Query language can read and filter on them
  – Relationships between tables are based on values
• Suited for user info and transactions

NoSQL: Key/Value

• Redis, DynamoDB etc.
• Stores a value against a key
  – Strings
• Values are opaque
  – Cannot be part of a query
• Suited for site interactions

NoSQL: Document

• MongoDB, CouchDB etc.
• Object-oriented data models
  – Stores data in document objects having fields
  – Basic and compound (list, dict) data types
• SQL-like queries
• Transparent values
  – Can be part of a query
• Suited for product info and its reviews
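The contrast between opaque key/value values and transparent document fields can be sketched in plain Python. This is only an illustration: `kv_store`, `products` and `find` are made-up names, the data is invented, and a real MongoDB query is evaluated on the server rather than in a client-side loop.

```python
import json

# Key/value store: each value is an opaque string, so the store can only
# look it up by key. Filtering on content means fetching and decoding it.
kv_store = {
    "product:1": json.dumps({"name": "Phone", "rating": 4}),
    "product:2": json.dumps({"name": "Laptop", "rating": 5}),
}
top_rated_kv = [
    key for key, raw in kv_store.items() if json.loads(raw)["rating"] >= 5
]

# Document store: fields (including lists and sub-documents) are visible
# to the query, so a filter can refer to them directly.
products = [
    {"name": "Phone", "rating": 4, "reviews": [{"user": "a", "stars": 4}]},
    {"name": "Laptop", "rating": 5, "reviews": [{"user": "b", "stars": 5}]},
]


def find(collection, **criteria):
    """Return documents whose top-level fields equal the given criteria."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


top_rated_docs = find(products, rating=5)
```

Keeping the reviews inside the product document is what makes this model a natural fit for "product info and its reviews": one read returns both.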

NoSQL: Column Family

• Cassandra, Bigtable etc.
• Stores data in columns
• Transparent values
  – Can be part of a query
• SQL-like queries
• Suited for search

NoSQL: Graph

• Neo4j
• Stores data in the form of nodes and relationships
• Query is in the form of traversal
• In-memory
• Suited for social graph

Document Storage: Closer Look

MongoDB

• Document database
• Powerful query language
• Docs, sub-docs, indexes
• Map/reduce
• Replicas, shards, replicated shards
• SDKs/drivers for many languages
  – C, C++, C#, Python, Erlang, PHP, Java, JavaScript, Node.js, Perl, Ruby, Scala

RDBMS: DB Design

RDBMS: Query

RDBMS vs. MongoDB

RDBMS       MongoDB
Database    Database
Table       Collection
Row         Document
Column      Field

SQL: SELECT c1, c2 FROM Table WHERE c1 = 'v1' ORDER BY c2 LIMIT n

MongoEngine: Collection.objects(c1='v1').only('c1', 'c2').order_by('c2').limit(n)
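To make the mapping concrete, the sketch below runs the SQL query against an in-memory SQLite table and then applies the same where / order by / limit steps to a list of documents. It is a stand-in written with the standard library only, not MongoDB or MongoEngine code; the table name `t` and the sample rows are assumptions made for this example.

```python
import sqlite3

rows = [("v1", 3), ("v2", 1), ("v1", 1), ("v1", 2)]

# Relational side: a table with columns c1 and c2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (c1 TEXT, c2 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
sql_result = conn.execute(
    "SELECT c1, c2 FROM t WHERE c1 = ? ORDER BY c2 LIMIT ?", ("v1", 2)
).fetchall()
conn.close()

# Document side: a collection is a list of documents with fields c1 and c2.
collection = [{"c1": c1, "c2": c2} for c1, c2 in rows]
matching = [doc for doc in collection if doc["c1"] == "v1"]  # where
ordered = sorted(matching, key=lambda doc: doc["c2"])        # order by
doc_result = ordered[:2]                                     # limit
```

Both sides return the same two records; the only difference is that rows come back as tuples and documents as field/value mappings.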

MongoDB: Design

MongoDB: Query

• Movies.objects()

Have you Installed?

http://www.mongodb.org/downloads

Hands-on

Dive-in with Sarang

MongoDB: Core Binaries

• mongod
  – Database server
• mongo
  – Database client shell
• mongos
  – Router for sharding

Getting Help

• For mongo shell
  – mongo --help
    • Shows options available for running the shell
• Inside mongo shell
  – <object>.help(), for example db.help()
    • Shows commands available on the object

Import Export Tools

• For objects
  – mongodump
  – mongorestore
  – bsondump
  – mongooplog
• For data items
  – mongoimport
  – mongoexport

Database Operations

• Database creation
• Creating/changing collection
• Data insertion
• Data read
• Data update
• Creating indices
• Data deletion
• Dropping collection

Diagnostic Tools

• mongostat
• mongoperf
• mongosniff
• mongotop

Assignment

• Go to http://www.velocitainc.com/mongo/
  – Tasks
    • assignments.txt
  – Data
    • students.json

Disaster Recovery

Introduction to Replica Sets and High Availability

Disasters

• Physical Failure
  – Hardware
  – Network
• Solution
  – Replica Sets
    • Provide redundant storage for High Availability
      – Real-time data synchronization
    • Automatic failover for minimal downtime

Replication

Multi Replication

• Data can be replicated to multiple places simultaneously
• A replica set should have an odd number of voting members, so that an election can always reach a majority

Single Replication

• If you want only one secondary (or any odd number of secondaries), the total member count becomes even, so you need to set up an arbiter to break ties
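The reasoning behind the odd member count and the arbiter is simple majority arithmetic, sketched below. The function names are made up for this illustration, and the model ignores details such as non-voting members.

```python
def majority(voting_members):
    """Votes needed to elect a primary: more than half of all voters."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members):
    """How many voters can be lost while a majority is still reachable."""
    return voting_members - majority(voting_members)


# 3 voters tolerate 1 failure; a 4th voter adds cost but no extra tolerance.
tolerance = {n: fault_tolerance(n) for n in (2, 3, 4, 5)}

# Primary + one secondary = 2 voters, which tolerates no failure at all.
# Adding an arbiter (a voter that stores no data) makes it 3 voters.
with_arbiter = fault_tolerance(2 + 1)
```

An even-sized set therefore tolerates no more failures than the odd-sized set one member smaller, which is why an arbiter is the cheap way to round a two-member set up to three votes.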

Failover

• When the primary fails, the remaining members vote to elect a new primary
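A heavily simplified election can be sketched as follows. This is not MongoDB's election protocol, which also takes member priorities and other factors into account; `elect_primary`, the member names and the operation timestamps are all assumptions made for this example.

```python
def elect_primary(members, reachable):
    """Pick a new primary among reachable members, or None if no majority.

    members:   dict of member name -> time of last applied operation
               (None for an arbiter, which holds no data)
    reachable: set of member names that can still talk to each other
    """
    votes_needed = len(members) // 2 + 1
    if len(reachable) < votes_needed:
        return None  # no majority, so nobody may become primary
    candidates = [name for name in reachable if members[name] is not None]
    if not candidates:
        return None  # only arbiters are left, and they cannot hold data
    # Simplification: prefer the most up-to-date data-bearing member.
    return max(candidates, key=lambda name: members[name])


# Three data-bearing members; "a" was the primary and has just failed.
replica_set = {"a": 100, "b": 98, "c": 99}
new_primary = elect_primary(replica_set, reachable={"b", "c"})

# If "b" is cut off from everyone else, it cannot elect itself.
isolated = elect_primary(replica_set, reachable={"b"})
```

The second call shows why a majority is required: a member that can only see itself must not promote itself, otherwise a network split could produce two primaries.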

Handling Big Data

Introduction to Map/Reduce and Sharding

Large Data Sets

• Problem 1
  – Performance
    • Queries run slowly
• Solution
  – Map/Reduce

Map Reduce

• A way to divide large query computation into smaller chunks

• May run in multiple processes across multiple machines

• Think of it as GROUP BY of SQL

Map/Reduce Example

• Map function digs the data and returns required values

Map/Reduce Example

• Reduce function uses the output of Map function and generates aggregated value
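The two steps can be sketched in plain Python. This is not MongoDB's mapReduce command; `reviews`, `map_review`, `reduce_stars` and `map_reduce` are hypothetical names, and everything runs in a single process here, although the map step could be run separately on each chunk of data.

```python
from collections import defaultdict

reviews = [
    {"product": "phone", "stars": 4},
    {"product": "laptop", "stars": 5},
    {"product": "phone", "stars": 2},
]


def map_review(doc):
    """Map step: emit a (key, value) pair for one document."""
    yield doc["product"], doc["stars"]


def reduce_stars(key, values):
    """Reduce step: combine all values emitted for one key."""
    return sum(values)


def map_reduce(docs, map_fn, reduce_fn):
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)  # group emitted values by key
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


total_stars = map_reduce(reviews, map_review, reduce_stars)
```

In SQL terms this is roughly `SELECT product, SUM(stars) ... GROUP BY product`: the emitted key plays the role of the GROUP BY column and the reduce function plays the role of the aggregate.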

Large Data Sets

• Problem 2
  – Vertical Scaling of Hardware
    • Can’t increase machine size beyond a limit
• Solution
  – Sharding

Sharding

• A method for storing data across multiple machines

• Data is partitioned using Shard Keys

Data Partitioning: Range Based

• Each chunk holds a contiguous range of Shard Key values
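Range-based routing amounts to finding which interval a key falls into. The sketch below uses made-up boundaries and a hypothetical `range_chunk_for` helper; it illustrates the idea and is not how mongos is implemented.

```python
from bisect import bisect_right

# Exclusive upper bounds of the first three chunks; the last chunk is
# open-ended. Chunk 0 holds keys below 100, chunk 1 holds 100-199, etc.
boundaries = [100, 200, 300]


def range_chunk_for(shard_key):
    """Return the index of the chunk whose key range contains shard_key."""
    return bisect_right(boundaries, shard_key)


chunks = [range_chunk_for(key) for key in (5, 100, 101, 250, 999)]
```

Neighbouring keys land in the same chunk, which keeps range queries local but can concentrate load when keys are inserted in increasing order.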

Data Partitioning: Hash Based

• A hash function applied to the Shard Key decides the chunk
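Hash-based routing can be sketched with a standard hash function. The chunk count, the use of MD5 with a modulo, and the `hashed_chunk_for` name are assumptions made for brevity; MongoDB's own hash function and chunk assignment differ in detail.

```python
import hashlib

NUM_CHUNKS = 4  # made-up number of chunks for this illustration


def hashed_chunk_for(shard_key):
    """Map a shard key to a chunk index using a hash of the key."""
    digest = hashlib.md5(str(shard_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_CHUNKS


# Consecutive keys are scattered across chunks instead of clustering.
placement = {key: hashed_chunk_for(key) for key in range(1000)}
chunks_used = set(placement.values())
```

The trade-off against range-based partitioning is that writes spread evenly, while a range query on the shard key has to visit many chunks.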

Sharded Cluster

Optimizing Shards: Splitting

• Within a shard, when a chunk grows beyond the size limit, it is split into two chunks
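Splitting can be sketched as cutting a sorted run of keys in half once it grows too large. The document-count threshold and the `split_if_needed` helper are assumptions for this example; a real cluster limits chunks by data size rather than by a fixed number of documents.

```python
def split_if_needed(chunk_keys, max_docs=4):
    """Split a sorted list of shard keys in two once it exceeds max_docs."""
    if len(chunk_keys) <= max_docs:
        return [chunk_keys]
    middle = len(chunk_keys) // 2
    return [chunk_keys[:middle], chunk_keys[middle:]]


small = split_if_needed([1, 2, 3])
large = split_if_needed([1, 2, 3, 4, 5, 6])
```

A split only changes the boundaries between chunks; no data moves to another machine until the balancer migrates one of the resulting chunks.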

Optimizing Shards: Balancing

• When one shard holds many more chunks than the others, some of its chunks are migrated to other shards
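A naive balancer can be sketched as repeatedly moving one chunk from the fullest shard to the emptiest one. The `balance` function, the threshold value and the shard names are made up for this illustration; MongoDB's balancer uses its own thresholds and migration rules.

```python
def balance(shards, threshold=2):
    """Move chunks from the fullest shard to the emptiest until counts are close.

    shards: dict of shard name -> list of chunk ids (modified in place)
    Returns the migrations performed as (chunk, source, target) tuples.
    """
    if threshold < 2:
        raise ValueError("threshold must be at least 2 to avoid ping-pong moves")
    migrations = []
    while True:
        fullest = max(shards, key=lambda name: len(shards[name]))
        emptiest = min(shards, key=lambda name: len(shards[name]))
        if len(shards[fullest]) - len(shards[emptiest]) < threshold:
            return migrations
        chunk = shards[fullest].pop()
        shards[emptiest].append(chunk)
        migrations.append((chunk, fullest, emptiest))


cluster = {"s1": [1, 2, 3, 4, 5], "s2": [6]}
moves = balance(cluster)
```

Here two chunks move from `s1` to `s2`, leaving three chunks on each shard.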

Summary

• MongoDB is good
  – Stores objects the way we use them in programming languages
  – Flexible semi-structured design
  – Scales out to store big data
  – Embedded documents eliminate the need for joins
• MongoDB is bad
  – No multi-document query
  – De-normalized storage
  – No support for multi-document transactions

Thanks

@akshaymathu @_sarangs