Date post: | 27-Jan-2015 |
Category: |
Technology |
Upload: | akshay-mathur |
NoSQL Database
Akshay Mathur, Sarang Shravagi
@akshaymathu, @_sarangs
{name: ‘mongo’, type: ‘db’}
Who uses MongoDB
Let’s Know Each Other
• Do you code?
• OS?
• Programming Language?
• Why are you attending?
Akshay Mathur
• Managed development, testing and release teams over the last 14+ years
  – Currently Principal Architect at ShopSocially
• Founding Team Member of
  – ShopSocially (Enabling “social” for retailers)
  – AirTight Networks (Global leader of WIPS)
Sarang Shravagi
• 10gen Certified Developer and DBA
• CS graduate from PICT Pune
• 3+ years in Software Product industry
• Currently Senior Full-stack Developer at ShopSocially
How we use MongoDB
Python MongoDB
MongoEngine
Where MongoDB Fits
Program Outline: Understanding NoSQL
• Data Landscape
• Different Storage Needs
• Design Paradigm Shift from SQL to NoSQL
• Different Datastores
• Closer look at Document Storage
• Drawing parallels with RDBMS
Program Outline: Hands on Lab
• Installation and basic configuration
• Mongo Shell
• Creating and Changing Schema
• Create, Read, Update and Delete of Data
• Analyzing Performance
• Improving performance by creating Indices
• Assignment
• Problem solving for the assignment
Program Outline: Advanced Topics
• Handling Big Data
  – Introduction to Map/Reduce
  – Introduction to Data Partitioning (Sharding)
• Disaster Recovery
  – Introduction to Replica set and High Availability
Ground Rules
• Disturb Everyone
  – Not by phone rings
  – Not by local talks
  – By more information and questions
Data Patterns & Storage Needs
Data at an Online Store
• Product Information
• User Information
• Purchase Information
• Product Reviews
• Site Interactions
• Social Graph
• Search Index
SQL to NoSQL
Design Paradigm Shift
SQL Storage
• Was designed when
  – Storage and data transfer were costly
  – Processing was slow
  – Applications were oriented more towards data collection
• Initial adopters were financial institutions
SQL Storage
• Structured
  – schema
• Relational
  – foreign keys, constraints
• Transactional
  – Atomicity, Consistency, Isolation, Durability
• High Availability through robustness
  – Minimize failures
• Optimized for Writes
• Typically Scale Up
NoSQL Storage
• Is designed for a time when
  – Storage is cheap
  – Data transfer is fast
  – Much more processing power is available
    • Clustering of machines is also possible
  – Applications are oriented towards consumption of User Generated Content
  – Better on-screen user experience is in demand
NoSQL Storage
• Semi-structured
  – Schemaless
• Consistency, Availability, Partition Tolerance
• High Availability through clustering
  – expect failures
• Optimized for Reads
• Typically Scale Out
Different Datastores
Half Level Deep
SQL: RDBMS
• MySQL, PostgreSQL, Oracle etc.
• Stores data in tables having columns
  – Basic (number, text) data types
• Strong query language
• Transparent values
  – Query language can read and filter on them
  – Relationship between tables based on values
• Suited for user info and transactions
NoSQL: Key/Value
• Redis, DynamoDB etc.
• Stores a value against a key
  – Strings
• Values are opaque
  – Cannot be part of query
• Suited for site interactions
NoSQL: Document
• MongoDB, CouchDB etc.
• Object Oriented data models
  – Stores data in document objects having fields
  – Basic and compound (list, dict) data types
• SQL like queries
• Transparent values
  – Can be part of query
• Suited for product info and its reviews
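The difference between opaque key/value storage and transparent document storage can be sketched in plain Python. This is only an illustration, not how any particular database is implemented: the product document, the `kv_store` dict and the `find` helper are all hypothetical names invented for this example.

```python
import json

# A hypothetical product document: basic fields plus compound
# (list, dict) fields, with the reviews embedded in the product.
product = {
    "name": "Espresso Maker",
    "price": 129.0,
    "tags": ["kitchen", "coffee"],
    "reviews": [
        {"user": "asha", "rating": 5, "text": "Great crema"},
        {"user": "ravi", "rating": 3, "text": "Noisy"},
    ],
}

# Key/value style: the store only sees an opaque string under a key,
# so it cannot filter on anything inside the value.
kv_store = {"product:1": json.dumps(product)}


def find(collection, predicate):
    """Return the documents for which predicate(doc) is true."""
    return [doc for doc in collection if predicate(doc)]


# Document style: fields are transparent, so a query can look inside,
# even into the embedded list of reviews.
products = [product]
well_reviewed = find(
    products,
    lambda doc: any(review["rating"] >= 5 for review in doc["reviews"]),
)
```

Because the reviews live inside the product document, reading a product together with its reviews needs no second lookup.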
NoSQL: Column Family
• Cassandra, Bigtable etc.
• Stores data in columns
• Transparent values
  – Can be part of query
• SQL like queries
• Suited for search
NoSQL: Graph
• Neo4j
• Stores data in form of nodes and relationships
• Query is in form of traversal
• In-memory
• Suited for social graph
Document Storage: Closer Look
MongoDB
• Document database
• Powerful query language
• Docs, sub-docs, indexes
• Map/reduce
• Replicas, shards, replicated shards
• SDKs/drivers for many languages
  – C, C++, C#, Python, Erlang, PHP, Java, JavaScript, Node.js, Perl, Ruby, Scala
RDBMS: DB Design
RDBMS: Query
RDBMS MongoDB
RDBMS       MongoDB
Database    Database
Table       Collection
Row         Document
Column      Field

SQL:         SELECT c1, c2 FROM Table WHERE c1 = 'v1' ORDER BY c2 LIMIT n
MongoEngine: Collection.objects(F1='v1').order_by('F2').limit(n)
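The mapping above can be tried out without a MongoDB server. The sketch below runs the SQL query against an in-memory SQLite table and then performs the same filter, order and limit steps on a list of Python dicts standing in for a collection. The sample rows and the field names `F1` and `F2` are made up for illustration, and the list operations only mimic what a MongoEngine queryset would do.

```python
import sqlite3

rows = [("v1", 30), ("v2", 10), ("v1", 20), ("v1", 40)]

# RDBMS side: rows in a table, queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (c1 TEXT, c2 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
sql_result = conn.execute(
    "SELECT c1, c2 FROM t WHERE c1 = ? ORDER BY c2 LIMIT ?", ("v1", 2)
).fetchall()
conn.close()

# Document side: the same data as documents with fields F1 and F2.
collection = [{"F1": c1, "F2": c2} for c1, c2 in rows]
matching = [doc for doc in collection if doc["F1"] == "v1"]  # filter
ordered = sorted(matching, key=lambda doc: doc["F2"])         # order_by
doc_result = ordered[:2]                                      # limit
```

Both results contain the two matching records with the smallest second value, once as tuples and once as documents.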
MongoDB: Design
MongoDB: Query
• Movies.objects()
Have you Installed?
http://www.mongodb.org/downloads
Hands-on
Dive-in with Sarang
MongoDB: Core Binaries
• mongod
  – Database server
• mongo
  – Database client shell
• mongos
  – Router for Sharding
Getting Help
• For mongo shell
  – mongo --help
    • Shows options available for running the shell
• Inside mongo shell
  – object.help(), e.g. db.help()
    • Shows commands available on the object
Import Export Tools
• For objects
  – mongodump
  – mongorestore
  – bsondump
  – mongooplog
• For data items
  – mongoimport
  – mongoexport
Database Operations
• Database creation
• Creating/changing collection
• Data insertion
• Data read
• Data update
• Creating indices
• Data deletion
• Dropping collection
Diagnostic Tools
• mongostat
• mongoperf
• mongosniff
• mongotop
Assignment
• Go to http://www.velocitainc.com/mongo/
  – Tasks
    • assignments.txt
  – Data
    • students.json
Disaster Recovery
Introduction to Replica Sets and
High Availability
Disasters
• Physical Failure
  – Hardware
  – Network
• Solution
  – Replica Sets
    • Provide redundant storage for High Availability
  – Real time data synchronization
    • Automatic failover for zero downtime
Replication
Multi Replication
• Data can be replicated to multiple places simultaneously
• A replica set needs an odd number of voting members so that an election cannot end in a tie
Single Replication
• If you want only one secondary, or any odd number of secondaries, you need to set up an arbiter so that the total number of voting members stays odd
Failover
• When the primary fails, the remaining members vote to elect a new primary
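Why a majority matters, and why an arbiter helps a two-machine setup, can be sketched with a toy election. This is a heavily simplified model and not MongoDB's actual election protocol (which also considers priorities and how up to date each member is). The `elect_primary` function and the member dictionaries are hypothetical, and picking the alphabetically first candidate is an arbitrary stand-in for the real selection rules.

```python
def elect_primary(members, failed):
    """Return the new primary's name, or None if no majority is reachable.

    members: dict of name -> {"votes": bool, "data": bool}
    failed:  set of member names that are currently down
    """
    voters = [name for name, info in members.items() if info["votes"]]
    alive_voters = [name for name in voters if name not in failed]
    # A strict majority of ALL voting members must be reachable.
    if len(alive_voters) * 2 <= len(voters):
        return None
    # Only data-bearing members can become primary; an arbiter only votes.
    candidates = sorted(name for name in alive_voters if members[name]["data"])
    return candidates[0] if candidates else None


data_node = {"votes": True, "data": True}
arbiter = {"votes": True, "data": False}

two_nodes = {"a": data_node, "b": data_node}
with_arbiter = {"a": data_node, "b": data_node, "arb": arbiter}

stuck = elect_primary(two_nodes, failed={"a"})        # 1 of 2 votes: no majority
recovered = elect_primary(with_arbiter, failed={"a"})  # 2 of 3 votes: majority
```

With two voting members, losing one leaves exactly half the votes, so no primary can be elected. Adding an arbiter makes the total three, and the surviving data node can win with two votes.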
Handling Big Data
Introduction to Map/Reduce and Sharding
Large Data Sets
• Problem 1
  – Performance
    • Queries become slow
• Solution
  – Map/Reduce
Map Reduce
• A way to divide large query computation into smaller chunks
• May run in multiple processes across multiple machines
• Think of it as GROUP BY of SQL
Map/Reduce Example
• Map function digs the data and returns required values
Map/Reduce Example
• Reduce function uses the output of Map function and generates aggregated value
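The original slides showed the map and reduce functions as images. As a stand-in, here is a minimal single-process sketch of the same idea in Python. The purchase documents and the `map_fn`, `reduce_fn` and `map_reduce` names are assumptions made for this example; a real MongoDB map/reduce job would be written in JavaScript and run by the server.

```python
from collections import defaultdict

# Hypothetical purchase documents.
purchases = [
    {"user": "asha", "amount": 20},
    {"user": "ravi", "amount": 5},
    {"user": "asha", "amount": 15},
]


def map_fn(doc):
    """Dig into one document and emit (key, value) pairs."""
    yield doc["user"], doc["amount"]


def reduce_fn(key, values):
    """Combine all values emitted for one key into an aggregated value."""
    return sum(values)


def map_reduce(docs, map_fn, reduce_fn):
    grouped = defaultdict(list)
    for doc in docs:  # map phase: could run in parallel per document
        for key, value in map_fn(doc):
            grouped[key].append(value)
    # reduce phase: one call per distinct key
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


totals = map_reduce(purchases, map_fn, reduce_fn)
```

This computes the same thing as `SELECT user, SUM(amount) ... GROUP BY user` in SQL: the map step chooses the grouping key and the value, and the reduce step is the aggregate.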
Large Data Sets
• Problem 2
  – Vertical Scaling of Hardware
    • Can’t increase machine size beyond a limit
• Solution
  – Sharding
Sharding
• A method for storing data across multiple machines
• Data is partitioned using Shard Keys
Data Partitioning: Range Based
• Each chunk holds a contiguous range of Shard Key values
Data Partitioning: Hash Based
• A hash function on Shard Keys decides the chunk
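The two partitioning schemes can be compared with a small Python sketch. The boundaries, the number of chunks and the function names are made up for illustration, and taking the hash modulo the chunk count is a simplification chosen here, not a description of MongoDB's internal hashed sharding.

```python
import bisect
import hashlib

NUM_CHUNKS = 4

# Range based: chunk 0 holds keys below 25, chunk 1 holds 25..49,
# chunk 2 holds 50..74, and chunk 3 holds everything from 75 upward.
RANGE_BOUNDS = [25, 50, 75]


def range_chunk(shard_key):
    """Pick the chunk whose key range contains shard_key."""
    return bisect.bisect_right(RANGE_BOUNDS, shard_key)


def hash_chunk(shard_key):
    """Pick a chunk from a hash of shard_key (simplified: hash modulo count)."""
    digest = hashlib.md5(str(shard_key).encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_CHUNKS


# Neighbouring keys stay together under range partitioning ...
range_placement = [range_chunk(key) for key in (10, 11, 12)]
# ... while hashing decides each key's chunk independently of its neighbours.
hash_placement = [hash_chunk(key) for key in (10, 11, 12)]
```

Range partitioning keeps nearby keys in the same chunk, which suits range queries but can concentrate inserts of ever-increasing keys in one chunk. Hashing tends to spread such keys out, at the cost of range queries touching many chunks.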
Sharded Cluster
Optimizing Shards: Splitting
• In a shard, when the size of a chunk grows beyond a limit, the chunk is split into two
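A toy version of splitting, assuming chunks are just sorted lists of keys and the limit is a document count. `MAX_CHUNK_DOCS` and `insert_key` are hypothetical names for this sketch, and real chunk limits are expressed in data size rather than number of keys.

```python
MAX_CHUNK_DOCS = 4  # hypothetical limit, counted in keys for simplicity


def insert_key(chunks, key):
    """Insert key into the chunk that covers it and split that chunk if needed.

    chunks is a list of sorted lists, ordered by key range. It is
    modified in place and also returned.
    """
    # Use the first chunk whose largest key is >= key, else the last chunk.
    index = next(
        (i for i, chunk in enumerate(chunks) if chunk and key <= chunk[-1]),
        len(chunks) - 1,
    )
    chunk = chunks[index]
    chunk.append(key)
    chunk.sort()
    if len(chunk) > MAX_CHUNK_DOCS:
        middle = len(chunk) // 2
        # Replace the oversized chunk with its two halves.
        chunks[index:index + 1] = [chunk[:middle], chunk[middle:]]
    return chunks


chunks = [[1, 2, 3, 4]]
insert_key(chunks, 5)  # chunk now exceeds the limit, so it is split
insert_key(chunks, 0)  # fits into the first half without another split
```

After these two inserts the single chunk has become two, each covering a contiguous key range.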
Optimizing Shards: Balancing
• When one shard holds many more chunks than the others, a few chunks are migrated to another shard
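Balancing can be sketched as repeatedly moving one chunk from the fullest shard to the emptiest one. The `balance` function, the shard names and the threshold value are assumptions for this example and do not reflect the thresholds or scheduling of MongoDB's real balancer.

```python
def balance(shards, threshold=2):
    """Migrate chunks until shard sizes differ by less than threshold.

    shards: dict of shard name -> list of chunk ids, modified in place.
    Returns the list of (chunk, source, destination) migrations performed.
    """
    if threshold < 2:
        # With a threshold of 1 a single chunk could bounce back and forth.
        raise ValueError("threshold must be at least 2")
    migrations = []
    while True:
        fullest = max(shards, key=lambda name: len(shards[name]))
        emptiest = min(shards, key=lambda name: len(shards[name]))
        if len(shards[fullest]) - len(shards[emptiest]) < threshold:
            return migrations
        chunk = shards[fullest].pop()
        shards[emptiest].append(chunk)
        migrations.append((chunk, fullest, emptiest))


shards = {"s1": ["c1", "c2", "c3", "c4"], "s2": []}
moves = balance(shards)
```

Here two chunks move from `s1` to `s2`, leaving both shards with two chunks each.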
Summary
• MongoDB is good
  – Stores objects as we use them in a programming language
  – Flexible semi-structured design
  – Scales out to store big data
  – Embedded documents eliminate the need for joins
• MongoDB is bad
  – No multi-document query
  – De-normalized storage
  – No support for transactions
Thanks
@akshaymathu @_sarangs