Introduction to NoSQL Databases

Post on 27-Jan-2015


Introduction to NoSQL database and polyglot persistence


NoSQL

It’s about making intelligent choices

The Relational Model

• Simplicity and Elegance
• Well Understood
• Very Powerful Abstraction
• Solves Many Storage Problems (Persistent Data)
• Concurrency
• Integration
• A Mostly Standard Model
• …
• But It Also Has Its Limitations…

Business Database

Issues With Implementing A Relational Database

• Agility and Programmability (Impedance Mismatch)
• Flexibility
• Performance and Scalability
• Availability

NoSQL Business Drivers

NoSQL

• No SQL
• Not Only SQL
• Non-relational Database

Key/Value Store

Typical Usage

• Image Stores
• Key-Based File Systems
• Object Cache
• Systems Designed to Scale

Key/Value Store

• BerkeleyDB
• LevelDB
• Memcached
• Project Voldemort
• Redis
• Riak
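The interface that key/value stores expose can be sketched in a few lines. This is a minimal in-memory illustration, not the API of any product listed above; the class name `KeyValueStore` and the key `images/logo.png` are hypothetical. The point it tries to show is that the store treats the value as an opaque blob and only ever looks things up by key.

```python
from typing import Dict, Optional


class KeyValueStore:
    """Hypothetical in-memory key/value store; values are opaque bytes."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # The store never inspects the value, it only files it under the key.
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # Lookup is by exact key only; there is no query language.
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


store = KeyValueStore()
store.put("images/logo.png", b"\x89PNG...")   # e.g. an image store
cached = store.get("images/logo.png")
store.delete("images/logo.png")
missing = store.get("images/logo.png")
```

Because the whole contract is put, get, and delete on a single key, this kind of store is comparatively easy to partition across machines.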

Document Database

Typical Usage

• High-Variability Data
• Document Search
• Integration Hubs
• Web Content Management
• Publishing

Document Database

• CouchDB
• MongoDB
• OrientDB
• RavenDB
• Terrastore

Column Family

Typical Usage

• Web Crawler Results
• Big Data Problems That Can Relax Consistency Rules

Column Family

• Amazon SimpleDB
• Cassandra
• HBase
• Hypertable

Graph Database

Typical Usage

• Social Networks
• Fraud Detection
• Relationship-Heavy Data

Graph Database

• FlockDB
• HyperGraphDB
• InfiniteGraph
• Neo4j
• OrientDB
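To make "relationship-heavy data" concrete, here is a small sketch of a friend-of-friend lookup over a graph held as adjacency sets. It does not use any of the graph databases listed above; the `follows` data and the function name `friends_of_friends` are made up for illustration.

```python
from typing import Dict, Set

# Hypothetical social graph: each person maps to the set of people they follow.
follows: Dict[str, Set[str]] = {
    "alice": {"bob", "carol"},
    "bob": {"dave"},
    "carol": {"dave", "erin"},
    "dave": set(),
    "erin": {"alice"},
}


def friends_of_friends(graph: Dict[str, Set[str]], person: str) -> Set[str]:
    """Return people two hops away who are not already direct connections."""
    direct = graph.get(person, set())
    two_hops: Set[str] = set()
    for friend in direct:
        two_hops |= graph.get(friend, set())
    # Exclude the person themselves and anyone they already follow.
    return two_hops - direct - {person}


suggestions = friends_of_friends(follows, "alice")
```

In a relational schema the same question typically needs a self-join per hop, which is one reason graph stores are suggested for this kind of workload.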

Common Features of NoSQL Databases

• Designing Aggregations
  • An aggregate in a NoSQL database is similar to a row in a table in a relational database
• Materializing Summary Data
  • Map/Reduce
• Implementing High Availability
  • Clusters
• Improving Scalability and Reducing Network Latency
  • Sharding
• Improving Consistency
  • Data Versioning
• Schemas and Non-Uniformity
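As one illustration of the "Data Versioning" item above, the following sketch stores each aggregate together with a version number and rejects a write that was based on a stale read. It is a simplified assumption of how versioning can be used, not the mechanism of any specific database; `VersionedStore`, `VersionConflict`, and the `cart:42` key are hypothetical names.

```python
from typing import Any, Dict, Tuple


class VersionConflict(Exception):
    """Raised when a write is based on an out-of-date version."""


class VersionedStore:
    """Hypothetical store that keeps (version, aggregate) per key."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[int, Any]] = {}

    def get(self, key: str) -> Tuple[int, Any]:
        # Version 0 means "never written".
        return self._data.get(key, (0, None))

    def put(self, key: str, value: Any, expected_version: int) -> int:
        current_version, _ = self.get(key)
        if current_version != expected_version:
            raise VersionConflict(
                f"expected v{expected_version}, found v{current_version}"
            )
        new_version = current_version + 1
        self._data[key] = (new_version, value)
        return new_version


store = VersionedStore()
# The aggregate is stored and versioned as one unit.
v1 = store.put("cart:42", {"items": ["book"]}, expected_version=0)

# Two clients read version 1, then both try to write.
v2 = store.put("cart:42", {"items": ["book", "pen"]}, expected_version=v1)
try:
    store.put("cart:42", {"items": ["book", "mug"]}, expected_version=v1)
    conflict_detected = False
except VersionConflict:
    conflict_detected = True
```

The second writer has to re-read and merge before retrying, which is how the version check prevents a lost update.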

NoSQL Case Studies

LiveJournal’s Memcache

• Driver
  • Need to increase performance of database queries.
• Finding
  • By using hashing and caching, data in RAM can be shared. This cuts down the number of read requests sent to the database, increasing performance.
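The finding above combines two ideas: hash the key to decide which cache node holds it, and check the cache before going to the database. A rough sketch under those assumptions follows. It is not Memcached's actual client code; `SlowDatabase`, `HashedCache`, and `read_through` are hypothetical names, and plain dictionaries stand in for cache servers.

```python
import hashlib
from typing import Dict, List, Optional


class SlowDatabase:
    """Stand-in for the backing database; counts how often it is read."""

    def __init__(self, rows: Dict[str, str]) -> None:
        self.rows = dict(rows)
        self.reads = 0

    def query(self, key: str) -> str:
        self.reads += 1
        return self.rows[key]


class HashedCache:
    """Hypothetical cache spread over several in-memory nodes."""

    def __init__(self, node_count: int) -> None:
        self.nodes: List[Dict[str, str]] = [{} for _ in range(node_count)]

    def _node_for(self, key: str) -> Dict[str, str]:
        # Every client hashes the key the same way, so all of them agree on
        # which node holds it and the cached data is effectively shared.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def get(self, key: str) -> Optional[str]:
        return self._node_for(key).get(key)

    def set(self, key: str, value: str) -> None:
        self._node_for(key)[key] = value


def read_through(cache: HashedCache, db: SlowDatabase, key: str) -> str:
    value = cache.get(key)
    if value is None:          # cache miss: one database read, then remember it
        value = db.query(key)
        cache.set(key, value)
    return value


db = SlowDatabase({"user:1": "alice"})
cache = HashedCache(node_count=3)
results = [read_through(cache, db, "user:1") for _ in range(3)]
```

Three reads of the same key reach the database only once; the other two are served from RAM.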

Google’s MapReduce

MapReduce Example – Word Count

Google’s MapReduce

• Driver
  • Need to index billions of web pages for search using low-cost hardware.
• Finding
  • By using parallel processing, indexing billions of web pages can be done quickly with a large number of commodity processors.
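The word-count example named on the earlier slide can be sketched as three small steps: map, group by key, reduce. This single-process version is only meant to show the shape of the computation; it is not Google's implementation, and the function names and sample documents are made up. In a real cluster the map and reduce calls would run in parallel on many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework would between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Combine all values for one key into a single total."""
    return word, sum(counts)


documents = ["the quick brown fox", "the lazy dog", "the fox"]

# Each document can be mapped independently of the others.
mapped = [pair for doc in documents for pair in map_phase(doc)]
# Each word can be reduced independently of the others.
counts = dict(reduce_phase(w, c) for w, c in shuffle(mapped).items())
```

Because neither phase shares state across documents or words, the work divides naturally across commodity processors.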

Google’s Bigtable

• Driver
  • Need to flexibly store tabular data in a distributed system.
• Finding
  • By using a sparse matrix approach, users can think of all data as being stored in a single table with billions of rows and millions of columns without the need for up-front data modeling.
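The "sparse matrix" idea can be illustrated with a table that stores only the cells that actually have a value, so rows are free to use entirely different columns. This is a toy model and not Bigtable's storage format; `SparseTable`, the row keys, and the column names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Optional


class SparseTable:
    """Hypothetical sparse table: only populated cells take up space."""

    def __init__(self) -> None:
        # row key -> {column name -> value}
        self._rows: Dict[str, Dict[str, str]] = defaultdict(dict)

    def put(self, row_key: str, column: str, value: str) -> None:
        # No schema: a new column comes into existence on first write.
        self._rows[row_key][column] = value

    def get(self, row_key: str, column: str) -> Optional[str]:
        return self._rows.get(row_key, {}).get(column)

    def cell_count(self) -> int:
        return sum(len(columns) for columns in self._rows.values())


table = SparseTable()
table.put("com.example/index", "contents:html", "<html>...</html>")
table.put("com.example/index", "anchor:news.example", "Example home")
table.put("org.sample/about", "language", "en")

html = table.get("com.example/index", "contents:html")
absent = table.get("org.sample/about", "contents:html")
```

The table conceptually has two rows and three columns, but only three cells are stored; the empty cells cost nothing and no column had to be declared up front.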

Amazon’s Dynamo

• Driver
  • Need to accept a web order 24 hours a day, 7 days a week.
• Finding
  • A key-value store with a simple interface can be replicated even when there are large volumes of data to be processed.
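A heavily simplified sketch of why a simple get/put interface is easy to replicate is shown below. It copies every write to all available replicas and reads from the first available one that has the key. It deliberately leaves out what the real Dynamo design adds on top (consistent hashing, quorums, vector clocks, hinted handoff), and `Replica` and `ReplicatedStore` are hypothetical names.

```python
from typing import Dict, List, Optional


class Replica:
    """One copy of the data; `available` simulates a node being up or down."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}
        self.available = True


class ReplicatedStore:
    def __init__(self, replicas: List[Replica]) -> None:
        self.replicas = replicas

    def put(self, key: str, value: str) -> int:
        """Write to every available replica; return how many accepted it."""
        written = 0
        for replica in self.replicas:
            if replica.available:
                replica.data[key] = value
                written += 1
        if written == 0:
            raise RuntimeError("no replica available")
        return written

    def get(self, key: str) -> Optional[str]:
        """Read from the first available replica that holds the key."""
        for replica in self.replicas:
            if replica.available and key in replica.data:
                return replica.data[key]
        return None


nodes = [Replica("a"), Replica("b"), Replica("c")]
store = ReplicatedStore(nodes)

first_write = store.put("order:1001", "2 x book")
nodes[0].available = False                      # one node fails
order = store.get("order:1001")                 # still readable
second_write = store.put("order:1002", "1 x pen")  # still writable
```

With one node down, the store keeps accepting and serving orders, which is the availability goal named in the driver. The sketch does not bring the failed node back up to date; that repair step is one of the things the omitted mechanisms handle.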

Polyglot Persistence

Key Points

• Relational databases have been a successful technology for twenty years, providing persistence, concurrency control, and an integration mechanism.

• Application developers have been frustrated with the impedance mismatch between the relational model and the in-memory data structures.

• There is a movement away from using databases as integration points towards encapsulating databases within applications and integrating through services.

• The most important result of the rise of NoSQL is Polyglot Persistence.
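The impedance mismatch mentioned in the key points can be shown with one in-memory object stored two ways. In this sketch the `Order` and `LineItem` classes and both helper functions are hypothetical: the relational form splits the object into rows for two tables, while the aggregate form keeps it as one nested structure.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class LineItem:
    sku: str
    quantity: int


@dataclass
class Order:
    order_id: int
    customer: str
    items: List[LineItem]


def to_relational_rows(order: Order) -> Tuple[tuple, List[tuple]]:
    """Flatten the object into one `orders` row plus N `order_items` rows."""
    order_row = (order.order_id, order.customer)
    item_rows = [(order.order_id, item.sku, item.quantity) for item in order.items]
    return order_row, item_rows


def to_aggregate(order: Order) -> Dict[str, Any]:
    """Keep the object as a single nested structure, as an aggregate store would."""
    return asdict(order)


order = Order(1, "alice", [LineItem("A-1", 2), LineItem("B-7", 1)])
order_row, item_rows = to_relational_rows(order)
aggregate = to_aggregate(order)
```

Reading the order back from the relational form means joining the two tables and rebuilding the object; the aggregate form already has the shape the application uses.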

Key Points

• The vital factor for a change in data storage was the need to support large volumes of data by running on clusters. Relational databases are not designed to run efficiently on clusters.

• NoSQL is an accidental neologism. There is no prescriptive definition—all you can make is an observation of common characteristics.

• The common characteristics of NoSQL databases are
  • Not using the relational model
  • Running well on clusters
  • Open Source
  • Built for the 21st century web estates
  • Schemaless