
NoSql evaluation

Date post: 16-Jul-2015
Upload: karthik-mohan
NoSQL Evaluation By: Karthik Kamath G [email protected] Bens John [email protected]

NoSQL Evaluation

By:

Karthik Kamath G

[email protected]

Bens John

[email protected]

Contents

O Introduction

O Features

O RDBMS

O Data Models.

O Query Possibilities.

O Concurrency control.

O Partitioning

O Replication and consistency.

Introduction

O SQL = traditional relational database.

O NoSQL = not a traditional relational database.

O NoSQL != "do not use Structured Query Language".

O NoSQL = Not only SQL.

O Not every data management or analysis problem is best solved with a traditional RDBMS.

O Big Data.

Features

O Convenient

O Multi-User

O Safe

O Persistent

O Reliable

O Massive

O Efficient

Relational DB

O Adding a feature to, or removing one from, an application such as a blog is often not possible without system downtime.

O Because of their normalized data model and full ACID support, RDBMSs are a poor fit for many Web 2.0 workloads: joins and locks hurt performance in distributed systems.

O These databases typically favor consistency over availability.

O Replication techniques are limited.

NoSQL providers

Key-Value Stores

O Data is addressed by a unique key.

O Values are isolated from each other.

O Schema free: new values can be added at runtime.

O Data/values are opaque to the system.

O Examples:

Voldemort, Redis, Membase.

O Pros :

Simple Data Model.

Scalable.

O Cons:

Create your own “Foreign Keys”.

Poor for complex data.
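The points above can be illustrated with a minimal sketch, assuming a toy in-memory store. `KeyValueStore` and the key names are hypothetical and not taken from any of the products listed. The store treats values as opaque bytes, so a reference from one value to another (a hand-made "foreign key") has to be resolved by the application.

```python
import json


class KeyValueStore:
    """Toy in-memory key-value store; values are opaque bytes to the store."""

    def __init__(self):
        self._data = {}

    def put(self, key, value: bytes):
        self._data[key] = value

    def get(self, key):
        # Returns None when the key is absent.
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)


store = KeyValueStore()
store.put("user:1", json.dumps({"name": "alice"}).encode())
store.put("post:10", json.dumps({"title": "Hello", "author_key": "user:1"}).encode())

# The store cannot follow "author_key" itself: the application decodes the
# value and issues a second get, which is the hand-made "foreign key".
post = json.loads(store.get("post:10"))
author = json.loads(store.get(post["author_key"]))
```

Each lookup is a single key access, which is why this model is simple and easy to scale, and also why it becomes awkward once values reference each other heavily.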

Column Family

O Based on Bigtable, Google's distributed storage system for structured data.

O An arbitrary number of key-value pairs can be stored within each row.

O Relationships have to be implemented in application logic.

O Columns can be grouped to form column families.

O Examples:

HBase, Cassandra, Hypertable.

O HBase and Hypertable are open source implementations of Bigtable.

O Cassandra adds the concept of super columns.

O Multiple versions of the same data are stored in chronological order.

O Pros :

Scalable.

O Cons :

Poor for interconnected data.
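As a rough illustration of the column family model, the sketch below nests plain Python dictionaries: row key, then column family, then column, then a list of timestamped versions. `ColumnFamilyTable` and its methods are hypothetical names chosen for this example and do not mirror the API of HBase, Cassandra or Hypertable.

```python
from collections import defaultdict


class ColumnFamilyTable:
    """Toy column family table with chronologically ordered versions."""

    def __init__(self):
        # row key -> column family -> column name -> [(timestamp, value), ...]
        self._rows = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))

    def put(self, row_key, family, column, value, timestamp):
        versions = self._rows[row_key][family][column]
        versions.append((timestamp, value))
        # Keep versions in chronological order, oldest first.
        versions.sort(key=lambda tv: tv[0])

    def get(self, row_key, family, column):
        """Return the newest value, or None if the column was never written."""
        versions = self._rows[row_key][family][column]
        return versions[-1][1] if versions else None

    def history(self, row_key, family, column):
        """Return all stored versions as (timestamp, value), oldest first."""
        return list(self._rows[row_key][family][column])


table = ColumnFamilyTable()
table.put("user:1", "profile", "email", "a@old.example", timestamp=1)
table.put("user:1", "profile", "email", "a@new.example", timestamp=2)
# Rows need not share columns: this row has a column the first one lacks.
table.put("user:2", "profile", "nickname", "bob", timestamp=1)

latest_email = table.get("user:1", "profile", "email")
email_history = table.history("user:1", "profile", "email")
```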

Document Databases

O Data model:

A collection of documents.

A document is a collection of key-value pairs.

Within a document, keys must be unique.

Documents are typically encoded in JSON (JavaScript Object Notation) or a JSON-like format.

O Examples:

CouchDB, MongoDB.

O Pros:

Simple, Powerful Data Model.

Scalable.

O Cons:

Poor for interconnected data.

Query model is limited to keys and values.
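A small sketch of the document model, assuming a collection is simply a mapping from document id to a JSON document. The collection name `posts` and the field names are made up for illustration.

```python
import json

# A collection: document id -> document (a set of key-value pairs).
posts = {}

# Documents usually arrive as JSON text and are parsed into nested structures.
doc_text = '{"title": "NoSQL Evaluation", "tags": ["nosql", "survey"], "comments": 3}'
posts["post:1"] = json.loads(doc_text)

# Schema free: the second document has different fields and a nested object.
posts["post:2"] = {"title": "Second post", "author": {"name": "alice"}}

# Keys are unique within a document. Python's json module resolves a
# duplicated key by keeping the last value it sees.
deduplicated = json.loads('{"a": 1, "a": 2}')
```

Unlike a key-value store, the database can look inside these documents, which is what makes the richer queries of document stores possible.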

Graph Databases

O Data Model :

Nodes and Relationships.

O Examples:

Neo4j, OrientDB.

O FlockDB (Twitter) handles one-way relationships.

O Suited to location-based services and navigation systems, which rely on complex relations.

O Pros :

Powerful Data Model.

Easy to query.

Friend of a friend Problem is solved.
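The friend-of-a-friend problem mentioned above can be sketched with a plain adjacency map. This is only an illustration of the traversal, not the query API of any graph database. The names and the `friends_of_friends` helper are hypothetical.

```python
# Nodes are people; each relationship is stored as an entry in a set.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}


def friends_of_friends(graph, person):
    """People exactly two hops away: friends of friends who are neither
    direct friends nor the person themselves."""
    direct = graph.get(person, set())
    result = set()
    for friend in direct:
        result |= graph.get(friend, set())
    return result - direct - {person}


suggestions = friends_of_friends(friends, "alice")
```

In a relational schema the same question needs a self-join on the friendship table for every extra hop, whereas a graph store follows relationships directly from node to node.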

Query Possibilities

For Key-Value Stores

O Key-based put, get and delete operations.

O Membase offers a REST API.

For Document Databases

O Document stores offer much richer APIs.

O Operations like "and", "or" and "between" can be used.

O MongoDB supports additional operations like count and distinct.

O Riak offers functionalities to traverse links between documents easily.

O UnQL Project
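To make the richer document queries concrete, here is a minimal sketch that evaluates "and", "or", "between", count and distinct over an in-memory list of documents. All helper names (`equals`, `between`, `and_`, `or_`, `find`, `count`, `distinct`) are hypothetical and are not the MongoDB or CouchDB API.

```python
posts = [
    {"title": "Intro", "author": "alice", "views": 120},
    {"title": "Sharding", "author": "bob", "views": 45},
    {"title": "MVCC", "author": "alice", "views": 80},
    {"title": "Graphs", "author": "carol", "views": 300},
]


def equals(field, value):
    return lambda doc: doc.get(field) == value


def between(field, low, high):
    # Inclusive range check; documents without the field never match.
    return lambda doc: field in doc and low <= doc[field] <= high


def and_(*predicates):
    return lambda doc: all(p(doc) for p in predicates)


def or_(*predicates):
    return lambda doc: any(p(doc) for p in predicates)


def find(collection, predicate):
    return [doc for doc in collection if predicate(doc)]


def count(collection, predicate):
    return len(find(collection, predicate))


def distinct(collection, field):
    return sorted({doc[field] for doc in collection if field in doc})


alice_mid_range = find(posts, and_(equals("author", "alice"), between("views", 50, 150)))
bob_or_carol = count(posts, or_(equals("author", "bob"), equals("author", "carol")))
authors = distinct(posts, "author")
```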

For Column Family

O Provide range queries and some operations like "in", "and/or" and regular expressions.

O Even though every column family store offers an SQL-like query language for more convenient user interaction, only row keys and indexed values can be used in where-clauses.
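A range query over row keys can be sketched as a binary search on a sorted key list. This assumes rows are kept ordered by row key. `SortedRowStore` and `range_scan` are hypothetical names for this example.

```python
import bisect


class SortedRowStore:
    """Toy row store that keeps row keys sorted so ranges can be scanned."""

    def __init__(self):
        self._keys = []   # sorted row keys
        self._rows = {}   # row key -> columns

    def put(self, row_key, columns):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
        self._rows[row_key] = columns

    def range_scan(self, start_key, end_key):
        """Rows with start_key <= row key < end_key, in key order."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, end_key)
        return [(key, self._rows[key]) for key in self._keys[lo:hi]]


rows = SortedRowStore()
rows.put("user:010", {"name": "carol"})
rows.put("user:001", {"name": "alice"})
rows.put("video:001", {"title": "intro"})
rows.put("user:002", {"name": "bob"})

matched = [key for key, _ in rows.range_scan("user:002", "user:011")]
```

The scan only works on the row key because that is the only attribute kept in sorted order. Filtering on any other column would need a secondary index, which matches the restriction on where-clauses noted above.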

For Graph Databases

O SPARQL is a popular, declarative query language with a very simple syntax that provides graph pattern matching.

O Gremlin is an imperative programming language used to perform graph traversals, based on XPath.

Concurrency Control

O Traditional databases use pessimistic consistency strategies with exclusive access on a dataset.

O Multiversion concurrency control (MVCC) relaxes strict consistency in favor of performance.

O To cope with two or more conflicting write operations, every process stores, in addition to the new value, a link to the version it read before.
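The version link described above can be sketched as follows. Each write carries the version number the writer read, and the store compares it with the current version to detect a conflict. `MVCCStore` and `VersionConflict` are hypothetical names. Rejecting the second writer is an assumption made for this sketch, since real stores may instead keep both versions and let the client merge them.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a version that is no longer current."""


class MVCCStore:
    def __init__(self):
        # key -> [(version, parent_version, value), ...], oldest first
        self._versions = {}

    def read(self, key):
        """Return (version, value); (0, None) if the key has never been written."""
        history = self._versions.get(key, [])
        if not history:
            return 0, None
        version, _parent, value = history[-1]
        return version, value

    def write(self, key, value, read_version):
        """Append a new version linked to the version the writer read."""
        history = self._versions.setdefault(key, [])
        current = history[-1][0] if history else 0
        if read_version != current:
            raise VersionConflict(
                f"{key}: write based on version {read_version}, current is {current}"
            )
        history.append((current + 1, read_version, value))
        return current + 1


store = MVCCStore()
v0, _ = store.read("cart")
store.write("cart", ["book"], v0)

# Two processes read the same version without taking any lock...
version_a, _ = store.read("cart")
version_b, _ = store.read("cart")

# ...the first write succeeds, the second is detected as a conflict.
store.write("cart", ["book", "pen"], version_a)
try:
    store.write("cart", ["book", "cd"], version_b)
    conflict_detected = False
except VersionConflict:
    conflict_detected = True
```

Readers are never blocked, because old versions stay in place until a newer one is appended.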

Partitioning

O The first strategy distributes datasets by the range of their keys.

O To find a certain key, clients have to contact a routing server to get the partition table.

O The second strategy is consistent hashing.

O With hashing, neighboring keys are distributed pseudo-randomly across the cluster.

O Graph algorithms can help identify hotspots of strongly connected nodes in the graph schema.
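Both partitioning strategies can be sketched in a few lines. `range_partition` looks a key up in a sorted partition table, and `ConsistentHashRing` places nodes and keys on a hash ring. The function and class names, the node names and the choice of SHA-256 with 64 virtual points per node are assumptions made for this illustration.

```python
import bisect
import hashlib


def range_partition(key, boundaries, nodes):
    """Partition table lookup: boundaries are sorted split points and
    len(nodes) == len(boundaries) + 1."""
    return nodes[bisect.bisect_right(boundaries, key)]


class ConsistentHashRing:
    def __init__(self, nodes, points_per_node=64):
        self._points_per_node = points_per_node
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(text):
        return int(hashlib.sha256(text.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        # Several virtual points per node spread its load around the ring.
        for i in range(self._points_per_node):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def node_for(self, key):
        if not self._ring:
            raise ValueError("ring is empty")
        # First ring point at or after the key's position, wrapping around.
        index = bisect.bisect_right(self._ring, (self._hash(key),))
        return self._ring[index % len(self._ring)][1]


# Range partitioning: neighboring keys land on the same node.
table_nodes = ["n1", "n2", "n3"]
split_points = ["h", "p"]
owner_of_kiwi = range_partition("kiwi", split_points, table_nodes)

# Consistent hashing: removing a node only moves the keys it owned.
ring = ConsistentHashRing(["n1", "n2", "n3"])
keys = [f"user:{i}" for i in range(200)]
before = {k: ring.node_for(k) for k in keys}
ring.remove_node("n3")
after = {k: ring.node_for(k) for k in keys}
```

Range partitioning keeps range scans cheap but depends on the routing table. Consistent hashing needs no central table, at the price of scattering neighboring keys.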

Replication and Consistency

O BASE systems.

O Availability at the cost of consistency.

O Data can be temporarily inconsistent, but the system must be highly available and performant at all times.

O NoSQL systems are not strictly either full ACID or full BASE systems; many sit somewhere in between.

O Bigdata is the only store that natively supports both full consistency and replication.
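The trade-off of availability against consistency can be sketched with asynchronous replication. A write is acknowledged after reaching one replica and is copied to the others later, so a read from another replica may briefly return stale data. `EventuallyConsistentStore` and its `sync` method are hypothetical, and the explicit `sync` call stands in for background replication.

```python
class EventuallyConsistentStore:
    """Toy BASE-style store: writes hit replica 0 and propagate later."""

    def __init__(self, replica_count=3):
        self._replicas = [{} for _ in range(replica_count)]
        self._pending = []  # (replica index, key, value) still to be applied

    def write(self, key, value):
        # Acknowledge as soon as the first replica has the value.
        self._replicas[0][key] = value
        for index in range(1, len(self._replicas)):
            self._pending.append((index, key, value))

    def read(self, key, replica=0):
        return self._replicas[replica].get(key)

    def sync(self):
        """Apply all pending updates in order; afterwards replicas agree."""
        for index, key, value in self._pending:
            self._replicas[index][key] = value
        self._pending.clear()


store = EventuallyConsistentStore()
store.write("status", "online")

fresh_read = store.read("status", replica=0)
stale_read = store.read("status", replica=1)   # replication has not run yet
store.sync()
converged_read = store.read("status", replica=1)
```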

Conclusion

O Choose the proper data model.

O Queries.

O Key-value stores should be used for simple and fast operations.

O Column family stores for large data.

O Graph DBs for entities and relationships.

O Document DBs offer a flexible data model.

Thank You.

