Date posted: 16-Jul-2015
Category: Data & Analytics
Uploaded by: karthik-mohan
Contents
O Introduction
O Features
O RDBMS
O Data Models
O Query Possibilities
O Concurrency Control
O Partitioning
O Replication and Consistency
Introduction
O SQL = traditional relational database.
O NoSQL = not a traditional relational database.
O NoSQL != "do not use Structured Query Language".
O NoSQL = Not only SQL.
O Not every data management or analysis problem is best solved with a traditional RDBMS.
O Big Data.
Relational DB
O Adding or removing a feature in a blog application (a schema change) is not possible without making the system unavailable.
O Because of their normalized data model and ACID support, RDBMSs are poorly suited to Web 2.0 domains: joins and locks degrade performance in distributed systems.
O These databases typically favor consistency over availability.
O Replication techniques are limited.
Key-Value Stores
O Data is addressed by a unique key.
O Values are isolated from each other.
O Schema-free: new values can be added without changing a schema.
O Data/values are opaque to the system.
O Example :
Voldemort, Redis, Membase.
O Pros :
Simple Data Model.
Scalable.
O Cons:
You must create your own “foreign keys”.
Poor for complex data.
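The sketch below illustrates the key-value model and its "create your own foreign keys" drawback. It assumes a plain Python dict as the backing store, so it is not the API of Voldemort, Redis or Membase. Values are opaque bytes addressed by a unique key. Any reference between records is a convention the application maintains itself.

```python
from typing import Dict, Optional


class KeyValueStore:
    """Toy key-value store: opaque byte values addressed by a unique key."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # The store never inspects the value; it only maps key -> bytes.
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


store = KeyValueStore()
# Hypothetical "type:id" key convention; the store itself knows nothing about it.
store.put("user:1", b"Alice")
# A hand-made "foreign key": the order's value is the key of its user.
store.put("order:7", b"user:1")

owner_key = store.get("order:7").decode()  # the application resolves the reference
owner = store.get(owner_key)               # a second lookup returns b"Alice"

store.delete("user:1")
dangling = store.get(owner_key)            # None: nothing enforced the reference
```

Deleting `user:1` leaves `order:7` pointing at a key that no longer exists. The store cannot prevent this, because it never looks inside values.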
Column Family
O Based on Bigtable: Google’s distributed storage system for structured data.
O An arbitrary number of key-value pairs can be stored within a row.
O Relationships must be implemented in application logic.
O Columns can be grouped to form column families.
O Examples :
HBase, Cassandra, Hypertable.
O HBase and Hypertable are open-source implementations of the Bigtable design.
O Cassandra adds an extra level: the super column.
O Multiple versions of the same data are
stored in chronological order.
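The column-family layout can be pictured as nested maps: row key → column family → column → timestamped versions. The sketch below is a simplified in-memory model under that assumption; it is not HBase's or Cassandra's client API. Versions are kept in chronological order. A read returns the newest version unless an `as_of` timestamp (a hypothetical parameter) asks for an older one.

```python
from collections import defaultdict
from typing import List, Optional, Tuple

Versions = List[Tuple[int, str]]  # (timestamp, value), oldest first


def new_table() -> defaultdict:
    # row key -> column family -> column -> versions
    return defaultdict(lambda: defaultdict(lambda: defaultdict(list)))


def put(table: defaultdict, row: str, family: str, column: str,
        ts: int, value: str) -> None:
    versions: Versions = table[row][family][column]
    versions.append((ts, value))
    versions.sort(key=lambda v: v[0])  # keep versions in chronological order


def get(table: defaultdict, row: str, family: str, column: str,
        as_of: Optional[int] = None) -> Optional[str]:
    # .get() avoids creating empty rows/columns as a side effect of reading.
    versions: Versions = table.get(row, {}).get(family, {}).get(column, [])
    # Newest first: return the latest version not newer than as_of.
    for ts, value in reversed(versions):
        if as_of is None or ts <= as_of:
            return value
    return None


t = new_table()
# Rows need not share columns: user:2 has no "contact" family at all.
put(t, "user:1", "profile", "name", 1, "Alice")
put(t, "user:1", "profile", "name", 5, "Alice B.")
put(t, "user:1", "contact", "email", 2, "alice@example.com")
put(t, "user:2", "profile", "name", 3, "Bob")

latest = get(t, "user:1", "profile", "name")          # "Alice B."
older = get(t, "user:1", "profile", "name", as_of=4)  # "Alice"
```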
Document Databases
O Data model:
a collection of documents.
each document is a collection of key-value pairs.
within a document, keys must be unique.
JSON: JavaScript Object Notation.
O Example :
CouchDB, MongoDB.
O Pros:
Simple, Powerful Data Model.
Scalable.
O Cons:
Poor for interconnected data.
Query model is limited to keys and values.
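The following sketch illustrates the document model. It treats a Python list of parsed JSON objects as a stand-in for a collection, which is an assumption and not CouchDB's or MongoDB's API. Documents in one collection need not share fields, and queries can only match on keys and values. The `find` helper is hypothetical.

```python
import json
from typing import Any, Dict, List

# A "collection" of JSON documents; the two documents do not share a schema.
raw = """
[
  {"_id": "p1", "title": "NoSQL intro", "tags": ["nosql", "intro"], "views": 120},
  {"_id": "p2", "title": "Graph stores", "author": {"name": "Ann"}, "views": 5}
]
"""
collection: List[Dict[str, Any]] = json.loads(raw)


def find(coll: List[Dict[str, Any]], **criteria: Any) -> List[Dict[str, Any]]:
    """Return documents whose top-level keys equal all of the given values."""
    return [doc for doc in coll
            if all(doc.get(k) == v for k, v in criteria.items())]


hits = find(collection, views=5)  # only document "p2"

# Keys must be unique within a document: Python's json module silently
# keeps the last value for a repeated key, so the first one is lost.
dup = json.loads('{"k": 1, "k": 2}')  # {"k": 2}
```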
Graph Databases
O Data model:
Nodes and Relationships.
O Examples :
Neo4j, OrientDB.
O FlockDB (Twitter): one-way relationships.
O Use cases: location-based systems and navigation systems, which involve complex relationships.
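The node-and-relationship model can be sketched with a directed adjacency list. This includes one-way relationships such as FlockDB-style "follows" edges. The snippet assumes nodes are plain strings. It finds a shortest path with breadth-first search, the kind of traversal a navigation or social-graph query performs. It is an illustration only, not Neo4j's or FlockDB's API.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Directed edges: "alice follows bob" does not imply "bob follows alice".
follows: Dict[str, Set[str]] = {
    "alice": {"bob"},
    "bob": {"carol"},
    "carol": {"dave"},
    "dave": set(),
}


def shortest_path(graph: Dict[str, Set[str]], start: str,
                  goal: str) -> Optional[List[str]]:
    """Breadth-first search that follows outgoing relationships only."""
    parents: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Rebuild the path by walking parent links back to the start.
            path: List[str] = []
            cur: Optional[str] = node
            while cur is not None:
                path.append(cur)
                cur = parents[cur]
            return path[::-1]
        for nxt in graph.get(node, set()):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


forward = shortest_path(follows, "alice", "dave")   # ["alice", "bob", "carol", "dave"]
backward = shortest_path(follows, "dave", "alice")  # None: relationships are one-way
```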
Query Possibilities
For key-value stores:
O Key-based put, get and delete operations.
O Membase offers a REST API.
For Document Databases
O Document stores offer much richer APIs.
O Query conditions can be combined with operators such as and, or, and between.
O MongoDB supports additional operations like count and distinct.
O Riak offers functionalities to traverse links between documents easily.
O UnQL project: an effort to define a common query language for document databases.
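The sketch below approximates the richer document queries listed here with plain Python predicates over a list of dicts. It covers conditions combined with and/or, an inclusive between test, and count and distinct. The helper names (`eq`, `between`, `and_`, `or_`, `count`, `distinct`) are hypothetical and only mimic the idea. MongoDB and other stores expose these operations through their own query syntax.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]
Pred = Callable[[Doc], bool]

docs: List[Doc] = [
    {"_id": 1, "type": "post", "lang": "en", "year": 2011},
    {"_id": 2, "type": "post", "lang": "de", "year": 2013},
    {"_id": 3, "type": "page", "lang": "en", "year": 2015},
]


def eq(field: str, value: Any) -> Pred:
    return lambda d: d.get(field) == value


def between(field: str, low: Any, high: Any) -> Pred:
    # Inclusive range; documents lacking the field never match.
    return lambda d: field in d and low <= d[field] <= high


def and_(*preds: Pred) -> Pred:
    return lambda d: all(p(d) for p in preds)


def or_(*preds: Pred) -> Pred:
    return lambda d: any(p(d) for p in preds)


def find(pred: Pred) -> List[Doc]:
    return [d for d in docs if pred(d)]


def count(pred: Pred) -> int:
    return sum(1 for d in docs if pred(d))


def distinct(field: str) -> List[Any]:
    return sorted({d[field] for d in docs if field in d})


recent_en = find(and_(eq("lang", "en"), between("year", 2012, 2016)))  # doc 3
n = count(or_(eq("type", "page"), eq("lang", "de")))                     # 2
langs = distinct("lang")                                                  # ["de", "en"]
```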
For Column Family
O Column-family stores provide range queries and some operations such as "in", "and/or", and regular expressions.
O Although every column-family store offers an SQL-like query language for more convenient user interaction, only row keys and indexed values can be used in WHERE clauses.
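Range queries are cheap when rows are kept sorted by row key, because a scan can start at one key and stop at another. The sketch below assumes a sorted in-memory list of row keys and uses Python's `bisect` module to find the range boundaries. It also models the WHERE-clause restriction with a hypothetical secondary index on one column. A filter on the row key or the indexed column is answered directly, while a filter on a non-indexed column is rejected.

```python
import bisect
from typing import Dict, List, Tuple

# Rows keyed by row key; each row maps column name -> value.
rows: Dict[str, Dict[str, str]] = {
    "2015-07-14#a": {"city": "Berlin", "temp": "21"},
    "2015-07-15#b": {"city": "Paris", "temp": "24"},
    "2015-07-16#c": {"city": "Berlin", "temp": "19"},
    "2015-07-17#d": {"city": "Rome", "temp": "30"},
}
sorted_keys: List[str] = sorted(rows)  # the store keeps rows ordered by key

# Hypothetical secondary index on "city"; "temp" is deliberately not indexed.
indexes: Dict[str, Dict[str, List[str]]] = {"city": {}}
for key in sorted_keys:
    indexes["city"].setdefault(rows[key]["city"], []).append(key)


def range_scan(start: str, stop: str) -> List[Tuple[str, Dict[str, str]]]:
    """Rows with start <= row key < stop, located by binary search."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, stop)
    return [(k, rows[k]) for k in sorted_keys[lo:hi]]


def where(column: str, value: str) -> List[str]:
    """Equality filter permitted only on the row key or an indexed column."""
    if column == "row_key":
        return [value] if value in rows else []
    if column not in indexes:
        raise ValueError(f"column {column!r} is not indexed")
    return indexes[column].get(value, [])


mid_july = [k for k, _ in range_scan("2015-07-15", "2015-07-17")]
berlin = where("city", "Berlin")
```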
For graph databases
O SPARQL is a popular, declarative query
language with a very simple syntax
providing graph pattern matching.
O Gremlin is an imperative programming language used to perform graph traversals based on XPath.
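Graph pattern matching can be sketched over a set of (subject, predicate, object) triples; this is the core idea behind SPARQL's basic graph patterns. Under that assumption, a pattern is a list of triples in which strings starting with `?` are variables. Matching yields every variable binding that satisfies all patterns at once. This is a simplified illustration, not a SPARQL engine and not Gremlin.

```python
from typing import Dict, Iterator, List, Tuple

Triple = Tuple[str, str, str]

triples: List[Triple] = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "livesIn", "berlin"),
    ("carol", "livesIn", "berlin"),
]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match(patterns: List[Triple], binding: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield every binding of ?variables that satisfies all patterns."""
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if is_var(term):
                # A variable must keep the same value wherever it appears.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, new)


# Roughly the pattern { ?x knows ?y . ?y knows ?z . ?z livesIn berlin }
results = list(match([("?x", "knows", "?y"),
                      ("?y", "knows", "?z"),
                      ("?z", "livesIn", "berlin")], {}))
```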
Concurrency Control
O Traditional databases use pessimistic concurrency control: a process gets exclusive access to a dataset (via locks).
O Multiversion concurrency control (MVCC) relaxes strict consistency in favor of performance.
O To cope with two or more conflicting write operations, each process stores, in addition to the new value, a link to the version it read before writing.
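The version-link idea can be sketched as follows. Every write records the version(s) it was based on, and no lock is taken. Two writes derived from the same version therefore both survive as conflicting "heads" instead of one silently overwriting the other. The class and method names are hypothetical. Allowing a write to link to several parents (so a merge can resolve a conflict) is an extra assumption of this sketch, and resolution is left to the caller.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Version:
    vid: int
    value: str
    parents: Tuple[int, ...]  # link(s) to the version(s) the writer had read


class MVCCRecord:
    """A single record kept as a graph of versions (simplified MVCC)."""

    def __init__(self, value: str) -> None:
        self._ids = itertools.count(1)
        first = Version(next(self._ids), value, ())
        self.versions: Dict[int, Version] = {first.vid: first}

    def heads(self) -> List[Version]:
        """Versions no later version is based on; more than one means a conflict."""
        referenced = {p for v in self.versions.values() for p in v.parents}
        return [v for v in self.versions.values() if v.vid not in referenced]

    def write(self, value: str, *read_versions: int) -> Version:
        # No lock is taken: the new value is stored together with a link
        # to the version(s) the writer read beforehand.
        new = Version(next(self._ids), value, tuple(read_versions))
        self.versions[new.vid] = new
        return new


rec = MVCCRecord("v0")
base = rec.heads()[0].vid            # both writers read version 1
a = rec.write("writer A", base)
b = rec.write("writer B", base)      # concurrent write based on the same version
conflict = sorted(v.value for v in rec.heads())  # ["writer A", "writer B"]

# Resolution (left to the application here): a merge that links to both heads.
merged = rec.write("A+B", a.vid, b.vid)
resolved = [v.value for v in rec.heads()]        # ["A+B"]
```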
Partitioning
O The first strategy distributes datasets by the range of their keys.
O To locate a given key, clients must contact a routing server to obtain the partition table.
O The second strategy is consistent hashing.
O Neighboring keys are distributed randomly across the cluster.
O Graph algorithms can help identify hotspots of strongly connected nodes in the graph schema.
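Both partitioning strategies can be sketched in a few lines. The first part assumes a small partition table of split points, standing in for the table a client would fetch from a routing server. The second part is a basic consistent-hash ring. It uses `hashlib.md5` only to spread keys uniformly, plus `bisect`, and gives each node several "virtual nodes" to even out the distribution. Node names, split points and the virtual-node count are illustrative assumptions.

```python
import bisect
import hashlib
from typing import List, Tuple

# --- Range partitioning ---------------------------------------------------
# Hypothetical partition table: keys < "h" | "h" <= key < "p" | key >= "p"
splits = ["h", "p"]
range_nodes = ["node-A", "node-B", "node-C"]


def range_lookup(key: str) -> str:
    """Pick the node whose key range contains the key."""
    return range_nodes[bisect.bisect_right(splits, key)]


# --- Consistent hashing ---------------------------------------------------
def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes: List[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._ring: List[Tuple[int, str]] = []  # sorted (hash, node) points
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Each physical node owns several points ("virtual nodes") on the ring.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def lookup(self, key: str) -> str:
        # First ring point at or after the key's hash, wrapping around the end.
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["node-A", "node-B", "node-C"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}
ring.add("node-D")
after = {k: ring.lookup(k) for k in keys}
# Only keys taken over by the new node move; all other keys stay where they were.
moved = [k for k in keys if before[k] != after[k]]
```

With range partitioning, adjacent keys stay together, which makes range scans cheap. With consistent hashing, neighboring keys land on unrelated nodes. In exchange, adding a node moves only the keys it takes over, not a reshuffle of the whole data set.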
Replication and Consistency
O BASE (Basically Available, Soft state, Eventually consistent) systems.
O Availability at the cost of consistency.
O Data may be temporarily inconsistent, but the system must be highly available and performant at all times.
O NoSQL systems are not limited to being either fully ACID or fully BASE; many fall somewhere in between.
O Big Data is the only store that supports full consistency and replication natively.
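The BASE trade-off (stay available, accept temporary inconsistency) can be illustrated with a toy replicated register. A write is acknowledged by a single replica and propagated later. A read from another replica may therefore return stale data until a synchronization ("anti-entropy") pass runs. Everything here is a simplifying assumption rather than how any particular store replicates, including last-write-wins by timestamp as the merge rule.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Replica:
    # key -> (timestamp, value)
    data: Dict[str, Tuple[int, str]] = field(default_factory=dict)

    def apply(self, key: str, ts: int, value: str) -> None:
        # Last-write-wins: keep the value with the highest timestamp.
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def read(self, key: str) -> Optional[str]:
        entry = self.data.get(key)
        return entry[1] if entry else None


replicas: List[Replica] = [Replica() for _ in range(3)]


def write(key: str, ts: int, value: str) -> None:
    # Acknowledge after one replica; the others are updated later.
    replicas[0].apply(key, ts, value)


def anti_entropy() -> None:
    # Copy every replica's entries to every other replica until they converge.
    for src in replicas:
        for key, (ts, value) in list(src.data.items()):
            for dst in replicas:
                dst.apply(key, ts, value)


write("cart:42", 1, "book")
stale = replicas[2].read("cart:42")                # None: not yet propagated
anti_entropy()
converged = {r.read("cart:42") for r in replicas}  # {"book"} after sync
```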
Conclusion
O Choose the proper data model.
O Consider the queries the application must support.
O Use key-value stores for simple and fast operations.
O Use column-family stores for large data sets.
O Use graph databases for entities and their relationships.
O Use document databases for a flexible data model.