
Polyglot Persistence NoSQL 3-in-1 Database: Graph DB, Key/Value & Document Store

Date post: 13-Jul-2015
Upload: max-neunhoeffer
Polyglot Persistence, NoSQL 3-in-1 Database: Graph DB, Key/Value & Document Store Max Neunhöffer Database Month New York, 11 November 2014
Transcript

Polyglot Persistence, NoSQL 3-in-1 Database: Graph DB, Key/Value & Document Store
Max Neunhöffer

Database Month New York, 11 November 2014

www.arangodb.com

Max Neunhöffer

I am a mathematician.
“Earlier life”: research in computer algebra (computational group theory)
Always juggled with big data
Now: working in database development, NoSQL, ArangoDB

I like:
- research,
- hacking,
- teaching,
- tickling the highest performance out of computer systems.

ArangoDB GmbH

triAGENS GmbH has offered consulting services since 2004:
- software architecture
- project management
- software development
- business analysis
- a lot of experience with specialised database systems
- have done NoSQL before the term was coined at all

2011/2012, an idea emerged: to build the database one had wished to have all those years!
- development of ArangoDB as open source software since 2012
- ArangoDB GmbH: spin-off to take care of ArangoDB (2014)

A typical Project: a Web Shop

The Specification Workshop
(need recommendation engine, need statistics, etc.)

The Developers get to work . . .
(tables, relations, normalisation, schemas, queries, front-ends, etc.)

HANDOVER
(Why can I not . . . ? This is unusable!)

Solution: Agile Approach and Domain Driven Design

These days, many use (or try to use):
- agile methods (Scrum, sprints, rapid prototyping)
- with continuous feedback from product owners to developers
- promising fewer surprises in deployment and high flexibility.

Domain Driven Design (Eric Evans, 2004):
- identify a Domain (area in which software is applied)
- make a Model (abstract description of situation)
- use a Ubiquitous Language (that all team members speak)
- clearly define the Context in which the model applies.

Model your data as close to the domain as possible.
Example: object oriented programming

Fundamental Problem: need a ubiquitous Language

Listening to team members, you hear completely different things:

Product Managers talk about
- customers “browsing” through the shop,
- powerful search for products (with the “good ones” up),
- “useful” recommendations.

Developers talk about
- tables, normalisation, queries and joins
- secondary indexes, front-end pages
- object oriented, model view controller, responsive design

⇒ both groups think the others are morons

The problem is rooted very deeply

functionality not gathered methodically
⇓
“obvious” functions are missing

no common language
⇓
misunderstandings about details

NoSQL: Richer Data Models are closer to the Domain

Some terms used by Evans as part of the ubiquitous language:
- Entity: has an identity and mutable state (e.g. a person)
- Value object: is identified by its attributes and immutable (e.g. an address)
- Aggregate: is a combination of entities and value objects into one transactional unit (e.g. a customer with its orders)
- Association: is a relation between entities and value objects, can have attributes, usually immutable

Consequences
These terms coming from the Domain must be present in the Design. The whole team must understand the same thing when talking about them.
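
The four terms above can be sketched in code. This is a minimal illustration in Python, not something taken from the talk: the class names (`Address`, `Order`, `Customer`, `Recommends`) and all attribute names are hypothetical, chosen to match the examples on the slide.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass(frozen=True)
class Address:
    """Value object: identified by its attributes and immutable."""
    street: str
    city: str


@dataclass
class Order:
    """Entity: identified by order_id, its state may change."""
    order_id: str
    total: float
    shipped: bool = False


@dataclass(eq=False)
class Customer:
    """Aggregate: a customer together with its orders, one unit."""
    customer_id: str
    name: str
    address: Address
    orders: List[Order] = field(default_factory=list)

    # Entity identity: two customers are the same if their ids match,
    # even when the mutable state (name, orders) differs.
    def __eq__(self, other):
        return isinstance(other, Customer) and self.customer_id == other.customer_id

    def __hash__(self):
        return hash(self.customer_id)


@dataclass(frozen=True)
class Recommends:
    """Association: relation between two entities, with an attribute."""
    customer_id: str
    product_id: str
    weight: float


def to_document(customer: Customer) -> dict:
    """Serialise the whole aggregate as one JSON-like document."""
    return asdict(customer)
```

Storing the aggregate as a single document is what makes it a natural transactional unit in a document store.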


Polyglot Persistence

Idea
Use the right data model for each part of a system.

For an application, persist
- an object or structured data as a JSON document,
- a hash table in a key/value store,
- relations between objects in a graph database,
- a homogeneous array in a relational DBMS.

If the table has many empty cells or inhomogeneous rows, use a column-based database.

Take scalability needs into account!

Document and Key/Value Stores

Document store
A document store stores sets of documents, which usually means JSON data; these sets are called collections. The database has access to the contents of the documents.
- each document in the collection has a unique key
- secondary indexes possible, leading to more powerful queries
- different documents in the same collection: structure can vary
- no schema is required for a collection
- database normalisation can be relaxed

Key/value store
Opaque values, only key lookup without secondary indexes:
⇒ high performance and perfect scalability
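
The difference between the two models can be sketched with a toy in-memory collection. This is an illustration only, not how any real database is implemented; the `Collection` class and its method names are hypothetical. A key/value store corresponds to using only `insert` and `get`; the document store additionally looks inside the documents.

```python
from collections import defaultdict


class Collection:
    """Toy document collection: unique keys plus optional secondary indexes."""

    def __init__(self):
        self._docs = {}      # key -> document (a dict)
        self._indexes = {}   # attribute -> {value -> set of keys}

    def ensure_index(self, attribute):
        """Build a secondary index over one attribute (values must be hashable)."""
        index = defaultdict(set)
        for key, doc in self._docs.items():
            if attribute in doc:
                index[doc[attribute]].add(key)
        self._indexes[attribute] = index

    def insert(self, key, doc):
        """Store a document under a unique key and update all indexes."""
        if key in self._docs:
            raise KeyError(f"duplicate key: {key}")
        self._docs[key] = doc
        for attribute, index in self._indexes.items():
            if attribute in doc:
                index[doc[attribute]].add(key)

    def get(self, key):
        """Key lookup: all that a pure key/value store offers."""
        return self._docs[key]

    def find(self, attribute, value):
        """Query by content: uses the index if present, else scans all documents."""
        if attribute in self._indexes:
            keys = self._indexes[attribute].get(value, set())
            return [self._docs[k] for k in sorted(keys)]
        return [d for d in self._docs.values() if d.get(attribute) == value]
```

Documents in the same collection may have different attributes; no schema is declared anywhere in this sketch.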

Graph Databases

Graph database
A graph database stores a labelled graph. Vertices and edges are documents. Graphs are good to model relations.
- graphs often describe data very naturally (e.g. the facebook friendship graph)
- graphs can be stored using tables, however, graph queries notoriously lead to expensive joins
- there are interesting and useful graph algorithms like “shortest path” or “neighbourhood”
- need a good query language to reap the benefits
- horizontal scalability is troublesome
- graph databases vary widely in scope and usage, no standard
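
The two algorithms named above, “neighbourhood” and “shortest path”, can be sketched over edges stored as documents. This is a minimal in-memory illustration using breadth-first search; the function names are hypothetical, and the edge attributes `_from` and `_to` are an illustrative naming choice. The graph is treated as undirected, as in a friendship graph.

```python
from collections import defaultdict, deque


def build_adjacency(edges):
    """Turn a list of edge documents into an undirected adjacency map."""
    adjacency = defaultdict(list)
    for edge in edges:
        adjacency[edge["_from"]].append(edge["_to"])
        adjacency[edge["_to"]].append(edge["_from"])
    return adjacency


def neighbourhood(adjacency, start):
    """All vertices directly connected to start."""
    return set(adjacency.get(start, []))


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns the list of vertices or None."""
    if start == goal:
        return [start]
    previous = {start: None}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        for neighbour in adjacency.get(vertex, []):
            if neighbour in previous:
                continue
            previous[neighbour] = vertex
            if neighbour == goal:
                # Walk back from the goal to the start.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None
```

In a relational layout, each hop of such a search would be one more self-join on the edge table, which is why these queries become expensive there.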


A typical Use Case: an Online Shop

We need to hold
- customer data: usually homogeneous, but still variations
  ⇒ use a document store
- product data: even for a specialised business quite inhomogeneous
  ⇒ use a document store
- shopping carts: need very fast lookup by session key
  ⇒ use a key/value store
- order and sales data: relate customers and products
  ⇒ use a document store
- recommendation engine data: links between different entities
  ⇒ use a graph database

Polyglot Persistence is nice, but . . .

Consequence: One needs multiple database systems in the persistence layer of a single project!

Polyglot persistence introduces some friction through
- data synchronisation,
- data conversion,
- increased installation and administration effort,
- more training needs.

Wouldn’t it be nice, . . .
. . . to enjoy the benefits without the disadvantages?

The Multi-Model Approach

Multi-model database
A multi-model database combines a document store with a graph database and a key/value store.
Vertices are documents in a vertex collection,
edges are documents in an edge collection.
- a single, common query language for all three data models
- is able to compete with specialised products on their turf
- allows for polyglot persistence using a single database
- queries can mix the different data models
- can replace an RDBMS in many cases
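
What “queries can mix the different data models” means can be sketched with plain Python data structures standing in for one database. This is an illustration only, not a database query: the collection names, the sample data and the `recommend` function are all hypothetical.

```python
# Document collections (vertices): key -> document.
customers = {
    "c1": {"name": "Ann", "city": "New York"},
    "c2": {"name": "Bob", "city": "Boston"},
    "c3": {"name": "Cy", "city": "New York"},
}
products = {
    "p1": {"title": "Tea pot"},
    "p2": {"title": "Mug"},
    "p3": {"title": "Kettle"},
}

# Edge collection: each edge is a document linking a customer to a product.
purchased = [
    {"_from": "c1", "_to": "p1"},
    {"_from": "c1", "_to": "p2"},
    {"_from": "c2", "_to": "p1"},
    {"_from": "c2", "_to": "p3"},
    {"_from": "c3", "_to": "p2"},
]

# Key/value data: session key -> product keys in the shopping cart.
carts = {"session-42": ["p1"]}


def recommend(session_key, city):
    """Products also bought by customers from `city` who bought a cart item."""
    in_cart = set(carts[session_key])                                  # key/value lookup
    buyers = {e["_from"] for e in purchased if e["_to"] in in_cart}    # graph step
    local = {c for c in buyers if customers[c]["city"] == city}        # document filter
    also = {e["_to"] for e in purchased if e["_from"] in local}        # graph step
    return sorted(products[p]["title"] for p in also - in_cart)
```

With separate specialised systems, each commented step would be a call to a different database, with the application shuttling intermediate results between them.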


A Map of the NoSQL Landscape

[Figure: map of the NoSQL landscape. Labels: Map/reduce, Column Stores, Extensibility, Documents, Massively distributed, Graphs, Structured Data, Key/Value, Operational DBs, Analytic DBs, Complex queries.]

ArangoDB
- is a multi-model database (document store & graph database),
- is open source and free (Apache 2 license),
- offers convenient queries (via HTTP/REST and AQL), including joins between different collections,
- offers strong consistency guarantees using transactions,
- is memory efficient by shape detection,
- uses JavaScript throughout (Google’s V8 built into server),
- has an API extensible by JavaScript code in the Foxx framework,
- offers many drivers for a wide range of languages,
- is easy to use with web front end and good documentation,
- and enjoys good community as well as professional support.


The ArangoDB Territory

[Figure: the map of the NoSQL landscape, showing the ArangoDB territory.]

Strong Consistency

ArangoDB offers
- atomic and isolated CRUD operations for single documents,
- transactions spanning multiple documents and multiple collections,
- snapshot semantics for complex queries,
- very secure durable storage using append only and storing multiple revisions,
- all this for documents as well as for graphs.

In the (near) future, ArangoDB will
- offer the same ACID semantics even with sharding,
- implement complete MVCC semantics to allow for lock-free concurrent transactions.
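
How append-only storage with multiple revisions can give a query a stable snapshot is sketched below. This is a toy model under simplifying assumptions (single process, no deletes, everything in memory) and does not describe ArangoDB's actual storage engine; the `AppendOnlyStore` class and its methods are hypothetical.

```python
class AppendOnlyStore:
    """Toy store: writes only append; old revisions stay readable."""

    def __init__(self):
        # Entry i (0-based) holds revision i + 1 as a (key, document) pair.
        self._log = []

    def put(self, key, doc):
        """Append a new revision of a document; returns its revision number."""
        self._log.append((key, dict(doc)))
        return len(self._log)

    def snapshot(self):
        """The current revision number; later writes get higher numbers."""
        return len(self._log)

    def get(self, key, at_revision=None):
        """Newest revision of key that is not newer than at_revision."""
        limit = self.snapshot() if at_revision is None else at_revision
        for logged_key, doc in reversed(self._log[:limit]):
            if logged_key == key:
                return dict(doc)
        return None
```

A long-running query that records `snapshot()` at its start and passes that number to every `get` sees a consistent state, no matter what is written in the meantime.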


Replication and Sharding: horizontal scalability

Right now, ArangoDB provides
- easy setup of (asynchronous) replication,
- which allows read access parallelisation (master/slaves setup),
- sharding with automatic data distribution to multiple servers.

Very soon, ArangoDB will feature
- fault tolerance by automatic failover and synchronous replication in cluster mode,
- zero administration by a self-repairing and self-balancing cluster architecture.
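
The idea of automatic data distribution can be sketched with hash-based sharding on the document key. This is a generic technique shown for illustration; the talk does not say which scheme ArangoDB uses, and the function names here are hypothetical.

```python
import hashlib


def shard_for(key: str, number_of_shards: int) -> int:
    """Map a document key to a shard number in a stable, deterministic way."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % number_of_shards


def distribute(keys, number_of_shards):
    """Assign every key to exactly one shard."""
    shards = [[] for _ in range(number_of_shards)]
    for key in keys:
        shards[shard_for(key, number_of_shards)].append(key)
    return shards
```

Because the shard is computed from the key alone, any server can route a key lookup without consulting a central directory. Changing the number of shards moves most keys, which is one reason self-balancing clusters need more elaborate schemes.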


Powerful query language: AQL

The built-in Arango Query Language AQL allows
- complex, powerful and convenient queries,
- with transaction semantics,
- allowing joins,
- with user definable functions (in JavaScript).

AQL is independent of the driver used and offers protection against injections by design.

For Version 2.3, we are reengineering the AQL query engine:
- use a C++ implementation for high performance,
- optimise distributed queries in the cluster.
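
A join in AQL, and the way bind parameters keep user input out of the query text, can be sketched as follows. The Python code only builds the JSON request body and sends nothing; the collections `orders` and `customers`, their attributes and the function `build_cursor_request` are hypothetical. The body is shaped for the server's HTTP cursor endpoint (assumed here to be `POST /_api/cursor`, taking `query` and `bindVars`).

```python
import json

# The query text is a constant; user input never becomes part of it.
AQL_QUERY = """
FOR o IN orders
  FOR c IN customers
    FILTER o.customer == c._key
    FILTER c.city == @city
    RETURN { customer: c.name, total: o.total }
"""


def build_cursor_request(city: str) -> str:
    """Return the JSON body for a query with `city` passed as a bind parameter."""
    payload = {
        "query": AQL_QUERY,
        "bindVars": {"city": city},  # value for @city, sent as data
    }
    return json.dumps(payload)
```

Even a hostile value such as `x" || true || "` travels only inside `bindVars` as a plain string, so it cannot change the structure of the query.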


Extensible through JavaScript and Foxx

The HTTP API of ArangoDB
- can be extended by user-defined JavaScript code,
- that is executed in the DB server for high performance.
- This is formalised by the Foxx framework,
- which allows implementing complex, user-defined APIs with direct access to the DB engine.
- Very flexible and secure authentication schemes can be implemented conveniently by the user in JavaScript.
- Because JavaScript runs everywhere (in the DB server as well as in the browser), one can use the same libraries in the back-end and in the front-end.

⇒ implement your own micro services

