NoSQL Riak MongoDB Elasticsearch - All The Same?

Posted on 15-Jul-2015


MongoDB, Elasticsearch, Riak – all the same?

Eberhard Wolff Freelancer

Head Technology Advisory Board adesso AG

http://ewolff.com

Eberhard Wolff - @ewolff

Sample chapter: http://bit.ly/CD-Buch

Modeling: Relational Databases vs. JSON

Financial System
• Different financial products
• Mapping objects / database
• Inheritance

E/R Model
[Diagram: entities Asset, Stock, Zero Bond, Option, Country, Currency]
• > 20 database tables
• Up to 25 attributes

JOINs :-(

Get all assets with interest rate x

JSON

[Diagram: document types and their attributes. Asset: Type, ID. Zero Bond: Interest Rate. Fixed Rate Bond: Interest Rate. Stock: Preferred. Option: Underlying asset. Further attributes shown: Country, Price, Currency]

{
  "ID" : "42",
  "type" : "Fixed Rate Bond",
  "Country" : "DE",
  "Currency" : "EUR",
  "ISIN" : "DE0001141562",
  "Interest Rate" : "2.5"
}
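With self-contained documents like the one above, the query from the JOINs slide ("get all assets with interest rate x") turns into a plain filter over documents. A minimal sketch in Python, not tied to any of the stores discussed here; only the first document comes from the slide, the other two and the helper name `assets_with_interest_rate` are made up for illustration.

```python
import json

# Documents as they might be stored; only the first one comes from the slide,
# the other two are hypothetical.
RAW_ASSETS = [
    '{"ID": "42", "type": "Fixed Rate Bond", "Country": "DE", "Currency": "EUR", '
    '"ISIN": "DE0001141562", "Interest Rate": "2.5"}',
    '{"ID": "43", "type": "Zero Bond", "Country": "DE", "Currency": "EUR", '
    '"Interest Rate": "2.5"}',
    '{"ID": "44", "type": "Stock", "Country": "US", "Currency": "USD"}',
]


def assets_with_interest_rate(raw_docs, rate):
    """Return all documents whose "Interest Rate" equals `rate`; no joins needed."""
    docs = [json.loads(raw) for raw in raw_docs]
    # Assets without an interest rate (e.g. stocks) simply lack the field.
    return [d for d in docs if d.get("Interest Rate") == rate]


matches = assets_with_interest_rate(RAW_ASSETS, "2.5")
print([d["ID"] for d in matches])
```

Each asset type carries only the fields it needs, so the inheritance hierarchy of the E/R model does not have to be mapped onto separate tables.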

All stores in this presentation support JSON

Scaling Relational Databases

Larger Server
[Diagram: a DB server replaced by a larger DB server]
• Expensive server
• Limited

Common Storage
[Diagram: several DB servers sharing one storage]
• Expensive storage
• Limited
• e.g. Oracle RAC

Replication
[Diagram: several DB servers]
• Cheap servers
• Almost unlimited
• Inconsistent data
• Conflict resolution or read only

Replication
[Diagram: several DB servers]
• MySQL Master-Slave
• Oracle Advanced Replication

Network Failure
• Either answer and possibly provide outdated data
• …or don't answer, i.e. only ever provide up-to-date data

CAP
• Consistency
• Availability
• Network Partition Tolerance
• If the network fails: provide a potentially incorrect answer, or no answer at all?
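The choice during a network failure can be made concrete with a toy replica. This is only a sketch of the trade-off, not the behaviour of any specific database; the class `Replica`, the flag `prefer_availability`, and the exception `PartitionedError` are names invented for this example.

```python
class PartitionedError(Exception):
    """Raised when a replica refuses to answer during a network partition."""


class Replica:
    def __init__(self, prefer_availability):
        self.prefer_availability = prefer_availability
        self.data = {}          # local copy, may lag behind the other nodes
        self.connected = True   # False while the network is partitioned

    def apply_replicated_write(self, key, value):
        # Replicated updates only arrive while the network works.
        if self.connected:
            self.data[key] = value

    def read(self, key):
        if self.connected or self.prefer_availability:
            return self.data.get(key)  # possibly outdated during a partition
        raise PartitionedError("cannot guarantee an up-to-date answer")


ap = Replica(prefer_availability=True)    # answers, maybe with stale data
cp = Replica(prefer_availability=False)   # refuses rather than risk stale data
for replica in (ap, cp):
    replica.apply_replicated_write("price", 100)
    replica.connected = False                     # the network fails
    replica.apply_replicated_write("price", 105)  # this update never arrives
```

After the partition, `ap.read("price")` still returns the old value, while `cp.read("price")` raises `PartitionedError`.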

BASE
• Basically Available
• Soft State
• Eventually (= in the end) consistent
• i.e. give a potentially incorrect answer

BASE and Relational DBs
• Very limited
• Stand by
• Read only replica
• No truly distributed DB

Relational & BASE
• Most relational operations cover multiple tables
• That needs locks across multiple servers
• Not realistically possible

NoSQL & BASE
• Typical operation covers one data structure
• …that contains more information
• No complex locking
• More sophisticated BASE

Naïve View on NoSQL

Key / Value Stores
• Map key to value
• For simple data structures
• Retrieval only by key
• Easy scalability
• Only for simple applications

Key | Value
42  | Some data

Document Oriented
• Documents, e.g. JSON
• Complex structures & queries
• Still great scalability
• For more complex applications

{
  "author": {
    "name": "Eberhard Wolff",
    "email": "eberhard.wolff@gmail.com"
  },
  "title": "Continuous Delivery"
}

Graph, Column Oriented…

Educated View on NoSQL

Key / value, document-based, search engine: all the same?

MongoDB

elasticsearch

Riak


What is Riak?
• Key / value
• Truly distributed database

Riak: Technologies
• Erlang
• Open Source (Apache 2.0)
• Company: Basho

More indices
• Allows secondary indices
• Riak Search 2.0: Solr integration
• Solr: Lucene-based search engine
• API compatible with Solr
• Key / value or document based?

More Features
• Map/reduce
• Scans all datasets
• Can store large binary objects

Scaling Riak
• Based on the Dynamo paper
• Well understood
• …and battle-proven at Amazon

Scaling Riak
Server A: Shard1, Shard3, Shard4
Server B: Shard2, Shard1, Shard4
Server C: Shard3, Shard2, Shard1
Server D: Shard4, Shard2, Shard3


Scaling Riak
Server A: Shard1, Shard3, Shard4
Server B: Shard2, Shard1, Shard4
Server C: Shard3, Shard2, Shard1
Server D: Shard4, Shard2, Shard3
New Server
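The Dynamo paper distributes data with consistent hashing, which is why a new server only takes over part of the data instead of forcing a full reshuffle. The following is a generic sketch of that technique, not Riak's actual implementation; the class `Ring`, the parameters `vnodes` and `replicas`, and the key names are assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(value):
    # Any stable hash works; MD5 is used here only to spread keys evenly.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, servers, vnodes=64, replicas=3):
        self.vnodes = vnodes      # points per server on the ring
        self.replicas = replicas  # distinct servers that store each key
        self._points = []         # sorted list of (hash, server)
        for server in servers:
            self.add_server(server)

    def add_server(self, server):
        for i in range(self.vnodes):
            bisect.insort(self._points, (_hash(f"{server}#{i}"), server))

    def servers_for(self, key):
        """Walk clockwise from the key's position and collect distinct servers."""
        start = bisect.bisect(self._points, (_hash(key), ""))
        found = []
        for offset in range(len(self._points)):
            server = self._points[(start + offset) % len(self._points)][1]
            if server not in found:
                found.append(server)
            if len(found) == self.replicas:
                break
        return found


ring = Ring(["A", "B", "C", "D"])
keys = [f"asset-{i}" for i in range(1000)]
before = {k: ring.servers_for(k)[0] for k in keys}
ring.add_server("E")  # the "New Server" from the slide
after = {k: ring.servers_for(k)[0] for k in keys}
moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} keys changed their first server")
```

Every key that changes its first server moves to the new server "E"; the placement among the existing servers stays untouched.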

Tuning BASE
• N: nodes holding a replica
• R: nodes read from
• W: nodes written to
• Trade-off
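The trade-off between N, R, and W can be expressed with two small formulas. The rule "R + W > N means read and write sets overlap" is the usual quorum argument for Dynamo-style systems and is not stated on the slide; the function names are made up, and the sketch ignores refinements such as sloppy quorums.

```python
def overlaps(n, r, w):
    """True if every read set shares at least one node with every write set."""
    return r + w > n


def tolerated_failures(n, r, w):
    """How many replicas may be down while reads and writes both still succeed."""
    return n - max(r, w)


for r, w in [(1, 1), (2, 2), (1, 3), (3, 1)]:
    print(f"N=3 R={r} W={w}: overlap={overlaps(3, r, w)}, "
          f"tolerated failures={tolerated_failures(3, r, w)}")
```

Small R and W give fast, highly available operations but allow stale reads; larger values buy consistency at the price of latency and availability.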

Is it bulletproof?

Jepsen
• Test suite for network failures etc.
• https://aphyr.com/tags/jepsen
• Riak succeeds
• …if tuned correctly
• …might still need to merge versions
• https://aphyr.com/posts/285-call-me-maybe-riak
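"Merging versions" means that two concurrent writes to the same key both survive and the application has to reconcile them. A minimal sketch of the idea using vector clocks and a set union as the merge function; the data layout, the function names, and the choice of merge function are assumptions for illustration and not Riak's client API.

```python
def compare(clock_a, clock_b):
    """Compare two vector clocks given as {node: counter} dicts."""
    nodes = set(clock_a) | set(clock_b)
    a_ahead = any(clock_a.get(n, 0) > clock_b.get(n, 0) for n in nodes)
    b_ahead = any(clock_b.get(n, 0) > clock_a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "concurrent"
    if a_ahead:
        return "a_newer"
    if b_ahead:
        return "b_newer"
    return "equal"


def resolve(version_a, version_b):
    """Each version is (vector_clock, set_of_items); returns the surviving version."""
    (clock_a, items_a), (clock_b, items_b) = version_a, version_b
    order = compare(clock_a, clock_b)
    if order in ("a_newer", "equal"):
        return version_a
    if order == "b_newer":
        return version_b
    # Concurrent writes: application-specific merge, here a set union.
    merged_clock = {n: max(clock_a.get(n, 0), clock_b.get(n, 0))
                    for n in set(clock_a) | set(clock_b)}
    return merged_clock, items_a | items_b


# Two hypothetical portfolio versions written on different sides of a partition.
left = ({"A": 2, "B": 1}, {"bond-1", "stock-7"})
right = ({"A": 1, "B": 2}, {"bond-1", "option-3"})
clock, items = resolve(left, right)
```

A union never loses an addition but cannot express a removal, which is why the right merge function depends on the application.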


What is MongoDB?
• Document-oriented
• MMAPv1: memory-mapped files + journal
• New in 3.0: WiredTiger for complex loads
• Name derived from "humongous"

MongoDB: Technologies
• C++
• Open Source (AGPL)
• Company: MongoDB, Inc.

More Features
• Can store large binary objects
• Its own full text search

More Features
• Map / Reduce
• JavaScript
• Aggregation framework

Scaling MongoDB
Shard 1: Replica 1, Replica 2, Replica 3
Shard 2: Replica 1, Replica 2, Replica 3

Availability
Shard 1: Replica 1, Replica 2, Replica 3
Shard 2: Replica 1, Replica 2, Replica 3

Scaling MongoDB
Shard 1: Replica 1, Replica 2, Replica 3
Shard 2: Replica 1, Replica 2, Replica 3
Shard 3: Replica 1, Replica 2, Replica 3

Scaling MongoDB
Shard 1: Replica 1, Replica 2, Replica 3
Shard 2: Replica 1, Replica 2, Replica 3
?

Tuning BASE
• Write concerns
• How many nodes should acknowledge the write?
• Read from primary
• …or also secondaries
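The effect of write concerns and of reading from secondaries can be shown with a toy replica set. This is a simulation in plain Python and deliberately not the MongoDB driver API; the class `ReplicaSet`, the `lagging` set, and the parameters `w` and `from_secondary` are invented for the example.

```python
class ReplicaSet:
    def __init__(self, size=3):
        # Member 0 plays the primary; the others are secondaries.
        self.members = [{} for _ in range(size)]
        self.lagging = set()  # indexes of members that currently miss writes

    def write(self, key, value, w=1):
        """Apply the write; fail if fewer than `w` members acknowledged it."""
        acks = 0
        for index, member in enumerate(self.members):
            if index in self.lagging:
                continue
            member[key] = value
            acks += 1
        if acks < w:
            # The write is reported as failed, yet the reachable members keep it.
            raise TimeoutError(f"only {acks} of {w} required acknowledgements")
        return acks

    def read(self, key, from_secondary=None):
        """Read from the primary, or from the secondary with the given index."""
        index = 0 if from_secondary is None else from_secondary
        return self.members[index].get(key)


rs = ReplicaSet()
rs.write("rate", "2.5", w=3)   # all three members acknowledge
rs.lagging = {2}               # one secondary falls behind
rs.write("rate", "3.0", w=2)   # still succeeds with two acknowledgements
print(rs.read("rate"), rs.read("rate", from_secondary=2))
```

Reading from the lagging secondary returns the older value, and a write with `w=3` now fails although the reachable members have already stored it.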

Jepsen
• Mongo loses writes
• A bug – might still be there
• Also: non-acknowledged writes might still survive
• …and overwrite other data
• https://aphyr.com/posts/284-call-me-maybe-mongodb


Database = Storage + Search

elasticsearch = Storage + Search

What is elasticsearch?
• Search engine
• Also stores original documents
• Based on the Lucene search library
• Easy scaling

elasticsearch: Technologies
• Java
• REST
• Open Source (Apache)
• Backed by company elasticsearch

elasticsearch Internals
• Append-only file
• Many benefits
• But not too great for updates
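Why an append-only file is "not too great for updates" can be seen in a small model: an update never changes the old record, it appends a new one and leaves garbage behind until the file is rewritten. The class `AppendOnlyStore` and its methods are a hypothetical sketch, not elasticsearch or Lucene code.

```python
class AppendOnlyStore:
    def __init__(self):
        self.log = []     # records are only ever appended
        self.latest = {}  # document id -> position of its live record

    def put(self, doc_id, document):
        # An update does not touch the old record; it appends a new one
        # and leaves the old record behind as garbage.
        self.log.append((doc_id, document))
        self.latest[doc_id] = len(self.log) - 1

    def get(self, doc_id):
        return self.log[self.latest[doc_id]][1]

    def garbage(self):
        """Number of superseded records still occupying space."""
        return len(self.log) - len(self.latest)

    def compact(self):
        """Rewrite the log so that it contains live records only."""
        live = [(doc_id, self.get(doc_id)) for doc_id in self.latest]
        self.log = []
        self.latest = {}
        for doc_id, document in live:
            self.put(doc_id, document)


store = AppendOnlyStore()
store.put("42", {"Interest Rate": "2.5"})
store.put("42", {"Interest Rate": "2.6"})  # update = append
store.put("42", {"Interest Rate": "2.7"})  # update = append
print(store.garbage())
store.compact()
print(store.garbage(), store.get("42"))
```

Writes stay simple and sequential, which is one of the benefits; the cost is the extra space and the periodic rewrite.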

Scaling elasticsearch
[Diagram: three servers hosting Shard 1, Shard 2, and Shard 3 together with Replica 1, Replica 2, and Replica 3]

Tuning BASE
• Write acknowledgement: 1, majority, all
• Including indexing
• Read from primary
• …or also secondaries
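The three acknowledgement levels translate into a number of shard copies that must confirm a write. A short sketch of that arithmetic; the function name `required_acks` and the level strings are taken from the slide's wording and are not elasticsearch configuration values.

```python
def required_acks(level, copies):
    """Translate an acknowledgement level into a number of shard copies."""
    if level == "1":
        return 1
    if level == "majority":
        return copies // 2 + 1
    if level == "all":
        return copies
    raise ValueError(f"unknown level: {level!r}")


# One primary plus two replicas = three copies of each shard.
for level in ("1", "majority", "all"):
    print(level, required_acks(level, copies=3))
```

With three copies, "majority" still accepts writes when one copy is unreachable, while "all" does not.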

Jepsen
• Loses data even if just one node is partitioned (June 2014)
• Actively worked on
• It's a search engine…
• https://aphyr.com/posts/317-call-me-maybe-elasticsearch
• http://www.elasticsearch.org/guide/en/elasticsearch/resiliency/current/

Scenarios: elasticsearch

Search
• Powerful query language
• Configurable index
• Text analysis
• Stop words
• Stemming
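Text analysis, stop words, and stemming all happen before a term reaches the index. A toy version of that pipeline with an inverted index on top; the stop word list, the very crude `stem` function, and the sample documents are assumptions for illustration and far simpler than what Lucene does.

```python
import re
from collections import defaultdict

STOP_WORDS = {"a", "an", "and", "the", "of", "is", "for"}


def stem(token):
    # Deliberately crude suffix stripping; real analyzers use proper stemmers.
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Lowercase, tokenize, drop stop words, stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def build_index(documents):
    """Inverted index: term -> set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Ids of the documents that contain every analyzed query term."""
    terms = analyze(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))


DOCS = {
    1: "Scaling the databases",
    2: "A database for searching",
    3: "Search engines and stores",
}
INDEX = build_index(DOCS)
print(search(INDEX, "databases"), search(INDEX, "the search"))
```

Because queries pass through the same analysis as documents, "databases" finds documents that only contain "database".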

Facets
• Number of hits by category
• Useful for statistics
• & Big Data
• Statistical facet (+ computation)
• Range facets etc.
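The three kinds of facets boil down to simple aggregations over the hits of a query. A sketch in plain Python over made-up asset documents; the function names and the sample data are hypothetical and this is not the elasticsearch query syntax.

```python
from collections import Counter
from statistics import mean

# Hypothetical search hits.
ASSETS = [
    {"type": "Fixed Rate Bond", "Currency": "EUR", "price": 101.5},
    {"type": "Zero Bond", "Currency": "EUR", "price": 87.0},
    {"type": "Stock", "Currency": "USD", "price": 42.0},
    {"type": "Stock", "Currency": "EUR", "price": 12.5},
]


def terms_facet(hits, field):
    """Number of hits by category."""
    return Counter(hit[field] for hit in hits)


def statistical_facet(hits, field):
    """Simple statistics computed over a numeric field."""
    values = [hit[field] for hit in hits]
    return {"count": len(values), "min": min(values),
            "max": max(values), "mean": mean(values)}


def range_facet(hits, field, bounds):
    """Number of hits per half-open interval [low, high)."""
    return {(low, high): sum(1 for h in hits if low <= h[field] < high)
            for low, high in bounds}


print(terms_facet(ASSETS, "type"))
print(statistical_facet(ASSETS, "price"))
print(range_facet(ASSETS, "price", [(0, 50), (50, 100), (100, 150)]))
```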


Conclusion
• Relational databases might be BASE
• NoSQL embraces BASE better
• Key / value stores, document stores and search engines: very similar features
• Care about scaling
• Care about resilience

Thank You!