
Politecnico di Milano

Scuola di Ingegneria Industriale e dell'Informazione

Corso di Laurea Magistrale in Ingegneria Informatica

Dipartimento di Elettronica, Informazione e Bioingegneria

Thesis

Evaluating Client-Side Replicated NoSQL Databases Approaches

Advisor: Prof. Raffaela MIRANDOLA

Co-advisor: Ing. Marco SCAVUZZO

Thesis by:

Claudio CARDINALE, Student ID 849760

Academic Year 2016-2017

To Noemi...

Acknowledgments

My heartfelt thanks to Professor Raffaela Mirandola, who supported me in the elaboration of this master thesis with her precious guidance, advice and suggestions.

Thanks to the co-advisor Marco Scavuzzo for his patience and professionalism, for having helped me to orient myself in this broad and not very standardized topic, and for his prompt and efficient replies to my emails asking for help.

Many, many thanks to my friends and fellow students for their support, for the carefreeness I could experience in these five wonderful years, and for their friendship, which has been very precious to me.

My deepest gratitude to my family for getting me to where I am today, for the trust placed in me and the constant encouragement.

Last but not least, special thanks go to Noemi, who in difficult times has always found the right words to motivate me, for having always believed in me, for her patience and constant support during the course of my studies, in particular during this last period, and for having shared with me this significant experience.

Abstract

In today's applications, data are increasing exponentially and also need to be replicated on different devices in realtime; these devices should be able to use the data even when they are offline. The devices therefore need a local copy of the database that is also available offline, called the local database.

The typical device that uses this kind of application is a smartphone, but web applications such as collaborative software (like Google Docs) use them as well.

Different solutions based on NoSQL databases have been proposed (both open source and proprietary cloud-hosted), but custom solutions based on an RDBMS are also feasible. The NoSQL-based solutions are not standardized and do not even have an established name, so we call them CS-NoSQL (client-side NoSQL).

NoSQL was chosen because in this setting data are unstructured and their amount could be enormous.

One of the main advantages of CS-NoSQL is that they are a full-stack environment (with other solutions we need to create the entire infrastructure), and this allows creating a simpler application with zero code.

The goal of this master thesis is to verify the performance of CS-NoSQL for the classes of applications they are designed for, comparing them with a solution based on an RDBMS.

To do that, we implemented a comparison solution based on PostgreSQL (an RDBMS server) that replicates data on clients using websocket; we also implemented a simple local database for it.

We created a custom framework to test this kind of system and ran benchmark tests emulating the different classes of applications for which CS-NoSQL are designed.

We discovered that they are very unstable with a reasonable amount of data and that our proposed solution based on PostgreSQL is quicker (up to 10x). However, we expect an improvement (in performance and stability) in the coming years, when more native solutions will probably be developed.

Summary

Today, data in applications grow exponentially and must be replicated on different devices in realtime; these devices should remain usable even when they are offline. Such devices need a local copy of the database that is also available offline, called the local database.

The typical device that uses this kind of application is the smartphone, but web applications such as collaborative software (like Google Docs) use it as well.

Several solutions based on NoSQL databases have been proposed (both open source and proprietary cloud-hosted), but custom solutions based on an RDBMS are also feasible. The NoSQL-based solutions are not standardized and have no established name, so we call them CS-NoSQL (client-side NoSQL).

NoSQL databases were chosen because the data involved are unstructured and large.

One of the main advantages of CS-NoSQL is that they are a complete environment (with the other solutions we must create the entire infrastructure), which allows creating a simpler application without code.

The goal of this thesis is to verify the performance of CS-NoSQL for the classes of applications they are designed for, comparing them with an RDBMS-based solution.

To do so, we implemented a comparison solution based on PostgreSQL (an RDBMS server) that replicates data on the clients using websocket, implemented a simple local database for it, created a custom framework to test these systems, and ran benchmark tests emulating the different classes of applications for which CS-NoSQL were designed.

We discovered that they are very unstable with large amounts of data and that our PostgreSQL-based solution is faster (up to 10 times). However, we expect an improvement (in performance and stability) in the coming years, when more native solutions will probably be developed.

Contents

Introduction

1 Background
  1.1 HTTP
    1.1.1 HTTP handshake
    1.1.2 WebSocket
    1.1.3 RESTful
  1.2 Data Management
    1.2.1 Data retrieve
    1.2.2 Consistency, Partition tolerance, Availability
    1.2.3 Partitioning and Distribution
  1.3 Serverless
    1.3.1 Platform as a Service
    1.3.2 Software as a Service

2 State of Art
  2.1 CS-NoSQL
    2.1.1 Data model
    2.1.2 Distributed issues
    2.1.3 Publish/subscribe
    2.1.4 Constraints, permissions and queries
  2.2 Client Best Practices
    2.2.1 Event driven approach
    2.2.2 Client language
    2.2.3 RESTful approach
    2.2.4 Local database

3 Analysis of Some CS-NoSQL
  3.1 Characteristics to be considered
  3.2 Analysis
    3.2.1 OpenSource
    3.2.2 Software as a Service

4 A Proposed Comparing Traditional Approach
  4.1 Technologies Background
    4.1.1 PostgreSQL
    4.1.2 Redis
    4.1.3 Socket.io
  4.2 Architecture Proposed
    4.2.1 Database
    4.2.2 Publish/Subscribe and Websocket server
    4.2.3 Input server
    4.2.4 Custom logic
    4.2.5 Client library and Local database
  4.3 Implementation
    4.3.1 Websocket server
    4.3.2 Client

5 Classes of Applications
  5.1 Use Cases
    5.1.1 Realtime chat
    5.1.2 Collaborative software
    5.1.3 Social Applications
  5.2 Examples of Real CS-NoSQL Applications
    5.2.1 Adobe DPS
    5.2.2 Logitech Harmony Ultimate Home
    5.2.3 CornerJob

6 Benchmarks strategy
  6.1 Scaling Test
    6.1.1 Scaling
    6.1.2 Environments
  6.2 Test Framework
  6.3 Tests sets
    6.3.1 Realtime chat
    6.3.2 Collaborative software
    6.3.3 Social
  6.4 Adapters for systems

7 Benchmarks
  7.1 Tests
    7.1.1 Couchbase
    7.1.2 Pouchdb
    7.1.3 Gun.js
    7.1.4 Traditional
  7.2 Analysis of the Results and Lesson Learned

8 Conclusions

Bibliography

A Snippets
  A.1 PostgreSQL realtime retrieve trigger
  A.2 Socket.io custom logic

B Jepsen

List of Figures

1.1 Modern web application stack
1.2 HTTP 1.0 vs HTTP 1.1 (using keep-alive) [42]
1.3 HTTP2 multiple parallel requests [44]
1.4 Websocket with server events [87]
1.5 Database triangle [24]
2.1 Simple client architecture
4.1 Traditional Architecture Proposed
7.1 Chat (couchbase) benchmarks
7.2 Collaborative (couchbase) benchmarks
7.3 Social (couchbase) benchmarks
7.4 Chat (pouchdb) benchmarks
7.5 Collaborative (pouchdb) benchmarks
7.6 Social (pouchdb) benchmarks
7.7 Chat (gun.js) benchmarks
7.8 Collaborative (gun.js) benchmarks
7.9 Social (gun.js) benchmarks
7.10 Chat (traditional) benchmarks
7.11 Collaborative (traditional) benchmarks
7.12 Social (traditional) benchmarks
B.1 Jepsen latency raw
B.2 Jepsen latency quantiles
B.3 Jepsen rate

List of Tables

3.1 CS-NoSQL comparison
6.1 CS-NoSQL test environments
6.2 postgreSQL (traditional approach) test environments
6.3 socket.io (traditional approach) test environments
6.4 Realtime chat clients
6.5 Collaborative software clients
6.6 Social clients
7.1 Chat (couchbase) benchmarks
7.2 Collaborative (couchbase) benchmarks
7.3 Social (couchbase) benchmarks
7.4 Chat (pouchdb) benchmarks
7.5 Collaborative (pouchdb) benchmarks
7.6 Social (pouchdb) benchmarks
7.7 Chat (gun.js) benchmarks
7.8 Collaborative (gun.js) benchmarks
7.9 Social (gun.js) benchmarks
7.10 Chat (traditional) benchmarks
7.11 Collaborative (traditional) benchmarks
7.12 Social (traditional) benchmarks

List of Algorithms

2.1 Event driven retrieve of data

Introduction

In today's applications, data are increasing exponentially [108] and also need to be replicated on different devices in realtime; these devices should be able to use the data even when they are offline. The devices therefore need a local copy of the database that is also available offline, called the local database.

The typical device that uses this kind of application is a smartphone, but web applications such as collaborative software (like Google Docs) use them as well [112].

So what we need is a database that can process big data and replicate itself among devices in realtime.

NoSQL databases were born to process big data efficiently [99, 113], so the solutions that replicate databases in realtime are based on them. They can be open source or proprietary cloud-hosted; one of the most famous cloud solutions is Firebase [119]. These solutions are not standardized and do not even have an established name, so we call them CS-NoSQL (client-side NoSQL databases).

The goal of this master thesis is to verify the performance of CS-NoSQL for the classes of applications they are designed for. Another objective is to explore issues related to the development process, scalability and similar aspects.

We analyze the theory behind these databases, then we classify and analyze the ones that have significant characteristics. We choose open source systems because we can verify the characteristics declared by vendors, we can take finer-grained measurements of their performance (including low-level data), and the technology behind them is more standard than the technology used by cloud systems.

Moreover, we develop an alternative system based on a traditional database that supports the same operations as native CS-NoSQL databases, and we use it as a comparison system.

In order to test these specific databases we identify some significant classes of applications where they should perform efficiently; the classes differ in elements that are typical of this kind of application, such as the number of subscribed clients, the data structure and so on. Then we design a test framework, since no existing test framework analyzes this kind of application while considering issues specific to it, such as the number of subscribed clients.

We show that our classical comparison system is more efficient in terms of performance, but also in terms of scalability and stability (CS-NoSQL are very unstable with a reasonable amount of data). In some cases it is also 10x faster.

On the other hand, CS-NoSQL are in some cases very efficient for the development process: since they were born only for this purpose, they are a full-stack environment (with other solutions we need to create the entire infrastructure), and this allows creating a simpler application with zero code. This is very common in proprietary cloud-hosted systems, but these have the disadvantage of being proprietary solutions: if the system goes down or the cloud company fails, all data are lost and there is no way to replicate the system on another server, so a new development is needed.

Many CS-NoSQL issues are therefore related to the fact that the solutions are proprietary: there is no standard way to run tests, no standard development process, and so on. This is a big cost for companies that want to use this kind of system, since they need different training for each system used. Moreover, as noted above, their performance is not impressive. At the moment only a few companies use them, but we expect big improvements on both main issues (performance and development process) in the coming years when, probably, more native solutions will be implemented, because the number of applications that need such systems is increasing.

Since CS-NoSQL are new and not standardized, their literature is poor, but they are just a combination of technologies that are already well studied. The problem with their use is that the only information available comes from vendors. We therefore select the theoretical properties that are relevant and collect them from vendors (or, when they are not available, we try to deduce them from other sources such as the source code) for each relevant CS-NoSQL, both cloud and open source. These properties include consistency, data granularity, distribution and partitioning, security issues and so on.

As said previously, we develop a comparison system based on a traditional database, using PostgreSQL and classical communication methods based on websockets.

We identify some interesting classes of applications that differ in characteristics typical of this kind of application. Here too, we rely on the information provided by the vendors.

We design a test strategy and develop a test framework, since there are no test frameworks for this kind of system. The framework allows testing scalability and performance by emulating the classes of applications analyzed before, and it collects performance indices that are very relevant for this kind of application, such as the time needed to replicate new information to all the clients. Of course we also take classical measurements such as raw read/write speed.

Finally, we execute the benchmarks on identical virtual machines (distributed with the source code of the test framework) so that the tests are replicable and run in a standard way. We execute the tests on different combinations of machine characteristics to observe the scalability and the bottlenecks of the systems under test.

Thesis organization

In Chapter 1, we present the background: all the knowledge needed to understand the following topics. The main topics are HTTP and the websocket protocol, basic concepts and issues of data in a distributed system (including publish/subscribe and optimistic replication) and finally the concepts behind cloud computing from the user's point of view.

In Chapter 2, we describe the state of the art of CS-NoSQL. As said previously, there are no academic references to them, only references to the technologies they use, so we present NoSQL databases from a theoretical point of view with all the related aspects that we need (such as distribution). Then we show the best practices for implementing the local database on the client side.

In Chapter 3, we analyze some real CS-NoSQL, both open source and proprietary cloud-hosted ones. We classify them in a common way in order to compare them. Then we select a set of them to be used in the benchmark tests.

In Chapter 4, we present our proposed comparison solution. We describe all the elements needed to build it, from the database to the client, including the system that delivers realtime messages.

In Chapter 5, we analyze the classes of applications for which CS-NoSQL are designed according to vendors. We analyze use cases, and for each of them we show a real case study.

In Chapter 6, we define our strategy for running the benchmark tests. We analyze what we need to do, what we need to measure and which tests we need to execute, and we briefly explain the framework we have implemented.

In Chapter 7, we show the results of the benchmark tests and comment on them.

In Chapter 8, we present the conclusions of this master thesis and describe possible improvements and future research.


Chapter 1

Background

In this chapter and in Chapter 2 we present all the theoretical knowledge needed to understand the contribution of this thesis.

In 1.1 we analyze HTTP and then the websocket protocol (a protocol based on HTTP), which allows sending messages efficiently in realtime to clients through HTTP. Using a protocol built on HTTP is very important because many applications of CS-NoSQL are web applications. In that section we analyze the entire stack of a classical HTTP application, use it to introduce and explain the main HTTP versions (1.0, 1.1, 2.0), and then describe the main HTTP-based protocols used in this master thesis: websocket (the most important one for this thesis' topic, since it allows sending realtime notifications to clients) and RESTful (a standard protocol to exchange messages between client and server).

In 1.2 we describe basic concepts and issues of data management, particularly in a distributed system, including publish/subscribe and optimistic replication. We also analyze the database itself, because traditional databases have no native event notification system: we have to check for changes every time (polling) or create a trigger, as we show in the next sections. We also analyze how to send realtime notifications of changes and the limits that led to the creation of CS-NoSQL. We analyze the basic concepts of realtime data retrieval (triggers, publish/subscribe and the high-level steps needed), then we present the CAP theorem (with eventual consistency and its validity today with the new applications), and finally distributed systems with the related issues and basic concepts: distribution and partitioning, replication (master-slave and multi-master) and standard frameworks to process data in a distributed environment, such as MapReduce.

Finally, in 1.3 we quickly analyze the concepts behind cloud computing from the user's point of view, since many of the systems that we analyze in the next chapters are cloud-hosted.

After this chapter, we have all the elements needed to understand the topics of this master thesis, except for those directly related to CS-NoSQL, such as NoSQL databases, which we present in the next chapter.

1.1 HTTP

There are many ways to build an application over HTTP (a web application), but all of them share common parts.

Figure 1.1: Modern web application stack

In all classical HTTP applications the client sends a request to the HTTP server, the HTTP server requests data from the database server, and finally the HTTP server (application server) sends the data to the client, as shown in figure 1.1.

In the rest of this section we describe the evolution of HTTP that allows sending realtime notifications. The core of the issue is the HTTP handshake analyzed below: we need asynchronous requests, i.e. the server should be able to send messages to the client at any time [101].

In the following sections we present standards related to HTTP: websocket (the most important protocol for this master thesis' topic, since it allows sending realtime notifications to clients) and RESTful (a standard protocol to exchange messages between client and server).

1.1.1 HTTP handshake

HTTP 1.0 The original version of HTTP (HTTP 1.0) allowed only connections that must be closed immediately after receiving data [93], as shown in figure 1.2 (first image): the client requests something, the server replies and finally the connection is closed.

In this version the only approaches available to deliver realtime notifications are: short polling (scheduling a new request at a small fixed interval to check whether data have changed), long polling (sending a normal request to the server, which replies only when there are new data, after which the client sends a new request) and event stream (a long request without end, over which the server keeps sending data to the client; this approach is used in video/music streaming).

All these approaches have many issues that reduce performance [120].
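
The difference between short polling and long polling can be sketched without a network. The following is a minimal illustration, assuming a hypothetical in-memory `Server` class (not part of any system analyzed in this thesis) whose `get` method answers immediately and whose `wait_for_change` method blocks until newer data exist. The names, the 0.01 s polling interval and the 0.1 s update delay are all assumptions chosen for the example.

```python
import threading
import time


class Server:
    """Hypothetical in-memory stand-in for an HTTP server holding one value."""

    def __init__(self):
        self._cond = threading.Condition()
        self._version = 0
        self._value = None

    def update(self, value):
        with self._cond:
            self._version += 1
            self._value = value
            self._cond.notify_all()

    def get(self):
        """Short polling endpoint: always answers immediately."""
        with self._cond:
            return self._version, self._value

    def wait_for_change(self, known_version, timeout=5.0):
        """Long polling endpoint: answers only when newer data exist (or on timeout)."""
        with self._cond:
            self._cond.wait_for(lambda: self._version > known_version, timeout)
            return self._version, self._value


def short_poll(server, known_version, interval=0.01):
    """Repeat requests at a fixed interval until the data have changed."""
    requests = 0
    while True:
        requests += 1
        version, value = server.get()
        if version > known_version:
            return value, requests
        time.sleep(interval)


def long_poll(server, known_version):
    """Issue a single request that the server holds open until data change."""
    _version, value = server.wait_for_change(known_version)
    return value, 1


def demo():
    results = {}
    for name, poll in (("short", short_poll), ("long", long_poll)):
        server = Server()
        timer = threading.Timer(0.1, server.update, args=("hello",))
        timer.start()
        results[name] = poll(server, 0)  # (value received, requests sent)
        timer.join()
    return results
```

Calling `demo()` returns, for each strategy, the value received and the number of requests the client had to send: long polling needs a single request, while short polling typically sends several that return no new data.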


Figure 1.2: HTTP 1.0 vs HTTP 1.1 (using keep-alive) [42]

HTTP 1.1 It is a newer version of HTTP that fixed some issues and made several improvements; it enabled the keep-alive option (this option was already implemented unofficially by many HTTP 1.0 clients) [124].

The keep-alive option allows using the same connection multiple times, i.e. after the server replies the client can send another message over the same connection, as shown in figure 1.2 (second image).

This was originally designed to request different files over the same connection, and it is also the enabling technology for websocket, as we see below.

HTTP 2.0 It is the newest HTTP standard (2015) and it introduced many improvements. The most important are the possibility for the server to push data that the client has not explicitly requested and the possibility to have multiple parallel requests over a single TCP connection, as shown in figure 1.3 [92]. These features also increase energy efficiency [100].

However, since it is new, it is not yet supported efficiently by many browsers [43] and by many websites [140]. So at the moment other solutions (like websocket) are preferred.

1.1.2 WebSocket

It is a protocol built over HTTP 1.1 that allows creating a sort of socket where both the client and the server can send and receive messages at any time [106]. This is done using an HTTP connection with the keep-alive option.

This protocol therefore allows creating efficient realtime notifications, where the client immediately receives a notification from the server (for server events) [128], as shown in figure 1.4.
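
To make the relation with HTTP 1.1 concrete, the sketch below shows the server side of the websocket opening handshake. This detail comes from the websocket specification (RFC 6455), not from this thesis: the client sends an HTTP 1.1 upgrade request carrying a `Sec-WebSocket-Key` header, and the server answers with status 101 and a `Sec-WebSocket-Accept` value derived from that key. The function names `websocket_accept` and `handshake_response` are hypothetical and chosen only for this illustration.

```python
import base64
import hashlib

# GUID fixed by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a client key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def handshake_response(sec_websocket_key: str) -> str:
    """Build the HTTP 1.1 response that switches the connection to websocket."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {websocket_accept(sec_websocket_key)}\r\n"
        "\r\n"
    )
```

After this response the same TCP connection stays open and both sides can send frames at any time, which is what makes server-initiated notifications possible.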


Figure 1.3: HTTP2 multiple parallel requests [44]

Figure 1.4: Websocket with server events [87]

1.1.3 RESTful

As shown in figure 1.1 we can use RESTful calls to retrieve data. RESTful is a sort of protocol to exchange messages between the client and the server over HTTP (it uses the standard HTTP methods) [142].

It is a simple protocol that uses the HTTP verbs: for example, to create new data a POST call must be made, while to update data a PUT call must be made; this is the CRUD (Create Read Update Delete) approach [121]. The idea is to perform simple actions on resources that immediately return an answer (like a GET call) or immediately execute an action (like a POST call).

There is another protocol, older than RESTful and still in use: SOAP. It is very powerful, but for managing resources RESTful is better [125, 129].

RESTful is designed to work with resources: each resource is identified by a unique URI and there are standard actions (HTTP verbs) for each resource. For example we can have:

* Customers identified by http://example.com/customers
* Customer details with a unique ID (for example id = 10) identified by http://example.com/customers/10
* Contacts of a customer identified by http://example.com/customers/10/contacts

This is useful to navigate a JSON structure, as we see in section 2.1.1. RESTful cannot be used with websocket, but it can be used for short polling (GET), even if this is not strictly a correct approach.
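
The mapping between HTTP verbs, resource URIs and CRUD actions can be sketched as follows. This is a minimal illustration that assumes a hypothetical in-memory `customers` store and a hypothetical `handle` function returning a `(status code, payload)` pair; it does not use a real HTTP server and handles only the `/customers` and `/customers/<id>` resources.

```python
customers = {}  # hypothetical in-memory resource store: id -> customer data


def handle(method, path, body=None):
    """Map an HTTP verb and a resource URI path to a CRUD action."""
    parts = [p for p in path.split("/") if p]
    if parts[:1] != ["customers"] or len(parts) > 2:
        return 404, None
    if len(parts) == 1:                      # the collection /customers
        if method == "GET":                  # Read the list of identifiers
            return 200, sorted(customers)
        if method == "POST":                 # Create a new resource
            new_id = max(customers, default=0) + 1
            customers[new_id] = body
            return 201, new_id
        return 405, None
    if not parts[1].isdigit() or int(parts[1]) not in customers:
        return 404, None
    cid = int(parts[1])                      # a single resource /customers/<id>
    if method == "GET":                      # Read
        return 200, customers[cid]
    if method == "PUT":                      # Update
        customers[cid] = body
        return 200, body
    if method == "DELETE":                   # Delete
        del customers[cid]
        return 204, None
    return 405, None
```

Every call returns immediately with the current state of the resource, which is why this style fits request/response interactions and short polling but not server-initiated notifications.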

1.2 Data Management

In this section we briefly describe the basic concepts of database systems needed to retrieve data in realtime: we analyze triggers and publish/subscribe, then the high-level steps needed to retrieve data in realtime.

We describe basic concepts and issues of data in a distributed system: consistency, partition tolerance and availability (CAP theorem). We also analyze useful variations such as eventual consistency and how the original CAP theorem has changed with the new applications.

Finally we present the general idea behind a distributed system, analyzing the meaning and consequences of partitioning and distribution, standard frameworks such as MapReduce, and replication (master-slave and multi-master).

1.2.1 Data retrieve

Relational databases were not developed to send notifications; the basic idea is: when you want data, you get them.

So in traditional HTTP applications, as shown in figure 1.1, the HTTP server queries the database server for each request.

If we want to make this process realtime we have to solve two main issues. First, how do we get a notification of a change, also considering distributed systems? The trivial solution is a database trigger, as explained below. Second, how do we efficiently manage clients that want information about changes to only some things and not to everything? The solution is the publish/subscribe pattern [136], as explained in the remainder of this section. Finally we recap everything needed.

Database trigger According to SQL:1999 (known as SQL3) [139], a trigger is a procedure automatically called by the database when some specific events occur.

It has the following components: unique name, trigger event (insert, delete, update), activation time (before or after the trigger event), trigger granularity (for each row, for each statement), trigger condition (SQL condition), trigger actions (SQL procedure to execute) and trigger timestamp (when the trigger was created).


If the event is fired and the condition is valid, the trigger is called; if more than one trigger has to be called, they are called in ascending timestamp order.
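
The trigger components listed above can be seen in a small runnable sketch. It uses SQLite through Python's standard `sqlite3` module only because it needs no server; this is an assumption made for the illustration and not the database used in this thesis. SQLite supports only row-level granularity, and the table names `customers` and `changes` are hypothetical.

```python
import sqlite3


def run_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE changes (table_name TEXT, record_id INTEGER);

        CREATE TRIGGER customers_after_update      -- unique name
        AFTER UPDATE ON customers                  -- activation time and trigger event
        FOR EACH ROW                               -- trigger granularity
        WHEN NEW.name <> OLD.name                  -- trigger condition
        BEGIN                                      -- trigger action
            INSERT INTO changes VALUES ('customers', NEW.id);
        END;
        """
    )
    conn.execute("INSERT INTO customers VALUES (10, 'Ada')")
    # The event fires but the condition is false: no change is recorded.
    conn.execute("UPDATE customers SET name = 'Ada' WHERE id = 10")
    # The event fires and the condition holds: the action is executed.
    conn.execute("UPDATE customers SET name = 'Grace' WHERE id = 10")
    rows = conn.execute("SELECT table_name, record_id FROM changes").fetchall()
    conn.close()
    return rows
```

Here the trigger only records which row changed; in a realtime setting the action would instead hand the change over to a notification mechanism.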

Publish/Subs ribe In publish/subs ribe pattern (originally known as news sub-

system) ea h lient that is interested in some data/events subs ribes to these ones,

while the server servi e publishes data that will be sent to the lients (by the broker

server(s)) without knowing whi h are the lients subs ribed [94. Of ourse the pub-

lishers ould not be a server, in fa t every a tor an be a publisher.

Publish/subs ribe is not ompatible with RESTful alls sin e in the RESTful ap-

proa h data must be returned immediately when a request is done.

There are two types of subs riptions: topi -based ( hannels, i.e. a unique name

that identies the messages) and ontent-based (messages sent to subs ribers inter-

ested in some attributes or into their ontent).

The ontent-based system is more exible but an e ient distributed broker network

is not so easy to be developed [91.

Since we only need to receive change events on tables, topic-based is enough,
where the topic (channel) can be the pair table-record_id.
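A topic-based broker of the kind described above can be sketched as follows. It is a minimal in-memory illustration, not the API of any real broker: the class `Broker`, the helper `channelFor` and the message shape are assumptions made for the example.

```javascript
// Minimal in-memory topic-based broker (illustrative only).
class Broker {
  constructor() {
    this.subscribers = new Map(); // channel name -> Set of callbacks
  }
  subscribe(channel, callback) {
    if (!this.subscribers.has(channel)) this.subscribers.set(channel, new Set());
    this.subscribers.get(channel).add(callback);
    return () => this.subscribers.get(channel).delete(callback); // unsubscribe
  }
  publish(channel, message) {
    // The publisher does not know who is subscribed; the broker does the routing.
    const targets = this.subscribers.get(channel) || new Set();
    targets.forEach((callback) => callback(message));
    return targets.size;
  }
}

// Hypothetical channel naming: the pair table-record_id.
const channelFor = (table, recordId) => `${table}-${recordId}`;

const broker = new Broker();
const received = [];
broker.subscribe(channelFor("customers", 10), (msg) => received.push(msg));

const delivered = broker.publish(channelFor("customers", 10), { event: "update", name: "Ada" });
const ignored = broker.publish(channelFor("customers", 11), { event: "update", name: "Bob" });
```

The client subscribed to `customers-10` receives only the first message; the change to record 11 reaches nobody because no one subscribed to that channel.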

Realtime retrieve. Now we have all the elements to create a realtime application:
each client subscribes to the data that it needs, a trigger on the server publishes data on
a publish/subscribe system when they are updated, and finally the communication
between the client and the server goes over a websocket channel (so immediately after
data are published, the publish/subscribe broker server sends them on this channel;
as previously said, RESTful cannot be used).

Of course there are still issues related to the distribution and partitioning of the
infrastructure.

1.2.2 Consistency, Partition tolerance, Availability

Figure 1.5 shows the CAP theorem applied to databases: it says that in
distributed systems only two of the following three properties can be guaranteed
[107, 113]: consistency (all nodes receive the most recent data - write),
availability (all requests receive a response with data - this does not ensure that data
are the newest) and partition tolerance (the system works properly even if data are
partitioned among nodes, and also if some of them are unreachable). ACID (Atomicity,
Consistency, Isolation, Durability) databases (traditional relational databases)
often choose availability and consistency [96].

Of course if there are no network errors all properties hold.

In distributed systems the main issue is the synchronization among nodes: there
is no common clock, so the order of commits is not easy to determine.


Figure 1.5: Database triangle [24]

Eventual consistency. When strong consistency is not implemented, eventual
consistency often is: the storage system guarantees that if no new
updates are made to the object, eventually all accesses will return the last updated
value [141].

Validity today. Today there are new aspects to consider, like clusters partitioned
over a WAN, which can cause different issues such as high latency, so the CAP theorem is
not enough to classify these new situations. New classifications, like
PACELC, have been proposed to solve these issues [90].

The idea behind PACELC (Partition Consistency Availability Else Latency
Consistency) is that if strong consistency must be guaranteed the latency can increase a
lot (the time needed to receive all the acks from all replicas for every action), potentially
to infinity. So a system has to choose between low latency and strong consistency.

But since this classification is not yet used by many commercial and non-commercial
systems, in the next chapters the CAP classification is used: using PACELC would make the
classification inconsistent (since some systems are not classified under it). Moreover,
some commercial systems do not give enough details to make this classification
ourselves.

1.2.3 Partitioning and Distribution

There are some systems that use both ideas. For example HDFS (a distributed
file system) both replicates and partitions data among nodes [135].
So a chunk of data is replicated several times but it is not present in all nodes. A


group of nodes is called a cluster.

To process data in a distributed environment, one simple, powerful and common
programming model is MapReduce.

The two main patterns to structure a distributed replicated architecture are
multi-master and master-slaves.

Partitioning & Distribution

Partitioning. Partitioning data among different nodes means splitting data into
chunks according to some defined rules and putting different chunks on different
nodes [98, 113].
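As an illustration of such a rule, the sketch below assigns each record to a node by hashing its key and taking the result modulo the number of nodes. This is just one possible rule, chosen as an assumption for the example; the functions `hashKey` and `partition` are hypothetical names, and real systems often use range-based or consistent-hashing schemes instead.

```javascript
// Rule used here (an assumption): hash the record key, then take it modulo the node count.
function hashKey(key) {
  let h = 0;
  for (const ch of String(key)) {
    h = (h * 31 + ch.codePointAt(0)) >>> 0; // keep it an unsigned 32-bit integer
  }
  return h;
}

function partition(records, nodeCount) {
  const nodes = Array.from({ length: nodeCount }, () => []);
  for (const record of records) {
    nodes[hashKey(record.key) % nodeCount].push(record);
  }
  return nodes;
}

const records = ["a", "b", "c", "d", "e", "f"].map((key) => ({ key, value: key.toUpperCase() }));
const nodes = partition(records, 3);
```

Every record ends up on exactly one node, and the same key is always routed to the same node, which is what lets a client find a record without asking every node.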

Distribution. Distribution is the replication of the infrastructure among
different nodes: each node does the same thing. It is not such an easy concept; in fact
there is the synchronization problem (formally known as consistency, as shown by
the CAP theorem) [107], for example in the following scenarios.

Commit data: in a distributed database, the problem is to find the correct order of
commits since there is no common clock; we should also provide a distributed lock.

Or publish/subscribe: in a publish/subscribe application distributed among different
nodes, how are data published on one node sent to all the subscribed clients, including the
ones subscribed on other nodes?

MapReduce. MapReduce is a programming model where data are split among
different nodes and are processed by different nodes [105]. It is composed of three
main actions: map (filtering and sorting data), shuffle (grouping data by key and
sending data with the same key to the same reducer node) and reduce (summary
operations).
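The three actions can be sketched on a single machine with the classic word count. This is a simplified illustration, not a distributed implementation: the names `mapFn`, `shuffle`, `reduceFn` and `mapReduce` are assumptions, and in a real cluster each chunk would be mapped on a different node and each key reduced on the node chosen by the shuffle.

```javascript
// map: emit (key, value) pairs from one chunk of input
const mapFn = (line) => line.split(/\s+/).filter(Boolean).map((word) => [word.toLowerCase(), 1]);

// shuffle: group values by key, so that one reducer sees all the values of a key
function shuffle(pairs) {
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return groups;
}

// reduce: summarise the values of one key
const reduceFn = (key, values) => [key, values.reduce((sum, v) => sum + v, 0)];

function mapReduce(chunks) {
  const mapped = chunks.flatMap(mapFn); // each chunk could be mapped on a different node
  const grouped = shuffle(mapped);
  return Object.fromEntries([...grouped].map(([key, values]) => reduceFn(key, values)));
}

const counts = mapReduce(["to be or not", "to be"]);
```

With the two input chunks above, `counts` maps "to" and "be" to 2 and "or" and "not" to 1.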

Multi Master & Master-Slaves replication. To have data immediately available
(optimistic replication) we have to give up strong consistency and settle for
eventual consistency.

There are different ways to solve conflicts and to obtain eventual consistency. One of
the simplest is Last Write Wins: the write with the highest timestamp is used,
overwriting the other concurrent writes [133].

In order to keep strong consistency we need a distributed lock, which causes delays
(a problem discussed shortly).
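The Last Write Wins rule mentioned above fits in one function. The sketch assumes that every write carries a timestamp that is comparable across replicas (which is itself the hard part in a distributed system); the function name `lastWriteWins` and the shape of a write are hypothetical.

```javascript
// Each write carries the value and a timestamp (assumed comparable across replicas).
function lastWriteWins(writes) {
  // On equal timestamps the earlier element of the array is kept.
  return writes.reduce((winner, write) => (write.timestamp > winner.timestamp ? write : winner));
}

const concurrentWrites = [
  { replica: "A", value: "blue", timestamp: 1001 },
  { replica: "B", value: "green", timestamp: 1005 },
  { replica: "C", value: "red", timestamp: 1003 },
];
const resolved = lastWriteWins(concurrentWrites);
```

The write from replica B wins because it has the highest timestamp; the other two concurrent writes are silently discarded, which is the price of this simple strategy.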

Multi Master. All data can be written on every node of the cluster, which
replicates them to the other nodes [51]. Of course in this kind of architecture there can be
issues due to the CAP theorem.


Master-Slaves. All data are written on one node, which replicates them to the
others; these can be used only in read-only mode [49]. Of course this kind of
architecture has fewer issues than those explained previously. There can
still be issues due to availability, as explained by the CAP theorem. A system that
supports multi-master replication can also be used as master-slaves.

1.3 Serverless

Serverless is a cloud computing architecture where the developer does not have
to think about the infrastructure. It is something half-way between Platform as a
Service and Software as a Service; for that reason it is also known as Function as a
Service.

The serverless idea is a general one that can be implemented for different cases; a
classical implementation is for file systems [95].

In a serverless architecture the developer only has to think about developing the
application, not about how to scale the infrastructure, how to store data securely
and so on. The developer does not have to develop a
real server: the server is already set up and only has to be configured, possibly with
some small extensions (in the database case the extensions can simply be the
triggers).

As the name Function as a Service suggests, we can also have systems where we only have to
develop a piece of code, a function, which is executed on an unknown infrastructure,
where we do not have to think about scalability and we pay just for usage. For example,
we pay for the number of executions of that function; this is the way used by the Amazon
AWS implementation: AWS Lambda [2, 3]. This is a big advantage for the user,
because there is an instant scalability that Platform as a Service does not offer.

As we see in 3.2, this kind of architecture is fully developed in cloud systems
and only partially in open source ones: in open source architectures
we are free to configure and develop the server, but we do not have the
easy scalability of cloud systems.

After the sections below we will see that the difference among these three
kinds of architectures is minimal and that the architectures often overlap.

An example of overlapping is salesforce.com, a CRM (customer relationship
management) service born as Software as a Service [103]. It has since become a sort
of serverless architecture to build business applications, and it has also bought Heroku (a
Platform as a Service company). However, even if it is now like a serverless service,
the pricing has remained that of a Software as a Service system,
where you do not pay for usage: you pay for accounts, and each account has some
limits [78].


1.3.1 Platform as a Service

Platform as a Service is a cloud computing architecture where the developer does
not have to think about the infrastructure. The developer chooses the platform on which
to develop, which is automatically configured and ready to use [117]. In this kind of
architecture the developer has to choose how to scale (it can be an automatic process),
but in an easy way: often only the number of nodes has to be chosen.

One of the best-known Platform as a Service offerings is Heroku [123]. Comparing
the pricing of Heroku [41] with that of AWS Lambda [3], we can
observe that on Heroku we pay for the number of nodes per second, while on AWS
Lambda we pay just for the execution time. On AWS Lambda we do not have to do
anything to execute one function per second or thousands per second, while on
Heroku we have to set the right number of nodes or a correct strategy to
dynamically create/remove nodes. So the serverless approach can reach the maximum
degree of resource sharing in cloud computing.

As far as the concept of a ready-to-use development platform is concerned, the serverless
architecture is similar to Platform as a Service, but it differs in other aspects
such as scalability, as we have seen with the AWS Lambda example.

1.3.2 Software as a Service

Software as a Service is a cloud computing architecture where there is no need or
possibility to develop anything [103]. One classic example is a webmail service that
can be configured for a private company.

One of the best-known Software as a Service offerings is Google Apps for Work [31], a
service where the employees of a company have access to the company mail, company
docs, company cloud storage and so on. Everything is done only by configuring a
system, without developing anything. With Google Apps for Work
the company pays for the number of users, not for the usage [32],
an approach completely different from other cloud systems, but a good one
for companies because in that way they are able to predict the cost reliably.

The serverless approach is very similar to Software as a Service, since the developer
does not have to think about the infrastructure or how to build a reliable server, but
unlike Software as a Service, in the serverless architecture the developer can
develop something, not only configure.


Chapter 2

State of Art

In this chapter we analyze CS-NoSQL databases and client best practices. In the
previous chapter we analyzed all the elements needed to understand these topics;
we also saw the limits of traditional technologies, which led to the creation of
CS-NoSQL. In fact CS-NoSQL databases allow increasing performance (theoretically) and
sending realtime notifications to clients easily, without manually implementing the
entire stack.

As we show, the concepts behind CS-NoSQL are well known and have many
academic references, but the idea behind CS-NoSQL that joins all of this knowledge
is not. In fact, it is based on the best practices of common technologies.

We analyze the key aspects of CS-NoSQL in 2.1, including their relationship with NoSQL
databases. We analyze the main data models, distributed issues, publish/subscribe
applied to CS-NoSQL with their advantages, and other non trivial aspects such as
constraints, permissions and queries.

Then in 2.2 we describe the client best practices to receive notifications (using
publish/subscribe) in a way that is transparent from the final user's point of view.
We show the event driven approach, an analysis of the client language used, the RESTful
approach and, last but not least, the local database.

This chapter is needed, together with the previous chapter, to understand and to
classify the real systems explained in 3. Moreover, it is very useful as a theoretical
starting point for the comparative approach we propose in 4; in
particular we refer to the client best practices.

2.1 CS-NoSQL

They are improperly called realtime databases even though they are not realtime
databases (a completely different thing); the name is used for commercial
purposes since they send realtime notifications of changes in data.

These databases have some advantages. They can be easily partitioned and
distributed over different nodes, an important property. Moreover they easily allow
subscribing to different granularities of data and receiving notifications of changes
using publish/subscribe. This is done without writing a single line of code on the server.
They also implement the publish/subscribe broker server. Furthermore they can
partially support constraints, permissions and queries. Also they
can easily be implemented in a serverless approach with real databases. It is an
efficient approach that can also give many economic advantages in different situations.
Finally they implement client best practices, providing all the libraries/frameworks.

It is easy to see why these databases are implemented on top of NoSQL databases.
In fact a NoSQL database is an unstructured database; it can support a subset of
the SQL instructions (but this is not mandatory).

NoSQL databases were born to create scalable databases for Big Data applications
(Big Data being a set of data too large or too complex to be managed/processed with a
traditional system), even if this means losing query expressiveness; in fact they can
support multiple masters [98].

They can be easily partitioned, but to do this efficiently they have to
give up either consistency or availability, as shown in figure 1.5; we show
more details when analyzing real applications in 3.

Data are stored in unstructured formats. We describe two main formats: key-value
and JSON. We use JSON when possible in our applications, but as we see key-value
is very important in some applications and is a sort of basis for JSON in some respects.

Then we analyze distributed issues with their pros and cons: the CAP theorem
and distribution and partitioning applied to CS-NoSQL.

Moreover we show publish/subscribe as implemented in these databases in 2.1.3.

Finally we discuss non trivial tasks and issues such as constraints, permissions and queries.

2.1.1 Data model

In NoSQL databases data are organized in collections [113], which are like the
tables of traditional databases.

Each collection can be organized in different ways; we look at the two most
used: key-value and document. For documents we analyze JSON, which is one
possible format for the document type.

Each document is something that can contain data in JSON format, XML format,
a file and so on.

Key-value. Key-value is one of the simplest ways to store unstructured data. It is
also known as dictionary or hash [113]. The basic idea is that we have unique keys
and each of them has a linked value, i.e. a document that can be anything:
a file, a simple type, a complex type (like JSON) and so on.

The simplicity of this schema is the key to its success; in fact it is used in a lot of NoSQL


systems.

If we want to apply publish/subscribe we can easily use topic-based publish/subscribe
with the pair collection-key as channel. So the only level of granularity is the value
(low granularity): we do not inspect it.

JSON. JSON (JavaScript Object Notation) is an easy standard to store data [102];
it is not as powerful as XML but it is simpler, so processing it is faster [127]. It
can be used as the document model in NoSQL databases.

Listing 2.1 shows a simple example of a JSON structure.

{
  "main_array": [
    [
      { "title": "Element 11" },
      { "title": "Element 12" }
    ],
    [
      { "title": "Element 21" },
      { "title": "Element 22" }
    ]
  ]
}

Listing 2.1: Simple JSON example

A JSON structure can contain simple types, arrays or objects; in the previous
example (listing 2.1) we have:

• main object
    – an array (called main_array)
        * two arrays
            · two objects (with title property)

So a NoSQL database that uses JSON as document is composed of JSON data like
that (a main JSON object). This structure is very powerful and allows
storing many complex structures.

A simple example can be a system where we have some users, each of them with a


complex structure; each user is stored as an object in a main array.

RESTful is a good protocol for JSON and, in general, for NoSQL databases [132].

It is easy to see how this structure can be useful to subscribe only to a portion of
data, to a path (like /main_array/0/). For example, using a structure like the one
in listing 2.1, we can subscribe to the second element of the main array (an
inner array); then we receive notifications for changes to it (or to its inner elements,
including complex ones). So we can have a high granularity; different systems have
different limits on the maximum number of levels allowed.

If we want to apply publish/subscribe we can easily use topic-based publish/subscribe
with the pair collection-path as channel.
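What subscribing to a path means can be sketched with two small helpers: one that resolves a path inside a JSON document, and one that decides whether a change concerns a subscriber. Both are illustrative assumptions (the names `resolve` and `concerns`, and the rule itself, are not taken from any specific system); for simplicity the rule ignores changes made at an ancestor of the subscribed path.

```javascript
// Split "/main_array/0/" into ["main_array", "0"].
const segments = (path) => path.split("/").filter(Boolean);

// Walk the document along the path; returns undefined if the path does not exist.
function resolve(doc, path) {
  return segments(path).reduce((node, key) => (node == null ? undefined : node[key]), doc);
}

// A change at changedPath concerns a subscriber of subscribedPath when the
// subscribed path is equal to, or a prefix of, the changed one.
function concerns(subscribedPath, changedPath) {
  const changed = segments(changedPath);
  return segments(subscribedPath).every((segment, i) => changed[i] === segment);
}

// Hypothetical document with the same shape as a two-level array of objects.
const doc = { main_array: [[{ title: "a" }, { title: "b" }], [{ title: "c" }, { title: "d" }]] };

const secondInnerArray = resolve(doc, "/main_array/1/");
const notified = concerns("/main_array/1/", "/main_array/1/0/title");
const notNotified = concerns("/main_array/1/", "/main_array/0/0/title");
```

A client subscribed to `/main_array/1/` is notified when the title of an object inside the second inner array changes, but not when something changes inside the first one.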

We can also see that the expressive power is the same as key-value, since we can
consider the path as our key (of course the structure of the stored data is different), but it
is more readable.

We analyze in more detail below what it means to subscribe to a specific path of the
JSON structure.

We can see that if we can subscribe only to the first level of the main object, the
final result is the same as a key-value structure.

2.1.2 Distributed issues

As introduced at the beginning, NoSQL databases can be easily distributed and
partitioned. We obtain an increase in performance from distribution and partitioning
but, often, we have to give up strong consistency.

These databases often implement MapReduce to process distributed data
efficiently [114].

CAP theorem. The CAP theorem is one of the fundamental theorems for
distributed databases.

Since the main characteristic of NoSQL databases is scalability, they need
partition tolerance. In fact NoSQL databases often choose availability and partition
tolerance or consistency and partition tolerance [111, 113]. In 3 we analyze NoSQL
databases of both types.

Since they often do not implement strong consistency, eventual consistency is
implemented instead.

Distribution and partitioning. Distribution and partitioning introduce different
issues.

But in NoSQL databases the master node can also be distributed, so the client
can commit on different nodes. This increases performance, since there are no
saturation issues of the server or of the locks needed to write data (commit), but it
could create problems in transaction consistency [98].


2.1.3 Publish/subscribe

Topic-based subscription is enough. The common idea in all of these databases
is to subscribe to the main events (the same events of a database trigger explained
in 1.2.1): inserted, updated, deleted.

We subscribe to these events at different levels of granularity, which depend on the data
model used and on the database itself. For example there are some databases that,
even if they support JSON, allow subscribing only to its first level (so it is like
a key-value).

Of course, as introduced at the beginning, the mechanism to subscribe to these data
and the trigger that publishes them are developed inside these databases. So no
other development is required: the system is ready to publish changes. The delivery
can be implemented using the different technologies that we analyzed in 1.1.

Of course we still have issues due to the fact that we also have to scale the
broker servers and, as previously said, this is not an easy task (even if for topic-based it is
easier than for content-based). Fortunately these databases provide an integrated
publish/subscribe that scales with the database itself.

2.1.4 Constraints, permissions and queries

As we see with real databases, constraints on data, permissions and queries are
not trivial tasks, especially in CS-NoSQL. In fact we have to replicate them on the
client.

Constraints and permissions. In different systems more than one user can access
the same JSON object (document), so we need fine grained permissions (which can
be considered a constraint). Moreover we may want some traditional
constraints such as integrity constraints, data type constraints and so on. So we treat
everything as a constraint.

Different systems implement different constraints, since constraints can also affect the
performance of the system [98].

Queries. There are no standards for queries in NoSQL databases, since they
depend on different elements: data model used, database used, MapReduce with query
support and so on [114]. Often key-value databases allow more powerful
queries.

There are some languages (Pig and Hive) built on top of MapReduce that allow
writing queries in standard ways [126].


2.2 Client Best Practices

In this subsection we analyze the best practices used in these kinds of applications
from the client's point of view. Some choices, like the data returned in some cases, can
be imposed by the server technology chosen. Figure 2.1 shows a simple
architecture.

Figure 2.1: Simple client architecture

Of course, since there is no standard, each application implements different things
and/or implements them in different ways. So this is just a summary of the most
used best practices.

Note that by application server (from now on, server) we mean the CS-NoSQL database with its
integrated publish/subscribe broker server.

We can summarize them in some points. The client does not contact the
application server directly but sends all requests to the client framework. It
immediately receives a response with an eventual consistency logic (a read does not return
data older than a previous read), but of course the data may not be up to date with
the latest on the server. Furthermore the server sends asynchronous notifications of
changes, and the client catches them using an event driven approach. The client
specifies which data it needs, so it receives notifications only for them. Moreover the
client uses a language that makes it easy to work with an event driven approach. Also
the client framework uses RESTful to send/receive data synchronously. Finally the
client framework implements a local database, which stores all data written by the client
and all data received from the server via asynchronous notifications. This allows
returning data immediately when requested by the client.

All these things allow using publish/subscribe efficiently to receive data in a way that is
transparent for the final user.

2.2.1 Event driven approach

Algorithm 2.1 shows a common and simple realtime retrieve of data in
the event driven approach. The basic idea of the event driven approach is to call a
callback when an event is fired [104]; a well-known example of event driven programming
is programming for desktop interfaces, where the events are the user inputs.
Of course this is not the only approach available, but it is easy to understand how
useful it is in asynchronous applications like the CS-NoSQL applications that we


discuss in this thesis.

Algorithm 2.1 Event driven retrieve of data

db ← connect(DB_ADDRESS)                      ⊲ Connect to DB
document ← selectDocument(db, DOCUMENT)       ⊲ Select document
onChange(document, PATH, CALLBACK)            ⊲ Subscribe to change events
procedure callback(newVal, oldVal)
    log(oldVal)                               ⊲ Log old value of the path
    log(newVal)                               ⊲ Log new value of the path
end procedure

The basic idea is to subscribe a callback to change events of a specific path of
a document; thinking of JSON documents, the path is a reference to a specific
level of the JSON document. For example, a path for the JSON in listing 2.1 can
be /main_array/0/ to subscribe to change events of the first array inside the
main array.

The callback can be passed the entire new content of the path or only the part that
changed (in the case of an array, only the child that changed).
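In nodejs the same pattern can be sketched with the built-in `EventEmitter`. The class `LocalDocument` and its methods `onChange` and `set` are hypothetical stand-ins for a CS-NoSQL client document, not the API of any real product, and for brevity the path is limited to a top-level property name.

```javascript
const { EventEmitter } = require("events");

// Hypothetical stand-in for a CS-NoSQL client document; not the API of any real product.
class LocalDocument extends EventEmitter {
  constructor(data) {
    super();
    this.data = data;
  }
  // Subscribe a callback to change events of one top-level path.
  onChange(path, callback) {
    this.on(`change:${path}`, callback);
  }
  // Write a value and fire the change event for that path.
  set(path, newVal) {
    const oldVal = this.data[path];
    this.data[path] = newVal;
    this.emit(`change:${path}`, newVal, oldVal);
  }
}

const seen = [];
const doc = new LocalDocument({ title: "draft" });
doc.onChange("title", (newVal, oldVal) => seen.push({ newVal, oldVal }));
doc.set("title", "final");
doc.set("author", "someone"); // no subscriber for this path, so no callback runs
```

In a real client the `set` would be triggered by a notification coming from the server rather than by local code, but the callback registration looks the same.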

2.2.2 Client language

One of the most used languages in this kind of application is ECMAScript 5.0
(in some cases ECMAScript 6.0), commonly known by the name of its dialect: JavaScript.
In fact the best-known ECMAScript interpreter, V8, is very efficient for event driven
programs; it is the interpreter used by the desktop port that we use: nodejs [137].
This efficiency is due to a good Just in Time compiler written in C++ [138]. All the server
technologies analyzed in the next chapters have ECMAScript clients; of course
we can create clients in other languages since the protocols used are standard and
open. ECMAScript is a language where the event driven approach is easy to implement;
in fact the language has the fundamentals to execute asynchronous code since it
has callbacks [101, 109].

Moreover ECMAScript is executed in only one thread, and the callbacks are also executed
in this thread. But some operations, like network requests, database connections
and so on, are executed by other threads in the background (the result is passed to the
callback called in the main thread). If there is no CPU intensive code, this approach
is very efficient and solves problems due to race conditions [137].

Nodejs also has a standard package manager called npm, with a lot of libraries; this
allows writing small examples with only the code needed for understanding, while the other
parts are done by the libraries.

Furthermore JSON, as the acronym suggests, is derived from JavaScript. In fact
JavaScript can parse it easily, and JSON objects/arrays become native JavaScript
objects/arrays. So we can access them natively without calling parsing


methods to iterate the structure and without mapping the JSON onto already defined
classes (i.e. deserializing it to objects), as other languages such as Java do [134]; in fact
both approaches are not easily adaptable to structure changes.
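The native handling of JSON described above takes a single standard call, `JSON.parse`. The document text used below is a made-up example with the same shape as a two-level array of objects.

```javascript
const text = '{"main_array": [[{"title": "first"}], [{"title": "second"}]]}';

// One call turns the text into native objects and arrays.
const data = JSON.parse(text);

// Native access and iteration, with no class definitions to keep in sync.
const firstTitle = data.main_array[0][0].title;
const allTitles = data.main_array.flat().map((item) => item.title);

// Going back to text is symmetrical.
const roundTrip = JSON.parse(JSON.stringify(data));
```

If a new property is later added to the objects, this code keeps working unchanged, whereas a class-based mapping would have to be updated.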

2.2.3 RESTful approach

What we said previously about paths is also valid for RESTful URIs. RESTful is
useful to post/put/delete data and to retrieve them synchronously; synchronous
retrieval can be useful to populate local databases at the beginning, since RESTful is
not compatible with asynchronous requests. There is no reason to execute this kind
of operation in an asynchronous way; of course they can be done via a websocket
connection, but they are done in a sort of synchronous way.

So the best practice is to use RESTful for all the synchronous operations, leaving the
possibility to use RESTful to get all data or a portion of them, but also to offer an
asynchronous interface integrated with the event driven approach, such as websocket.

Of course, since JavaScript supports asynchronous operations, the result of these
synchronous operations is returned via a callback. These operations are synchronous
in the sense that after the request is sent a response is returned immediately, but
there are network delays; for that reason the code is executed in an asynchronous
way [109].

2.2.4 Local database

The best practice is to create a local database on the client that replicates server data
(only the data needed, i.e. what the client wants) using optimistic replication [133];
this allows having what is commercially called optimistic UI [52].

It gives the ability to update the local database (and consequently the user
interface) even if there are network delays or if the network is down.

So the developer can call all the methods (get data, put data and so on) on the
database and see the effect even if there is no connection (often called offline mode).

Of course some server constraints/modifications, as well as permission constraints, are
applied when the network comes back up. Some of the constraints/modifications
applied by the server can also be implemented in the client, such as type constraints,
integrity constraints and so on. Of course the server rechecks everything again.
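The offline behaviour described above can be sketched as a local store that applies writes immediately, queues them, and lets the server have the final say once the network is back. Everything here is an assumption made for illustration: the class `LocalDatabase`, the fake `server` object and its single type constraint do not correspond to any real product, and the rollback simply deletes the rejected key instead of restoring a previous value.

```javascript
// Hypothetical client-side store: writes are applied locally at once and queued
// until a (fake) server is reachable, which then has the final say.
class LocalDatabase {
  constructor(server) {
    this.server = server;
    this.data = new Map();
    this.pending = []; // writes not yet confirmed by the server
    this.online = false;
  }
  put(key, value) {
    this.data.set(key, value); // optimistic: visible immediately
    this.pending.push({ key, value });
    if (this.online) this.flush();
  }
  get(key) {
    return this.data.get(key);
  }
  // Called when the network comes back up.
  flush() {
    this.online = true;
    for (const write of this.pending.splice(0)) {
      const result = this.server.apply(write); // the server rechecks everything
      if (result.ok) this.data.set(write.key, result.value);
      else this.data.delete(write.key); // rejected: roll back the optimistic write
    }
  }
}

// Fake server with one type constraint: values must be strings.
const server = {
  apply: ({ value }) => (typeof value === "string" ? { ok: true, value } : { ok: false }),
};

const db = new LocalDatabase(server);
db.put("name", "Ada");
db.put("age", 36);
const beforeSync = [db.get("name"), db.get("age")];
db.flush();
const afterSync = [db.get("name"), db.get("age")];
```

Before the synchronization both writes are visible locally; afterwards the write that violates the server constraint has been rolled back, which is why a client that also checks simple constraints gives the user fewer surprises.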

The client could also implement the query logic. But for simplicity, this kind
of system often has simple query support or no query support (neither client nor server).

Conflicts can occur and different ways exist to solve them [133].

One of the simplest ways to solve conflicts is Last Write Wins: the
write with the highest timestamp is used, overwriting the other concurrent writes. This
approach is used by different systems in the implementation of the local database; the
timestamp used is that of the server when the message is actually received by the


server (not the update time on the client, which can be very old due to network delays). Of
course this approach is reliable when there are masters/supernodes; in other cases
techniques that take consensus into consideration are needed.


Chapter 3

Analysis of Some CS-NoSQL

In this chapter we analyze some commercial CS-NoSQL systems, both open source and
proprietary cloud-hosted ones. We select some of them for further analysis; all the
systems selected are open source, since we have much more control over them and we
can do better tests. The set is composed of CS-NoSQL systems with different characteristics,
such as different CAP classifications, different data structures and so on. These characteristics
are based on the theoretical concepts we have seen in chapter 2. Some of the systems
analyzed in this chapter are the same ones used in chapter 5 to define important
classes of applications. The CS-NoSQL systems selected for further analysis are the ones we
use in the test benchmarks in chapter 7.

In this chapter we describe the main characteristics to consider in 3.1. In that
section we also show a recap of the main characteristics of the CS-NoSQL systems selected.
Then in 3.2 we analyze the characteristics described before for some CS-NoSQL systems.
We do a fuller analysis, with all the characteristics, for the systems that we select; for the
others we give only a simple description with the main characterizing features.

3.1 Characteristics to be considered

We consider NoSQL databases with the following characteristics: JSON as data
model, to have a flexible standard that is easy to use; realtime notification
support with publish/subscribe (of course this can be provided by other plugins); and a
JavaScript library, since it is the language that we use.

Then, to distinguish different systems, we take different characteristics into
consideration. We consider data granularity for subscriptions: even if data are stored in
JSON we can have a granularity only down to the first level of the JSON object (it is like
key-value), or have a key-value structure where the value is JSON. Of course we also
analyze the classification according to the CAP theorem, considering
eventual consistency too if available; this is done only for the server distribution, since we do
not have enough information to do this classification for the local database (generally


it implements eventual consistency). But we also take into consideration: distribution
(with MapReduce support) and partitioning, local database implemented, replication
model (multi-master or master-slaves), protocol used to send notifications, protocol
used to send data, constraints and permissions support (with user management) and
query support. This detailed analysis is done only for the systems that we test in the
next chapters.

As previously explained, the CAP theorem is not enough for many aspects but,
as explained, today it is not possible to classify many systems under other
schemes.

In table 3.1 we show the main characteristics of the databases selected for further
analysis in the next chapters; the selection is done by taking systems with different
characteristics.

Table 3.1: CS-NoSQL comparison

Database   CAP  Local DB  MapReduce  Replication   User management  Notification  Interface    Data granularity  Queries
Couchbase  CP   No        Yes        Master slave  Yes              All           Proprietary  Key-value         Yes
PouchDB    AP   Yes       Yes        Multi master  Yes              Long polling  RESTful      Key-value         Limited
Gun.js     AP   Yes       No         Multi master  Yes              Websocket     RESTful      Fine grained      No

3.2 Analysis

Even if this kind of system is young, many systems have already been created,
both open source and SaaS.

Some SaaS systems are very famous and widely used, but they do not allow doing all the tests
and performance measurements needed for our study, since we do not know their
internal structure and replication, so we analyze only open source systems in depth. Of course
we quickly analyze the characteristics of the main SaaS systems.

We first analyze some open source systems; some of them are covered only briefly because they are not studied in the next chapters, but they are well known or have particular characteristics. Then we analyze some widely used SaaS systems, which are not studied in the next chapters.

Note that the information presented is taken from the official sites of the products and, in most cases, is not verified empirically. Moreover, the term realtime is used improperly.

3.2.1 OpenSource

We analyze some open source systems. For the systems that we select for benchmarks we do a complete analysis, covering every important point previously explained. For the others we only mention their characterizing features, since these systems do not have good performance or lack some features needed for our tests.

CouchBase. CouchBase¹ is a NoSQL database with a realtime extension called syncGateway² (every action must pass through this gateway). It is one of the most famous open source systems; for that reason we test it.

According to the characteristics previously described, using syncGateway + Couchbase, we have:

Data granularity for subscription: key-value [15].
Local database: none.
CAP: CP (consistency and partition tolerance) [6].
MapReduce: MapReduce support [10].
Replication model: master-slaves replication support [12].
Protocol used for notifications: websocket and all the ways explained in 1.1 to receive events [14].
Protocol used to send data: proprietary.
Constraints and permissions support: user management [13].
Query support: N1QL, a superset of SQL to query JSON [11].

Other interesting features are: powerful cluster configuration [7] and full text search [9], i.e. "full text allows you to search and find what you are looking for even without exact matches. Just like the LIKE keyword in SQL? Not really. It is something else. LIKE allows the use of wildcards, which is quite different. This means it is case insensitive, it can ignore unimportant words like 'is' (stop word is the technical term), and is tolerant to mistakes like typos."

PouchDB. PouchDB³ is just a JavaScript library that interacts with a NoSQL database, CouchDB. Since PouchDB relies on a stable and widely used system like CouchDB, it is a system that we test. In addition to the interface, PouchDB also implements a local database.

CouchDB. CouchDB⁴ is one of the most famous NoSQL databases and it is very simple.

According to the characteristics previously described, using PouchDB + CouchDB, we have:

¹ http://couchbase.com
² http://developer.couchbase.com/documentation/mobile/1.1.0/get-started/sync-gateway-overview/index.html
³ https://pouchdb.com/
⁴ https://couchdb.apache.org


Data granularity for subscriptions: key-value [15].
Local database: PouchDB [64, 65].
CAP: AP (availability and partition tolerance) [16, 21], but eventual consistency is implemented [17].
MapReduce: MapReduce support [18].
Replication model: multi-master replication support [19].
Protocol used for notifications: long polling to receive events [15].
Protocol used to send data: RESTful native interface [20].
Constraints and permissions support: simple user management [22].
Query support: limited [23].

These features allow interfacing with it efficiently and in a realtime way [38]. The only negative aspect of CouchDB is that its queries are less expressive (for example, there is no equivalent of the SQL join) than those of other analogous NoSQL databases such as MongoDB [50].

Gun.js. Gun.js⁵ is a full stack CS-NoSQL implemented in JavaScript. It does not rely on any existing software: everything is implemented ad hoc. For that reason, for some of its features and for the fact that it is implemented in JavaScript, it is useful to test it.

An important characteristic is that there are no centralized structures [36]: no central server is required and any client can be a server, so it is peer to peer (commonly known as P2P) [131].

According to the characteristics previously described, using Gun.js, we have:

Data granularity for subscriptions: JSON full path [35].
Local database: yes, since any client can act as a server, as said previously.
CAP: AP (availability and partition tolerance) [34].
MapReduce: none.
Replication model: multi-master, since it is fully distributed, as said previously.
Protocol used for notifications: websocket [38].
Protocol used to send data: websocket [38]. This choice follows from it being fully distributed, as said previously.

⁵ http://gun.js.org

Constraints and permissions support: authentication support in a P2P environment, using asymmetric cryptography [39].
Query support: none.

Other interesting features are: graph support [37], i.e. the ability to explicitly link documents together (of course, in any NoSQL database a link can be made manually using a sort of ID), and reliable storage, since it allows storing data on AWS S3 [40], a reliable cloud storage service [4] implemented by Amazon.

MemSQL. MemSQL⁶ is a scalable and replicated in-memory SQL database (it is like traditional databases). It is very interesting but, since it is a hybrid system that does not exploit the potential of NoSQL, it is not analyzed in this thesis.

Meteor. Meteor⁷ is a JavaScript client and server framework that uses MongoDB [50] to create a CS-NoSQL application.

However, it is not fully open to other technologies, and the server side can be fully developed. The latter can be a problem for scalability and efficiency: it cannot be used as a serverless system, but only like a classical application.

For this reason it is not analyzed in this thesis.

3.2.2 Software as a Service

These kinds of systems can easily be offered as a SaaS service, since they can have no code on the backend, even though a more correct classification would be serverless. In fact, all the commercial systems advertise the ability to build the application without thinking about the structure.

Firebase. Firebase⁸ is one of the most famous commercial cloud-hosted CS-NoSQL systems; it is owned by Google. It exposes just a JSON document, and you can subscribe to any level of it.

In order to offer scalability in an efficient way it does not allow writing any code on the backend: you can put only static resources on the backend (which are not expensive in CPU time). On the other hand, it allows defining backend rules that act like triggers to validate data, and it has some useful additional features such as a login system.

It provides libraries and protocols to access it efficiently and in a realtime way, such as a RESTful interface, websocket and Event Stream.

⁶ http://www.memsql.com/product/
⁷ https://www.meteor.com/
⁸ https://firebase.google.com/


Furthermore, the cost model can be easily adapted to the data used: since the infrastructure is closed it can be very efficient, and the queries are not complex [25]. Moreover, since it is one of the most used databases of this type, it can be easily integrated with other external systems.

Pubnub. Pubnub⁹ is a cloud, topic-based publish/subscribe system with storage support. It does not have the power of CS-NoSQL servers, since the channels are not related to data. There are also open source publish/subscribe implementations, such as socket.io [79]. Pubnub has the advantage of being cloud-hosted, so it also solves scalability issues.

Backand. Backand¹⁰ is a proprietary serverless architecture for web applications; it is a publish/subscribe service like Pubnub, but it also allows more control over the backend, like Firebase.

⁹ https://www.pubnub.com/
¹⁰ https://www.backand.com/

Chapter 4

A Proposed Comparing Traditional Approach

In this chapter we propose an approach based on a traditional database (an RDBMS); we use this approach as a baseline to compare the performance of CS-NoSQL in chapter 7. Except for the database, we try to emulate the approaches of CS-NoSQL.

The proposed architecture could be implemented with any RDBMS that has advanced trigger support. Even where we use a proprietary solution, an alternative (general) solution is provided.

First, in 4.1, we present the base technologies needed to explain the architecture: PostgreSQL, Redis and socket.io.

Then, in 4.2, we explain the proposed architecture, which is general. It is composed of different parts: database, publish/subscribe, websocket server, input server and custom logic. We explain how to implement each part of the architecture using the technologies presented before.

Finally, in 4.3, we show our implementation and explain in more detail the parts of the logic that depend on the implementation rather than on the theoretical concepts: server logic and client logic.

4.1 Technologies Background

4.1.1 PostgreSQL

According to the CAP theorem, and as shown in figure 1.5, RDBMSs are CA, i.e. they do not support partitioning of data in an efficient way, although they can be partitioned. Moreover, they can be replicated according to the structures explained in 1.2.3: multi-master or master-slaves.

PostgreSQL¹ is a powerful and widely used database (so many libraries are available for it). It can be partitioned [58] and replicated in a multi-master architecture (and therefore also master-slaves) [60].

Moreover, it has interesting characteristics that we use in the next sections. It is a very flexible database that allows using custom procedural languages [59]; one of them is PL/sh², a language that allows executing shell commands, so we are able to call an external program. It also has a feature that makes it easy to create a sort of publish/subscribe system: a queue where you can publish messages and read them in order, using the NOTIFY [57] and LISTEN [56] commands. The notification sent by NOTIFY is actually delivered only after the transaction is committed. Furthermore, PostgreSQL has JSON as a data type [55]; it could be useful, but we do not use it in order to keep the approach standard.

4.1.2 Redis

Redis³ is an in-memory key-value store [113]; data can be written to disk at fixed intervals. It allows implementing many different components: a cache system [70], a publish/subscribe system [75], a queue system using lists [71, 76] and a distributed lock system that can be used by external systems [72].

It also has interesting features such as data partitioning support [73] and master-slaves replication support [77]. Moreover, anything (image, text, JSON and so on) can be stored as the value, with a high size limit (512 megabytes per value) [71].

Unfortunately, to keep performance high, data are not stored on disk immediately: they are written every second (default configuration) [74].

4.1.3 Socket.io

Socket.io⁴ is one of the most famous websocket servers, written in JavaScript. It is built on top of engine.io⁵, which plays a role like the transport layer in the ISO/OSI stack: it is very efficient, but it is only a websocket implementation. As we show, socket.io has many integrations and native implementations (since the protocol is open [84]), which makes it the best choice for our tests. Furthermore, it is reported to be the best open source websocket server [88].

Socket.io has useful features for our application, such as Redis integration to create a websocket cluster [85], user/session support [86] and P2P support [83], which could be useful to implement a system like Gun.js. Moreover, messages are sent in FIFO order, an important property for achieving eventual consistency, as we show in the next sections.

¹ https://www.postgresql.org/
² https://github.com/petere/plsh
³ http://redis.io/
⁴ http://socket.io/
⁵ https://github.com/socketio/engine.io

Furthermore, socket.io allows emitting events to all the clients subscribed to that event [81], so events are the channels of a publish/subscribe system. The client can also send events, which are generally caught only by the server; since this is publish/subscribe, there is no notification of successful delivery to the server (ack). The approach to use with it is therefore the event-driven approach. In this way we can have a distributed publish/subscribe system with user/session support.

4.2 Architecture Proposed

Figure 4.1: Traditional Architecture Proposed

Figure 4.1 shows the proposed architecture; every part is explained in the following sections.

We have also added an additional level: a load balancer/CDN (content delivery network). The load balancer is used to scale [97, 115] by routing users to different socket.io servers: in the architecture that we propose we can have more than one socket.io server, but there is otherwise no system to route users to them. Since we do not need it in the test phase, we skip this level to avoid introducing another level to test.

As we see in chapter 6, we do not partition PostgreSQL and Redis: they are not the bottleneck, and we want to test only the realtime feature. In this way we keep things simpler, since we do not also have to consider the issues related to partitioning, such as latency.

While a CS-NoSQL offers a full stack solution (often with client local database support), here we have to define every aspect of the stack. So we need to implement what we defined in 1.2.1, the client best practices, the related aspects they require and whatever is useful to emulate the CS-NoSQL approach.

We implement: an RDBMS database, which must be scalable among nodes and support triggers to notify changes; a publish/subscribe system to publish notifications of changes; a websocket server; an input server that receives the data to be sent to the database; a custom logic level, an optional level where custom logic can be introduced; and client framework/libraries to communicate with the server (cluster). In this solution data granularity is at row level, since finer granularity is not required by the study that follows.

4.2.1 Database

We use PostgreSQL, so the normal operations are handled by it (it has good libraries) and consistency is guaranteed by it. The only thing we have to solve is the notification of changes to an external system (a publish/subscribe system).

The trivial (and general) solution is a trigger that calls an external program, which publishes the data on the publish/subscribe system; this can be done easily using PL/sh, as shown in appendix A.1. However, calling an external program is not very efficient (startup time), and we would also have to find a way to call the external program only after the commit.

So we can use another solution that is more efficient but not standard: LISTEN/NOTIFY. The trigger publishes the message with NOTIFY, then an external listener republishes it on the other publish/subscribe system.

When there is a change, we have the following steps: first, the trigger call publishes on a predefined PostgreSQL channel for each event; then a JavaScript listener, listening on the same PostgreSQL channel, receives the message; finally, the JavaScript listener republishes the same message on the publish/subscribe system, also setting namespace and rooms. There are many examples useful for our use case; one of them (with a JavaScript listener) [62] was modified and used in our final configuration published on GitHub. We can observe that this approach is like a key-value approach.
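The last step of this pipeline can be sketched in JavaScript as follows. This is a minimal, self-contained illustration and not the code published in the repository: the payload layout (`table`, `action`, `row`), the choice of `owner_id` as the room and the `FakeBus` stand-in for the publish/subscribe system are all assumptions made for the example.

```javascript
// Hypothetical NOTIFY payload: {"table": "...", "action": "insert|update|delete", "row": {...}}.
// Converts it into the coordinates used in this chapter:
// namespace = table, room = owner of the row, event name = SQL action.
function toSocketMessage(payloadText) {
  const { table, action, row } = JSON.parse(payloadText);
  return {
    namespace: "/" + table,
    room: String(row.owner_id),
    event: action,
    data: row,
  };
}

// In-memory stand-in for the publish/subscribe system (assumption, not a real API).
class FakeBus {
  constructor() {
    this.published = [];
  }
  publish(message) {
    this.published.push(message);
  }
}

// The listener: called once for every notification received on the database channel.
function onNotification(bus, payloadText) {
  bus.publish(toSocketMessage(payloadText));
}

const bus = new FakeBus();
onNotification(
  bus,
  JSON.stringify({ table: "messages", action: "insert", row: { id: "a1", owner_id: 7, text: "hi" } })
);
console.log(bus.published[0]);
```

Keeping the mapping in a pure function separates it from the transport, so the same rule could be reused whichever client library delivers the notification.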

4.2.2 Publish/Subscribe and Websocket server

We try to emulate the approach used in CS-NoSQL. We use socket.io as the websocket server. As previously said, socket.io is built on top of the websocket layer (engine.io), so the publish/subscribe broker server and the websocket server are on the same machine.

It implements a distributed publish/subscribe system via Redis, i.e. the different publish/subscribe broker servers communicate through Redis. In this way, however, we do not know the status of the entire network: we do not know which brokers are connected. When a message is published on a broker server, the broker replicates it on Redis; the other broker servers then read it and send it to the subscribed clients. So when the listener script previously described publishes something, it simply writes it on Redis (without calling any broker server) [82].
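The relay behaviour described above can be illustrated with a small in-memory model. It is only a sketch under stated assumptions: `SharedBus` plays the role of Redis and `Broker` the role of a socket.io server, and neither class corresponds to a real library API.

```javascript
// In-memory stand-in for Redis: every attached broker sees every message.
class SharedBus {
  constructor() {
    this.brokers = [];
  }
  attach(broker) {
    this.brokers.push(broker);
  }
  publish(message) {
    for (const broker of this.brokers) broker.deliver(message);
  }
}

// A broker only knows its own clients; it never talks to other brokers directly.
class Broker {
  constructor(bus) {
    this.bus = bus;
    this.subscribers = new Map(); // event name -> array of callbacks
    bus.attach(this);
  }
  subscribe(event, callback) {
    if (!this.subscribers.has(event)) this.subscribers.set(event, []);
    this.subscribers.get(event).push(callback);
  }
  // Publishing means writing on the shared bus, not calling local clients.
  publish(event, data) {
    this.bus.publish({ event, data });
  }
  deliver({ event, data }) {
    for (const callback of this.subscribers.get(event) || []) callback(data);
  }
}

const redisLike = new SharedBus();
const brokerA = new Broker(redisLike);
const brokerB = new Broker(redisLike);
const receivedOnA = [];
const receivedOnB = [];
brokerA.subscribe("insert", (row) => receivedOnA.push(row));
brokerB.subscribe("insert", (row) => receivedOnB.push(row));

// The database listener writes straight to the bus, without calling a broker.
redisLike.publish({ event: "insert", data: { id: "r1" } });
console.log(receivedOnA.length, receivedOnB.length);
```

Note that no broker holds a list of the other brokers, which mirrors the remark that the status of the entire network is unknown.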


Everything is done automatically, without custom configuration on the broker server: every event (channel) used by the listener can be subscribed to by the users.

Socket.io allows managing channels at a high level. We can use namespaces [80] to distinguish different tables, so that the same events are identified unambiguously for different tables, and rooms [80] to create a sort of subchannel: a client can be subscribed only to a room that contains only the events related to some rows. Rooms can be very useful to set up permissions at row level: a client can write to and read from only the rooms it is subscribed to, and only the server can subscribe it to rooms, so custom logic is needed to do that.

4.2.3 Input server

A good approach is to use a RESTful interface for communications that do not need to be realtime events (they are synchronous requests).

For simplicity, however, we can also send the data from client to servers using websocket. With socket.io the client also uses the event approach, which does not provide a notification of successful delivery; we could develop our own, but for the tests that we have to do this (an ack of the actions done by the client) is not important. So for every socket.io event the server calls actions on the PostgreSQL server; we put this logic into the custom logic. The important point of this approach is that writes do not depend on Redis (which has persistence problems) and the consistency of data is managed by PostgreSQL.

4.2.4 Custom logic

With socket.io we can insert custom logic for the events sent by the client; a simple example is shown in appendix A.2. But we cannot modify events generated by others: in that case the socket.io server acts only as a message switch, i.e. in a socket.io node we cannot modify the messages sent by the PostgreSQL listener.

The two main operations to do at this level are: sending data to the database, where we can define different events for the different database operations and map each event to the corresponding database library method; and managing subscriptions to rooms, which depends on the role of the rooms. If we use rooms for authorization, a simple solution is to catch an auth event (sent by the client) and subscribe that client only to some rows based on the authentication result. For example, we can subscribe the client (join it to a room) only to the rows that it owns (i.e. those whose owner_id field matches).
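As an illustration of the auth event, the sketch below joins a client only to the rooms that correspond to the owner_id values it is entitled to. It assumes one room per owner_id; `RoomRegistry`, `authenticate` and the plain-text credentials are hypothetical names and simplifications introduced for the example, not part of socket.io or of the thesis code.

```javascript
// Minimal in-memory model of rooms (assumption: one room per owner_id).
class RoomRegistry {
  constructor() {
    this.members = new Map(); // room name -> Set of client ids
  }
  join(clientId, room) {
    if (!this.members.has(room)) this.members.set(room, new Set());
    this.members.get(room).add(clientId);
  }
  // Clients that would receive an event emitted to `room`.
  recipients(room) {
    return [...(this.members.get(room) || [])];
  }
}

// Hypothetical credential check: returns the owner ids a client may read, or [].
function authenticate(credentials, knownUsers) {
  const user = knownUsers[credentials.name];
  if (!user || user.password !== credentials.password) return [];
  return user.ownerIds;
}

// Handler for the "auth" event: only the server decides which rooms are joined.
function onAuth(registry, clientId, credentials, knownUsers) {
  const ownerIds = authenticate(credentials, knownUsers);
  for (const ownerId of ownerIds) registry.join(clientId, String(ownerId));
  return ownerIds.length > 0;
}

const knownUsers = { alice: { password: "secret", ownerIds: [1] } };
const registry = new RoomRegistry();
const accepted = onAuth(registry, "client-A", { name: "alice", password: "secret" }, knownUsers);
const rejected = onAuth(registry, "client-B", { name: "alice", password: "wrong" }, knownUsers);
console.log(accepted, rejected, registry.recipients("1"));
```

Because the join is performed by the server-side handler, a client that fails authentication never appears among the recipients of a room.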

The only problem with this solution is that the data associated with a client are not persisted. If the socket.io server dies, if the client changes server (when servers are load balanced) or if the connection is interrupted (and then re-established), the data associated with the client are lost.

The only critical thing in our approach is authentication: if we are able to identify the client, we can store the other elements (such as the rooms joined) elsewhere. A simple and standard approach to authenticate clients that is not centralized (so it does not need persistence) is JWT (JSON Web Token) [116]. A resynchronization of data may also be needed (for the messages lost during the downtime), but even if we do not implement it the solution remains eventually consistent. For the topic of this thesis it is not necessary to develop this, since we just need to test performance, as shown in chapter 6.
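To make the JWT idea concrete, the following sketch issues and verifies an HS256-signed token using only the Node.js standard library. It is an illustration, not part of the proposed implementation: the shared secret, the claim names `sub` and `rooms`, and the absence of an expiry check are assumptions of this example, and a production system would normally rely on a maintained JWT library.

```javascript
const crypto = require("node:crypto");

// Base64url-encode a JSON-serializable value.
function encodePart(value) {
  return Buffer.from(JSON.stringify(value)).toString("base64url");
}

// HMAC-SHA256 signature over "header.payload", base64url-encoded.
function sign(data, secret) {
  return crypto.createHmac("sha256", secret).update(data).digest("base64url");
}

// Issues a token carrying the client identity and (hypothetically) its rooms.
function issueToken(claims, secret) {
  const data = encodePart({ alg: "HS256", typ: "JWT" }) + "." + encodePart(claims);
  return data + "." + sign(data, secret);
}

// Returns the claims if the signature is valid, otherwise null.
function verifyToken(token, secret) {
  const parts = token.split(".");
  if (parts.length !== 3) return null;
  const expected = Buffer.from(sign(parts[0] + "." + parts[1], secret));
  const received = Buffer.from(parts[2]);
  if (expected.length !== received.length) return null;
  if (!crypto.timingSafeEqual(expected, received)) return null;
  return JSON.parse(Buffer.from(parts[1], "base64url").toString("utf8"));
}

const secret = "demo-secret"; // assumption: shared by all socket.io servers
const token = issueToken({ sub: "user1", rooms: ["1"] }, secret);
console.log(verifyToken(token, secret));
console.log(verifyToken(token, "another-secret"));
```

Since any server holding the secret can verify the token, a client that reconnects to a different socket.io server can be identified again without shared session storage.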

4.2.5 Client library and Local database

We have to create a local database that is the final interface for our client. The client contacts only the local database, which sends data through websocket and receives notifications of changes through websocket, as we said previously. We use Last Writer Wins to implement the local database, so replication is eventually consistent. A local database implementation is therefore an object that contains data and provides the read/write methods for the user.

We have the following sequence: the user calls write/update; the local data are updated; at the same time, the local database tries to update the remote servers, retrying until there are no network errors; once the data are updated by the server, a callback is called and it updates the local data.
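The sequence above can be sketched as a small class. This is a simplified illustration under stated assumptions: timestamps are supplied by the caller (in practice they depend on reasonably synchronized clocks), `sendToServer` is a hypothetical synchronous transport that throws on network errors, and retries are triggered explicitly through `flush()`.

```javascript
// Local database sketch with Last Writer Wins (LWW) conflict resolution.
class LocalStore {
  constructor(sendToServer) {
    this.rows = new Map(); // id -> { value, timestamp }
    this.sendToServer = sendToServer; // hypothetical transport, may throw
    this.pending = []; // writes not yet accepted by the server
  }

  // Optimistic write: update local data first, then try to reach the server.
  write(id, value, timestamp) {
    this.apply(id, value, timestamp);
    this.pending.push({ id, value, timestamp });
    this.flush();
  }

  // Retries every pending write; stops at the first network error.
  flush() {
    while (this.pending.length > 0) {
      try {
        this.sendToServer(this.pending[0]);
      } catch (networkError) {
        return false; // keep the write queued for a later flush()
      }
      this.pending.shift();
    }
    return true;
  }

  // LWW rule: a change is applied only if it is not older than the stored one.
  apply(id, value, timestamp) {
    const current = this.rows.get(id);
    if (current && current.timestamp > timestamp) return false;
    this.rows.set(id, { value, timestamp });
    return true;
  }

  read(id) {
    const row = this.rows.get(id);
    return row ? row.value : undefined;
  }
}

// Usage: the first send fails (simulated network error), the retry succeeds.
let online = false;
const sent = [];
const store = new LocalStore((change) => {
  if (!online) throw new Error("network down");
  sent.push(change);
});
store.write("row1", "draft", 10);
online = true;
store.flush();
store.apply("row1", "stale remote value", 5); // ignored: older timestamp
store.apply("row1", "newer remote value", 20); // applied: newer timestamp
console.log(store.read("row1"), sent.length);
```

The same `apply` method serves both local writes and server notifications, so every replica converges on the value with the highest timestamp.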

Data are often modified by the server: default values, data updated by triggers, constraints, auto-generated IDs and so on. Since we write data on the client database first, we would have to implement all of them in the client, but to do that we would have to create a sort of SQL interpreter on the client. If there are no network problems, the issue could be solved by waiting for the answer of the server, but this would make the application no longer optimistic.

Other systems solve this issue in the following ways (some of which come for free since they are NoSQL): simplified constraints (the client can replicate them easily), simplified defaults (the client can replicate them easily), no triggers, and unique IDs that can be generated by the client. For our application and tests we can skip everything except the ID; to solve it we can use a UUID (Universally Unique Identifier) [118], which can be easily generated by the client, the socket.io server and the PostgreSQL server.
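A minimal sketch of client-side ID generation is shown below. It assumes a Node.js runtime, where `randomUUID` from the standard `crypto` module produces a version 4 UUID; `newRow` and the field names are hypothetical and only illustrate the idea.

```javascript
const { randomUUID } = require("node:crypto");

// Creates a row whose primary key is generated on the client (UUID version 4),
// so the row can be stored locally before the server has seen it.
function newRow(fields) {
  return { id: randomUUID(), ...fields };
}

const a = newRow({ owner_id: 1, text: "first" });
const b = newRow({ owner_id: 1, text: "second" });
console.log(a.id, b.id);
```

Because the identifier does not depend on a server-side sequence, the optimistic local insert and the later server insert refer to the same row.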

4.3 Implementation

We tried to use ECMAScript wherever possible, with an event-driven approach. The code, configurations and Vagrant installer scripts are on GitHub⁶; the installation instructions can be found in the readme. We also used the promise pattern [66] to make callbacks more readable.

⁶ https://github.com/carduz/master-thesis-source/tree/master/proposed_solution

We quickly analyze the structure of the code; everything is commented, so we explain only the critical points.

Database, sample table, trigger with related functions: these are very simple and follow the structure described. As we said previously, the database is neither distributed nor partitioned.

Database listener: it is very simple and follows the structure previously explained. We describe it separately because (as we have done in the Vagrant installer scripts) it can be deployed on another machine (only one machine, since it cannot be distributed); of course it can also be deployed on the same machine as the database. For every action done on the table (insert, update, delete), this listener sends a socket.io event to the clients of a specific room (we analyze the logic of rooms below); the event name is the SQL action. As we said previously, the table is specified as the namespace.

Redis server: it does not require additional code, so only the Vagrant installer script is provided.

Websocket server: it is explained below since it is very complex. It can be distributed using a load balancer, but only the installer script for a single machine is provided (no load balancer script), since we do not want to test the load balancer (which would mean another level to test).

Finally, we quickly describe below what the client does.

4.3.1 Websocket server

The behavior is simple, since much of the work is done automatically, as we said previously; we briefly recap what we have done.

We define the same actions for all tables (specified as the namespace via of).

We define an auth event callback that identifies the client.

We define a join event callback that allows a client to subscribe to the rooms enabled for it. In this case we have provided a simple permission system where the room number is the owner_id of the row. When a client joins a room, all the rows of that room are sent to the client (initial retrieve), emulating the insert event created by the database listener after a read-all for that room. Of course the number of rows can be limited.

We map client data events (add, put, delete) to SQL functions (insert, update, delete), making some checks. If there is an error (such as missing permission), we discard the command and send a compensating command only to the client that sent the request, to update its local database. For example, if a client tries to add a row using an owner_id that is not enabled for it, the server discards the add and sends a delete event to that client.
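The event mapping and the compensating reply can be sketched as follows. Only the add to delete compensation comes from the text; the replies for put and delete, the `db` object and its methods, and the shape of the return value are assumptions introduced for this example.

```javascript
// Client event -> SQL action, as described above.
const ACTIONS = { add: "insert", put: "update", delete: "delete" };

// Event sent back to the requesting client to undo its optimistic local change.
// Only the add -> delete case comes from the text; the other two are assumptions.
const COMPENSATION = { add: "delete", put: "put", delete: "add" };

// `allowedOwners` is the set of owner_id values enabled for the client;
// `db` is a hypothetical object exposing insert/update/delete/get.
function handleClientEvent(event, row, allowedOwners, db) {
  const action = ACTIONS[event];
  if (!action) return { ok: false, reply: null };
  if (!allowedOwners.has(row.owner_id)) {
    // For put/delete the client needs the stored row back (null if unknown).
    const original = event === "add" ? row : db.get(row.id);
    return { ok: false, reply: { event: COMPENSATION[event], data: original } };
  }
  db[action](row);
  return { ok: true, reply: null };
}

// In-memory stand-in for the database library.
function makeFakeDb() {
  const rows = new Map();
  return {
    rows,
    get: (id) => rows.get(id) || null,
    insert: (row) => rows.set(row.id, row),
    update: (row) => rows.set(row.id, row),
    delete: (row) => rows.delete(row.id),
  };
}

const db = makeFakeDb();
const allowed = new Set([1]);
const okResult = handleClientEvent("add", { id: "r1", owner_id: 1, text: "mine" }, allowed, db);
const badResult = handleClientEvent("add", { id: "r2", owner_id: 2, text: "not mine" }, allowed, db);
console.log(okResult, badResult, db.rows.size);
```

The reply is returned rather than emitted, so the caller can send it only to the socket that issued the rejected request.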



With this approach, as said previously, the consistency of data modifications is managed by PostgreSQL, as in any normal modern HTTP application; in fact, data do not pass through Redis in this phase. Normal reads are managed automatically and are eventually consistent, since the replication to the local database is not synchronized but socket.io guarantees FIFO delivery. There is only one critical point: what if there is a change during the initial retrieve, considering that the client is not able to distinguish the data (initial retrieve or other events)? We could have different situations.

A notify event is raised before the initial retrieve is executed. Updates and deletes are discarded since there is no data yet; insertions are added to the local database, but they cannot be newer than the data sent by the initial retrieve (for the same reason as the next point). Of course, updates/deletes for rows already added are applied. So it remains eventually consistent.

A notify event is raised (and so the websocket event) before the operation is committed and thus visible to the select done in the meantime (this would cause the select to return data older than the notify event sent previously). This is impossible, since the notification is actually delivered only after the commit.

A notify event is raised while the initial data are being sent to the client. Since socket.io runs in a single thread and delivery is FIFO, the initial data are sent before the new ones.

A change notification arrives before the answer to the select (but the select command was already sent). In this case we lose the change, but it is still eventually consistent. Improvements such as a queue of events that is held (until the initial retrieve is finished) would help, but they are not implemented, to keep the code simple.
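The queue of held events mentioned above is not part of the implementation; the sketch below only illustrates what such an improvement could look like. The class name, the event shape (`action`, `row`) and the replay policy are assumptions, and the replay relies on notifications arriving in commit order.

```javascript
// Holds change events that arrive while the initial retrieve is in progress
// and replays them, in arrival order, once the initial rows have been applied.
class InitialSyncBuffer {
  constructor(applyEvent) {
    this.applyEvent = applyEvent; // callback that updates the local database
    this.ready = false;
    this.queue = [];
  }

  // Called for every change notification received from the server.
  onChange(event) {
    if (this.ready) this.applyEvent(event);
    else this.queue.push(event);
  }

  // Called once with the rows returned by the initial retrieve.
  onInitialRows(rows) {
    for (const row of rows) this.applyEvent({ action: "insert", row });
    for (const event of this.queue) this.applyEvent(event);
    this.queue = [];
    this.ready = true;
  }
}

// Usage with a plain Map as local database.
const local = new Map();
const buffer = new InitialSyncBuffer(({ action, row }) => {
  if (action === "delete") local.delete(row.id);
  else local.set(row.id, row);
});
buffer.onChange({ action: "update", row: { id: "r1", text: "edited" } }); // arrives early
buffer.onInitialRows([{ id: "r1", text: "original" }]);
buffer.onChange({ action: "insert", row: { id: "r2", text: "new" } });
console.log([...local.values()]);
```

With this buffer the early update is applied after the initial rows instead of being discarded, at the cost of a small amount of extra state per subscription.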

So if the events are fired before the initial retrieve, they are not a problem, since they are discarded or lead to eventual consistency. Likewise, if events are fired after the initial retrieve, they are not a problem, since they are caught (thanks to the FIFO guarantee). Events sent during the initial retrieve are a problem, since we can lose something because we could discard them; even so, we keep eventual consistency.

4.3.2 Client

We have two main classes: the socket.io client, which simply maps the functions and callbacks that the local database needs to socket.io commands, and the general local database. The general local database has to do the following operations: keep a local copy of the data; when a change is requested, update the local data and then send the update to the server, retrying if there are network delays; when a notification of change is received, update the local data and call a callback that signals that the data were updated; and, as explained previously, add a UUID to each row created. The final client interacts only with the local database.

Chapter 5

Classes of Applications

In this chapter we analyze the main use cases, with related real case studies, for which CS-NoSQL are suggested and where they should perform very well; these are the cases for which CS-NoSQL are designed. Since there is no literature, the information is taken from the vendors; the vendors used are Firebase (Google) and Pubnub, both described in 3.2.2.

The use cases analyzed in this chapter are tested in the benchmarks in chapter 7; in chapter 6 we analyze how to test these use cases and which parameters to change in order to obtain different significant tests.

In 5.1 we analyze the main use cases, which have important differences in their characteristics (such as the average number of reads compared to the number of writes). We also highlight the characteristics that best exploit the advantages of CS-NoSQL from a theoretical point of view. The use cases analyzed are: realtime chat, collaborative software and social applications.

Then, in 5.2, we analyze some real case studies in which the use cases seen previously are applied with success by some important companies. This means that in some cases CS-NoSQL are a good solution.

5.1 Use Cases

We have different elements to consider. The most important one is the number of reads, i.e. whether #reads >> #writes, but the structure of the data is also very important, i.e. whether JSON could increase performance/readability. We also need to consider aspects such as subscription granularity, notification structure, and the constraints, modifications and permissions needed by the server.

Now we analyze some use cases; they are the most common according to vendors. More use cases can be found on the Pubnub solutions site¹. We observe that the traditional comparing approach can be adapted to every use case, but this adaptation requires time. CS-NoSQL, instead, adapt themselves automatically; we show in 5.2 that this is a key to their success.

¹ https://www.pubnub.com/solutions/

We need to use unique ID references, as shown in the use cases; it is one of the few kinds of reference supported by most NoSQL databases (all the databases we use support it). In NoSQL databases we do not have referential integrity checks [130] (some databases try to implement them with explicit checks, but this is not standard), so in the traditional approach we also do not use the referential integrity check (it slows the application down).

5.1.1 Realtime chat

It is a classical use case, shown as an example by different vendors [26, 67]. The chat considered is a room-based chat, i.e. there are some rooms to which many users are subscribed, and each user sees all the messages of that room. So we expect #reads >> #writes. However, we need to do many checks on the server, such as permission to post in that room, identity check (name shown) and so on. Looking at the store structure [27], in particular at the store of the messages of rooms (sample shown in listing 5.1), the JSON structure seems very useful.

    {
      "messages": {
        "room1": [
          {
            "test": "message 1",
            "user": "user1"
          },
          {
            "test": "message 2",
            "user": "user2"
          }
        ],
        "room2": []
      },
      "room-users": {
        "room1": ["user1", "user2"],
        "room2": ["user1"]
      },
      "users": {
        "user1": {
          "roomsAllowed": ["room1", "room2"]
        },
        "user2": {
          "roomsAllowed": ["room1"]
        }
      }
    }

Listing 5.1: Chat data structure

We can observe that we need to link every message to a room and to a user. With JSON we can make a direct link (child of) with only one element, so we need to choose whether to link with the room (all messages of a room) or with the user (all messages of a user). The room is chosen, since users subscribe to it and want to receive notifications based on it (they want the messages of a room, not the messages of a user), while the user is linked using an ID reference.

With a traditional approach we link both things (room and user) using ID references. We use the room ID as the socket.io room, so we manage permissions and automatically send notifications of room changes (the same approach as the CS-NoSQL solution). So we have three tables: messages, rooms and users; every row of messages has a reference to a user and a room. If we want to link users to rooms (permission) we need a pivot table that links users and rooms. The JSON version seems to be more readable.

Permissions and notifications are thus managed analogously in the traditional approach; compared with it, we expect only small improvements, given by the fact that architecture and delivery are optimized for #reads >> #writes. Moreover, we only add data that we know to be unique, and NoSQL databases should be better at this kind of operation; but since we have #reads >> #writes, we should not see its effects.

We an observe that if we want to retrieve e iently the users of a room we an

do it easily with a traditional approa h (it is a query on users table), but to do it

with NoSQL (sin e we do not have queries) we have to repli ate data (room-users)

as shown in listing 5.1.
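The replication mentioned above can be sketched as follows. This is only an illustration: the helper buildRoomUsers is hypothetical (it is not part of the thesis code), and it assumes that the room-users index simply mirrors the roomsAllowed lists, which holds for the sample data of listing 5.1.

```javascript
// Hypothetical helper (not part of the thesis code): rebuild the replicated
// "room-users" index from the "users" store of listing 5.1.
function buildRoomUsers(users) {
  const roomUsers = {};
  for (const [userId, user] of Object.entries(users)) {
    for (const room of user.roomsAllowed || []) {
      if (!roomUsers[room]) roomUsers[room] = [];
      roomUsers[room].push(userId);
    }
  }
  return roomUsers;
}

// Sample data copied from the "users" store of listing 5.1.
const chatUsers = {
  user1: { roomsAllowed: ["room1", "room2"] },
  user2: { roomsAllowed: ["room1"] },
};
const roomUsers = buildRoomUsers(chatUsers);
console.log(JSON.stringify(roomUsers));
```

In a store without queries, such an index has to be kept up to date on every permission change, which is the cost of the replication.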

5.1.2 Collaborative software

A collaborative software is a software that allows more than one person to work together on the same document; one of the most famous is Google Docs [33]. A simple collaborative software is provided as an example by Firebase [28]. Since its data structure is not simple, we consider a simplified version (we do not consider aspects such as multiple documents or permissions), shown in listing 5.2.

{
  "history": [
    {
      "timestampt": 1490506829,
      "changeObject": {
        "start": 20,
        "change": "abcd"
      },
      "user": "user1"
    },
    {
      "timestampt": 1490506830,
      "changeObject": {
        "start": 22,
        "change": "Impact",
        "end": 24
      },
      "user": "user2"
    },
    {
      "timestampt": 1490506835,
      "changeObject": {
        "start": 22,
        "change": -2
      },
      "user": "user2"
    }
  ],
  "users": {
    "user1": {
      "position": 10
    },
    "user2": {
      "position": 15,
      "positionEnd": 20
    }
  }
}

Listing 5.2: Collaborative software data structure

We work with a history of changes: since there is no lock, users cannot change the same element at the same time.

The structure of the history is simple: we can add text (first change), apply a font to a portion of text (second change), or remove some characters (third change). Moreover, we can subscribe to users to see their cursor position in real time.
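To make the history model more concrete, the following sketch replays a list of changes on a plain string. The semantics are assumptions inferred from the three examples of listing 5.2 and are not stated by the source: a string change without end inserts text at start, a string change with end is recorded as a font span, and a negative number removes that many characters starting at start. The function name replayHistory is hypothetical.

```javascript
// Sketch under assumed semantics (see the lead-in); not the thesis code.
function replayHistory(history, initialText = "") {
  let text = initialText;
  const formats = []; // font spans are only recorded, not shifted by later edits
  // The key "timestampt" is spelled as in listing 5.2.
  const ordered = [...history].sort((a, b) => a.timestampt - b.timestampt);
  for (const { changeObject: c } of ordered) {
    if (typeof c.change === "number" && c.change < 0) {
      const removed = -c.change; // number of characters to delete
      text = text.slice(0, c.start) + text.slice(c.start + removed);
    } else if (typeof c.change === "string" && c.end !== undefined) {
      formats.push({ start: c.start, end: c.end, font: c.change });
    } else if (typeof c.change === "string") {
      text = text.slice(0, c.start) + c.change + text.slice(c.start);
    }
  }
  return { text, formats };
}

// Illustrative history with small positions, so it fits a short text.
const replayed = replayHistory(
  [
    { timestampt: 1, changeObject: { start: 5, change: "abcd" }, user: "user1" },
    { timestampt: 2, changeObject: { start: 5, change: "Impact", end: 9 }, user: "user2" },
    { timestampt: 3, changeObject: { start: 5, change: -2 }, user: "user2" },
  ],
  "hello world"
);
console.log(replayed.text, replayed.formats);
```

Because every client replays the same ordered list, no lock on the document is needed; conflicts between overlapping changes are outside the scope of this sketch.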

It is easy to see that the JSON structure fits this kind of application well; the application could easily be extended by adding more detailed child fields to the change. In the traditional application we can do the same thing by serializing something (also JSON) to a string.

We also have two subscriptions (users and history) that are easy to manage and easy to read with the JSON structure.

But we can also manage them with a traditional approach, where they become two different tables: a history table with timestamp, user reference and changeObject (which could be the JSON as a string) fields, and a users table with position and positionEnd fields.

We only add data that we know to be unique, and NoSQL databases should be better with this kind of constraint. Since we have #reads ≃ #writes, we should see differences due to the efficiency of adds.

5.1.3 Social Applications

A social application is a classical use case, shown as an example by different vendors [29, 69]. It is composed of classical elements: users, relationships (followers), posts, comments and likes. A data structure that uses the power of JSON could be the one shown in listing 5.3, but the structure really used is different, as shown in listing 5.4 [30].

{
  "users": {
    "user1": {},
    "user2": {},
    "user3": {}
  },
  "relationships": [
    ["user1", "user2"],
    ["user2", "user3"]
  ],
  "posts": [
    {
      "user": "user1",
      "content": "aaa",
      "likes": [
        "user1",
        "user2"
      ],
      "comments": [
        {
          "timestamp": 1490506829,
          "user": "user2",
          "content": "bbbb"
        }
      ]
    }
  ]
}

Listing 5.3: Possible social structure

{
  "users": {
    "user1": {
      "posts": ["post1"]
    },
    "user2": {"posts": []},
    "user3": {"posts": []}
  },
  "followers": {
    "user1": ["user2"],
    "user2": ["user1", "user3"],
    "user3": ["user2"]
  },
  "likes": {
    "post1": ["user1", "user2"]
  },
  "comments": {
    "post1": [
      {
        "timestamp": 1490506829,
        "user": "user2",
        "content": "bbbb"
      }
    ]
  },
  "posts": {
    "post1": {
      "user": "user1",
      "content": "aaa"
    }
  }
}

Listing 5.4: Social structure

Even if the first data structure seems good, the second is better. In fact, we can subscribe only once to the main object, while with the first structure we have to subscribe for every new post. Lots of subscriptions could be a problem. Moreover, it is easier to port the second approach to traditional technologies.
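The relation between the two structures can be illustrated with a small conversion sketch. It is not taken from the thesis code: the function flattenSocial is hypothetical, the generated post ids ("post1", "post2", ...) are an assumption, and relationships are treated as symmetric because that reproduces the followers object of listing 5.4.

```javascript
// Sketch: flatten the nested structure of listing 5.3 into the flat stores
// of listing 5.4. Names and id scheme are assumptions for illustration.
function flattenSocial(nested) {
  const flat = { users: {}, followers: {}, likes: {}, comments: {}, posts: {} };
  for (const userId of Object.keys(nested.users)) {
    flat.users[userId] = { posts: [] };
    flat.followers[userId] = [];
  }
  for (const [a, b] of nested.relationships) {
    flat.followers[a].push(b); // assumed symmetric relationship
    flat.followers[b].push(a);
  }
  nested.posts.forEach((post, index) => {
    const postId = `post${index + 1}`;
    flat.posts[postId] = { user: post.user, content: post.content };
    flat.likes[postId] = post.likes;
    flat.comments[postId] = post.comments;
    flat.users[post.user].posts.push(postId); // replicated "posts under users"
  });
  return flat;
}

const nestedSample = {
  users: { user1: {}, user2: {}, user3: {} },
  relationships: [["user1", "user2"], ["user2", "user3"]],
  posts: [
    {
      user: "user1",
      content: "aaa",
      likes: ["user1", "user2"],
      comments: [{ timestamp: 1490506829, user: "user2", content: "bbbb" }],
    },
  ],
};
const flatSample = flattenSocial(nestedSample);
console.log(JSON.stringify(flatSample.followers));
```

With the flat form, a client can subscribe to a whole store such as posts, and each store maps naturally to one table in the traditional approach.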

For simplicity, permissions and related aspects are skipped. As in the previous use case, even if we have more than one subscription we can manage them easily (with different tables). We have: users, likes (with references to user and post), comments (with references to user and post) and posts (with a reference to user).

But we could have some problems due to the absence of integrity constraints. For example, we could add a reference to a post in the likes object while that post is being deleted (in fact we do not even have locks).

We can observe that if we want to efficiently retrieve each user's posts, we can do it easily with a traditional approach (it is a query on the users table), but to do it with NoSQL (since we do not have queries) we have to replicate data (posts under users), as shown in listing 5.4. We expect #reads >> #writes: it is common to have a lot of followers, so every write has to be replicated to a lot of followers.

It seems that everything can be managed efficiently with a traditional approach so, like the first use case, we expect minor improvements due to the efficient architecture and optimized delivery.

5.2 Examples of Real CS-NoSQL Applications

In this section we quickly analyze some real stories of CS-NoSQL applications. For each case we show the problem, the solution and the link with the use cases shown previously.

Since the cases are taken from vendors, the technologies used are proprietary, cloud-hosted ones, so there can be some differences compared to the tests we can do with open-source technologies. In fact, proprietary cloud-hosted technologies sometimes have extra features or extra performance in some particular conditions.

5.2.1 Adobe DPS

Adobe DPS (Digital Publishing Solution) is a collaborative software used for publishing mobile app experiences. It is developed over PubNub [1].

Problem Since it is a fully collaborative software, different people should be able to work together on the same project from different devices and from different locations, like in Google Docs [33].

Solution After analyzing the development costs of custom solutions built over classical systems, a commercial CS-NoSQL solution was chosen. It sends notifications about changes of the projects to all the connected devices. It also allows introducing new features in the future, like server-to-server notifications. The system chosen also offers global redundancy.

Use case We can easily observe that this is the use case shown in 5.1.2. Of course the structure of the data is different, but the idea of saving changes is not. The flexibility of JSON gives us the possibility to use the same previous model in more complex situations than a simple text document (we only need to change the content of the change field, as explained previously).

5.2.2 Logitech Harmony Ultimate Home

Logitech Harmony Ultimate Home is a home automation hub that allows controlling the house from the app and by other means. It is developed over PubNub [48].

Problem With the app the user can control the hub from any location; the data exchanged with the hub are a stream of data. So a secure and reliable solution to send a stream of data is needed.

Solution A commercial CS-NoSQL solution was chosen. It is used to send realtime data from the app to the hub when the user is outside. Moreover, the hub sends a stream of realtime data from different devices (lights, temperatures etc.) to the mobile app.

Use case We have not studied this use case previously, but it is marked as one of the common use cases by PubNub [68].

5.2.3 CornerJob

CornerJob is a location-based job recruitment app. It is developed over PubNub [5]. Its main feature is the chat: when a job seeker applies for a job, a new chat is created.

Problem Since the chat is the most important part of the application, the company wanted a standard chat on a reliable technology with low development and maintenance costs.

Solution After analyzing the development costs of a custom solution built over classical systems and after trying free plans, a commercial CS-NoSQL solution was chosen, because it allowed building a chat system quickly and reliably. In fact, the CS-NoSQL is a full-stack system that takes care of every step needed, so there is no need to think about scalability and communication among internal components. Moreover, since a commercial, proprietary, cloud-hosted system was used, the company does not have to think about the maintenance of the infrastructure.

Use case We can easily observe that this is the use case shown in 5.1.1. Of course it is a little different, since we do not have chat rooms but only private messages between two users. So some aspects change: it is no longer true that #reads >> #writes, the version for the traditional approach is no longer valid as is (minor changes are needed), and so on.


Chapter 6

Benchmarks strategy

In this chapter we describe the idea behind the benchmarks, whose results are published and commented on in chapter 7. We show how to test the classes of applications analyzed in the previous chapter, how to test the scalability of different systems and which indices to use to compare systems. So we present the theoretical concepts and design our test framework from a theoretical point of view.

We analyze the theoretical concepts of scalability and how to test scalability with our systems in section 6.1. We show the idea behind some test frameworks and we design our test framework for CS-NoSQL in section 6.2. Then we show how to test, in a general way, the classes of applications analyzed in the previous chapter in section 6.3. Finally, we explain how to integrate the tests of the different classes of applications into our test framework in section 6.4.

6.1 Scaling Test

In this section we show how we can test the scaling of different solutions. We analyze the scaling that we do and the reasons for those choices. Then we show different test environments based on the previous choices.

6.1.1 Scaling

We know that NoSQL databases implement partitioning very efficiently [98]. So we expect that, with a lot of data that requires partitioning, the performance of NoSQL solutions would be much better than that of the traditional approach. Moreover, since strong consistency constraints are often relaxed, replication is also more efficient. Both aspects lead to horizontal scalability (also called out/in) [122], since we increase the number of servers (not aspects like their power). Furthermore, since CS-NoSQL are full-stack solutions, the internal realtime delivery server is also scaled horizontally (it is scaled with the database).

In order to keep things simple and standard (partitioning is implemented in different ways depending on the data model) and to avoid adding an additional level to test, we decided to skip this part. So we test without horizontal scaling. Moreover, since our topic is the comparison of these kinds of applications with a standard approach, testing horizontal scaling as well would add another variable factor that could change our results.

For the same reason we do not horizontally scale the traditional approach that we proposed. Here by scaling we mean partitioning and replication of the PostgreSQL server and replication of the socket.io machines; Redis is not a bottleneck, so it does not need scaling. But we do vertical scaling (also called up/down) [122], which we explain in the next section.

6.1.2 Environments

In this section we show the different configurations that we want to test. These configurations were found after some minor empirical tests. We have also set them to have the same maximum and minimum sum of resources used (counting only the non-static servers).

As said previously, CS-NoSQL are full-stack solutions implemented in just one server, so we can create a common environment for them, shown below. In the proposed traditional approach, instead, we have different servers, so they need a more detailed discussion.

Socket.io and gun.js can be considered single-threaded, so we do not scale the CPU for them. In fact, there is a main thread that executes callbacks and other operations; furthermore, the background operations (executed in other threads) used by socket.io for network delivery are not CPU intensive [137].

To keep things more general, all the tests are done using virtual machines (VirtualBox [110], see footnote 1) on the same physical machine. So all the components and technical characteristics (such as RAM speed) are the same. Note that by CPU we mean a standard modern CPU (i7 generation); of course the same CPU was used for all servers.

CS-NoSQL Since we have only one server we can simply follow table 6.1. As we said previously, gun.js can be considered single-threaded, so we can test it with just one CPU (so we have only two cases, based on the RAM).

1

https://www.virtualbox.org


Table 6.1: CS-NoSQL test environments

N°   RAM [GB]   N° CPU
1    2          2
2    2          3
3    4          2
4    4          3

Comparing traditional approach Here we have 4 servers, and for each of them we use a virtual machine:

PostgreSQL: this is a critical point and we follow table 6.2.

Redis: this is neither a critical point nor something to test, so we can consider it static; we always use a machine with 1 CPU and 512 MB of RAM.

Listener: this is neither a critical point nor something to test, so we can consider it static; we always use a machine with 1 CPU and 512 MB of RAM.

Socket.io: this is a critical point and we follow table 6.3. As we said previously, socket.io (like gun.js) can be considered single-threaded, so we can test it with just one CPU.

Of course we have to test all the combinations, so we have 8 tests to do.

Table 6.2: PostgreSQL (traditional approach) test environments

N°   RAM [GB]   N° CPU
1    1          1
2    1          2
3    2          1
4    2          2

Table 6.3: socket.io (traditional approach) test environments

N°   RAM [GB]   N° CPU
1    1          1
2    2          1
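The 8 tests of the traditional approach are the Cartesian product of the PostgreSQL environments (table 6.2) and the socket.io environments (table 6.3). A minimal sketch of this enumeration follows; the variable and field names are illustrative and not taken from the thesis code.

```javascript
// Environments copied from tables 6.2 and 6.3; field names are illustrative.
const postgresEnvs = [
  { ramGb: 1, cpus: 1 },
  { ramGb: 1, cpus: 2 },
  { ramGb: 2, cpus: 1 },
  { ramGb: 2, cpus: 2 },
];
const socketIoEnvs = [
  { ramGb: 1, cpus: 1 },
  { ramGb: 2, cpus: 1 },
];

// Every PostgreSQL environment is paired with every socket.io environment.
const combinations = postgresEnvs.flatMap((postgres) =>
  socketIoEnvs.map((socketIo) => ({ postgres, socketIo }))
);
console.log(combinations.length); // 4 * 2 = 8
```

Redis and the listener are left out because they are static (1 CPU and 512 MB of RAM in every test).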

6.2 Test Framework

Since the systems used are custom or new, there are no stable frameworks to test their performance. So, following the guidelines of an existing framework called YCSB [89], we built our own framework that tests only what we need, in the way we want. The framework, with the prepared tests (and related aspects like SQL tables), can be found on GitHub (see footnote 2), with instructions in the readme.

We tested only performance (throughput and latency). A further analysis should test the consistency of the distributed environment; there are tools for that, like Jepsen [47] (we briefly analyze it in appendix B).

The framework is written in JavaScript, so it is easy to integrate with other platforms. Since we know which indices we need, we only have to implement general aspects. We built a data generator that follows the data structure, shown in section 6.3, and a general client that allows sending data and reading local databases (or receiving data), shown in section 6.4.

What we need to test are write and read performance. In fact, as we said, we have different data models but no complex operations, so we have only basic writes and replications of the written data. Since our critical point is replication, we have to test with more than one client. We can consider a write completed when it has been replicated to all clients. So we need at least one client to make writes and at least one to check replications (called reads). The number of clients (writers and readers) depends on the type of application.

We emulate different situations where we have, for example, 1 writer and 100 reader clients. So an important piece of information is the latency of the synchronization to all these clients. We call execution the writing/reading of data according to the number of writers/readers specified in the model.

So we measure different things:

Latency to synchronize all clients and throughput (requests divided by the latency to synchronize everything), for every execution.

Mean (with variance) latency to get data for each client.

Final throughput: total number of requests (reads and writes) per second.
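The measurements listed above can be sketched as follows. This is not the code of the thesis framework: the record shape ({ writeStart, received }), the function names and the sample timestamps are assumptions made for illustration, with times in milliseconds and the population variance used for simplicity.

```javascript
// A write counts as completed when the slowest reader has received it.
function executionLatency(execution) {
  return Math.max(...execution.received) - execution.writeStart;
}

function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Population variance (an assumption; the source does not say which one is used).
function variance(values) {
  const m = mean(values);
  return mean(values.map((v) => (v - m) ** 2));
}

function summarize(executions, writers, durationSeconds) {
  const perClient = executions.flatMap((e) =>
    e.received.map((t) => t - e.writeStart)
  );
  // Requests are counted as writes plus reads of every execution.
  const requests = executions.reduce(
    (n, e) => n + writers + e.received.length,
    0
  );
  return {
    syncLatencies: executions.map(executionLatency),
    meanClientLatency: mean(perClient),
    clientLatencyVariance: variance(perClient),
    finalThroughput: requests / durationSeconds,
  };
}

// Illustrative data: 1 writer, 2 readers, two executions in 2 seconds.
const summary = summarize(
  [
    { writeStart: 0, received: [10, 20] },
    { writeStart: 100, received: [130, 110] },
  ],
  1,
  2
);
console.log(summary);
```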

Each test unit is composed of different clients, and we execute more than one unit in parallel. To do that we have a test manager that checks everything and communicates with the clients; in fact every client is implemented as a separate process. The creation of the processes and the communication are managed by the framework in a transparent way. We only need to map the client functions to the general client (this is done by the adapter).

Since it is not a general framework, it does not generate charts but only raw data; charts can be generated using powerful external tools like spreadsheet software. Of course the test framework is executed on another virtual machine (with high resources) or on the host machine.

2

https://github.com/carduz/master-thesis-source/tree/master/tests

6.3 Test sets

In this section we analyze what every element of the classes of applications (analyzed previously) needs. We also define for each of them the number of clients to be used for the tests; for some of them we define several versions (e.g. a test with 1 reader client, then a test with 10 reader clients) to test scalability and adaptability of replication and concurrent writing. Of course this is a simplified simulation of the behavior of these classes of applications.

All the data shown (like the number of clients) were found after some minor empirical tests. In some cases the data structures need some changes to be adapted to the model of the database used. These are minor changes (simplifications) that are not shown here; of course the code on GitHub contains everything.

We should also consider the size of the data but, after some experiments, we have seen that some systems do not support a big amount of data, so the test would be inconsistent. We do analyze the data volume adaptability, where by data volume we mean the volume of data in the database (not the size of the single field).

6.3.1 Realtime chat

Clients As we said previously, since it is a room-based chat, we expect #reads >> #writes. This example is useful to test the performance of many clients subscribed to a subscription. So a reasonable number of clients, considering the environments previously defined, could be the one shown in table 6.4, where we have #readers >> #writers. The table also shows the number of rooms.

Table 6.4: Realtime chat clients

N°   N° Writer   N° Reader   N° Room
1    1           10          1
2    1           100         1
3    10          100         5

Data generation After a preliminary data initialization (e.g. user creation or room creation), to run the tests we just have to create messages. So our generator is simply a fake text generator; other data, like the user, can be static. As we analyze next, we have to manage the room to write to.

Writing We want each writer to be subscribed to only one room.

Reading We want each reader to be subscribed to 2 rooms (if possible).
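The writing and reading rules above can be sketched as a room assignment. The round-robin policy and the function name assignRooms are assumptions for illustration; the source only states how many rooms each kind of client is subscribed to.

```javascript
// Sketch: one room per writer, two rooms per reader when at least two exist.
// Round-robin distribution is an assumption, not the thesis implementation.
function assignRooms(writers, readers, rooms) {
  const roomIds = Array.from({ length: rooms }, (_, i) => `room${i + 1}`);
  const roomsPerReader = Math.min(2, rooms); // "2 rooms (if possible)"
  return {
    writers: Array.from({ length: writers }, (_, w) => [roomIds[w % rooms]]),
    readers: Array.from({ length: readers }, (_, r) =>
      Array.from({ length: roomsPerReader }, (_, k) => roomIds[(r + k) % rooms])
    ),
  };
}

// Configuration 3 of table 6.4: 10 writers, 100 readers, 5 rooms.
const assignment = assignRooms(10, 100, 5);
console.log(assignment.writers[7], assignment.readers[4]);
```

With a single room (configurations 1 and 2 of table 6.4) every reader falls back to one subscription.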

6.3.2 Collaborative software

Clients As we said previously, we expect #reads ≃ #writes. This example is useful to test the performance of concurrent writes. So a reasonable number of clients, considering the environments previously defined, could be the one shown in table 6.5, where we have #readers ≃ #writers.

Table 6.5: Collaborative software clients

N°   N° Writer   N° Reader
1    1           1
2    10          10
3    100         100

Data generation After a preliminary data initialization (e.g. user creation), to run the tests we just have to create change elements. A change is a JSON object with the structure shown in listing 6.1. There are two elements: start, i.e. the start position of the change, and change, i.e. the new text that replaces the old one starting from the start position. We can consider everything static except for start and change, which should be a fake number and a fake text respectively.

{
  "timestampt": 1490506829,
  "changeObject": {
    "start": 20,
    "change": "abcd"
  },
  "user": "user1"
}

Listing 6.1: Collaborative software change structure

Writing Here the writing is trivial: all clients write to the history (list of changes).

Reading Here the reading is trivial: all clients subscribe to the history (list of changes).
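A generator for the change objects described above could look like the following sketch. The seeded pseudo-random function, the value ranges (start below 100, text of 1 to 8 lowercase letters) and the function names are assumptions chosen only to make the example reproducible; they are not taken from the thesis framework.

```javascript
// Small linear congruential generator, so that runs are reproducible.
function makeRandom(seed) {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 2 ** 32; // value in [0, 1)
  };
}

// Builds a change with the structure of listing 6.1; only "start" and
// "change" vary, the user is static. The key "timestampt" is spelled as
// in the listing.
function makeChange(random, nowMs = Date.now()) {
  const length = 1 + Math.floor(random() * 8);
  let text = "";
  for (let i = 0; i < length; i++) {
    text += String.fromCharCode(97 + Math.floor(random() * 26));
  }
  return {
    timestampt: Math.floor(nowMs / 1000),
    changeObject: { start: Math.floor(random() * 100), change: text },
    user: "user1",
  };
}

const sampleChange = makeChange(makeRandom(42), 1490506829000);
console.log(JSON.stringify(sampleChange));
```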

6.3.3 Social

Clients As we said previously, we expect #reads >> #writes. This example is useful to test the performance of many clients subscribed to multiple subscriptions. So a reasonable number of clients, considering the environments previously defined, could be the one shown in table 6.6, where we have #readers >> #writers. In a social application, for each person there are a lot of users that see the writes. These users are followers and sometimes also followers of their direct followers.

Table 6.6: Social clients

N°   N° Writer   N° Reader
1    1           10
2    1           100
3    10          100

Data generation After a preliminary data initialization (e.g. user creation), we have different elements:

Posts: we should generate fake texts (contents).

Comments: we should link users to posts. Since there are no checks, we can generate random ids (also non-existing ones) for the references.

Likes: we can observe that they are like comments. So, to simplify, we can skip them.

We can say that there are more comments than posts; for simplicity we send a post every 9 comments. When we generate a post, we have a new post that can have comments and that users can subscribe to. So we have one subscription to new posts and multiple subscriptions for comments, one for every post.

Writing As said previously, a post is created every 9 comments; this is done by all writers. We want two writers to write comments for each post (if possible). The writers can comment on all posts, also older ones.

Reading All clients subscribe to posts, so they read all the new posts. A client is subscribed to half of the posts. Even if new posts are created, the old subscriptions are not deleted. So the more time passes, the more subscriptions are created.
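The writing rule of this workload (one post every 9 comments, comments allowed on older posts too) can be sketched as a generator of operations. The deterministic choice of the commented post and the name socialOperations are assumptions for illustration; the actual framework may pick targets differently.

```javascript
// Sketch: yields one post followed by nine comments, repeatedly.
function* socialOperations(total) {
  let posts = 0;
  for (let i = 0; i < total; i++) {
    if (i % 10 === 0) {
      posts += 1;
      yield { type: "post", id: `post${posts}` };
    } else {
      // Comments may target any existing post, also older ones; the target
      // is chosen deterministically here to keep the example reproducible.
      const target = 1 + (i % posts);
      yield { type: "comment", post: `post${target}` };
    }
  }
}

const operations = [...socialOperations(20)];
const postCount = operations.filter((op) => op.type === "post").length;
const commentCount = operations.filter((op) => op.type === "comment").length;
console.log(postCount, commentCount);
```

Each new post also adds one comment subscription for the readers that follow it, which is why the number of subscriptions grows during a run.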

6.4 Adapters for systems

The general client needs: a login method, a join method to subscribe to a channel and to a table/document, a write command, and a data callback that is called when there are new data (the new data are passed as an argument).
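The four elements of the general client can be sketched with a purely in-memory adapter. This is not one of the real adapters: the class InMemoryAdapter, its method signatures and the channel name are hypothetical, and delivery is synchronous, whereas a real system replicates through a server.

```javascript
// Hypothetical in-memory adapter exposing login, join, write and a data
// callback, as required by the general client.
class InMemoryAdapter {
  constructor() {
    this.user = null;
    this.subscribers = new Map(); // channel name -> list of data callbacks
  }

  login(user) {
    this.user = user;
  }

  // Subscribes to a channel; onData is called with every new piece of data.
  join(channel, onData) {
    if (!this.subscribers.has(channel)) this.subscribers.set(channel, []);
    this.subscribers.get(channel).push(onData);
  }

  // Delivers the data to every subscriber of the channel.
  write(channel, data) {
    for (const onData of this.subscribers.get(channel) || []) onData(data);
  }
}

const adapter = new InMemoryAdapter();
const receivedMessages = [];
adapter.login("user1");
adapter.join("room1", (data) => receivedMessages.push(data));
adapter.write("room1", { text: "message 1", user: "user1" });
adapter.write("room2", { text: "not delivered", user: "user1" });
console.log(receivedMessages.length);
```

A benchmark written against this interface does not need to know which system is behind it, which is the role the adapters play in the framework.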

We have already shown that the systems can provide authentication. But since it is implemented in different ways, it could add another variable factor that can change the final results, so we skip it, since it is not our main target to test. Now we quickly analyze how to implement these for all the platforms.

Couchbase

Subscription: we cannot manage channels, we can only choose which documents we subscribe to. So we receive notifications for every change in the documents we are subscribed to. We could use filters [8] as a workaround for this problem, but they are not very efficient and not easy to use in a real environment (due to the permissions needed to create filters).

Write command: it is a normal asynchronous call.

Data callback: it returns new data from the local database.

PouchDB

Subscription: we cannot manage channels, we can only choose which documents we subscribe to. So we receive notifications for every change in the documents we are subscribed to. We could use filters [63] as a workaround for this problem, but they are not very efficient and not easy to use in a real environment (due to the permissions needed to create filters).

Gun.js

Subscription: we can subscribe to any level of the JSON structure, so we do not have problems (only the messages that we need are delivered).

Comparing traditional approach

Subscription: we can create custom channels and join them using join, so only the messages that we need are delivered.

Chapter 7

Benchmarks

In this chapter we show the benchmark tests that we have done according to what we have seen in the previous chapter, for the classes of applications and for the systems we previously defined.

We analyze the tests done, with some assumptions raised by practical experiments, in section 7.1. Test executions are organized by technology and, for each technology, by class of application. Finally, in section 7.2 we recap the results obtained, showing why in most cases our traditional comparative approach is better.

7.1 Tests

Here we show the data from the execution of the tests. We tried, when possible, to simplify them to avoid running some tests. Moreover, even if we measure different data (such as request throughput or variance of latency), as explained in the previous chapter, we show only the final throughput (in tables) and the average latency in seconds (in charts), since the other data would make the comparison confusing. Of course the tool developed generates the other data, and the tests can be easily reproduced.

For each system we have executed the test for every class of applications. For each of them we have a table (with the executions) and a chart. In the tables we report results for different numbers of clients and for different environments, but in the charts we report only results for different numbers of clients using the best environment.

For each system we give a short analysis of the results for: scalability, data volume adaptability (what happens to the entire system if the stored data increase), latency stability and write performance (what it is possible to deduce from the indirect measurements that we do).

Then we come to some observations for each class of applications. Remember that the main aspects to test for each of them are: chat (we test subscription delivery efficiency), collaborative (we test the efficiency of concurrent writes), social (we test multiple subscription efficiency; we also recall that the number of subscriptions increases over time).

The tests were executed for 30 seconds; more seconds were necessary to finish all the executions. The environment was stable and we did not need to execute the same test more than once (to take an average value).

For each configuration of the number of clients, for each class of applications and for each system, we had to find the right value of concurrency. It is the number of concurrent executions; an execution is the writing/reading of data according to the number of writers/readers specified by the model. When an execution is finished another one is started (to keep the same concurrency value). So the total number of writes/reads started together is equal to the #writes/#reads in the model multiplied by the concurrency.

Concurrency influences the number of tasks sent to the same writer, but often a writer (depending on the implementation of the server's client) sends data sequentially. So to send data concurrently we have to emulate more writers. This value is found empirically: the value chosen is the first value (starting from the lowest) that guarantees the maximum throughput in the best environment.
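The two rules above can be sketched in a few lines. The function names are hypothetical and the measurements passed to pickConcurrency are made-up values used only to show the selection rule; the first example uses the first configuration of the Couchbase chat benchmark (1 writer, 10 readers, concurrency 200).

```javascript
// Total writes/reads started together = model counts multiplied by concurrency.
function startedTogether(model, concurrency) {
  return {
    writes: model.writers * concurrency,
    reads: model.readers * concurrency,
  };
}

// Lowest concurrency value that reaches the maximum measured throughput.
function pickConcurrency(measurements) {
  const maxThroughput = Math.max(...measurements.map((m) => m.throughput));
  return measurements
    .filter((m) => m.throughput === maxThroughput)
    .reduce((lowest, m) => Math.min(lowest, m.concurrency), Infinity);
}

const started = startedTogether({ writers: 1, readers: 10 }, 200);
console.log(started.writes, started.reads);

// Hypothetical measurements, not results from the thesis.
const chosenConcurrency = pickConcurrency([
  { concurrency: 10, throughput: 100 },
  { concurrency: 40, throughput: 250 },
  { concurrency: 80, throughput: 250 },
]);
console.log(chosenConcurrency);
```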

We run the tests as a black box: we do not know the details of the systems. So, for example, if writers are slow, we do not know whether the problem is the database itself or the realtime delivery level. But if we observe that concurrent writes take the same time as non-concurrent writes, we can expect that the bottleneck is the realtime delivery level.

Some tests were not executed, since they could not be set up in a reasonable time (they took more than 60 seconds) or the server died.

7.1.1 Couchbase

General properties

Scalability Couchbase strongly depends on the CPU, but it has low scalability. We can observe from the tests that if we increase the number of CPUs the performance increases for small volumes, but does not increase significantly for a big dataset. The CPU is saturated immediately, while the RAM is kept free; we have also seen this by monitoring the machine during the tests.

Data volume adaptability Couchbase depends on the total volume of data. As data gradually increase, the latency increases; in fact, during pre-tests (the development phase) we needed to clear the database to have a reasonable speed (something not needed with the other systems). We observe this for all classes of applications and for all executions.

Latency stability The Couchbase latency is stable during the entire process for all classes of applications and for all executions. Of course there are some small errors in the measurements, due to different aspects such as the time to send all the initial data.

Write performance If we have a big amount of writes (not necessarily concurrent) they are very slow, increasing the latency. We can easily observe this in the collaborative benchmark when we have 100 writers. At the same time, it seems that concurrent writes are not so critical (they do not influence the result much compared to a normal write). In fact, we are also able to use a big concurrency factor.

Classes of applications

We observe the throughput obtained for the different configurations and environments. For each class of application we have a separate table. At the same time, we observe the latency during the entire process in the best environment for each configuration, shown with different colors (the legend shows the couple #writers-#readers). It is shown as a chart: on the abscissa we have the relative time (in seconds) into the process, whereas on the ordinate we have the latency value (in seconds). For each class of application we have a separate chart.

Chat The throughput is shown in table 7.1, whereas the latency is shown in figure 7.1. Latency is very high when we have more writes (configuration 1, due to the high concurrency, and configuration 3), but a big amount of readers is well managed.

Collaborative The throughput is shown in table 7.2, whereas the latency is shown in figure 7.2. Since we have more total writes, the latency is higher in the first configuration, but it seems that concurrent writes are well managed if they are few. If we have a lot of concurrent writes, as in the third configuration, we have very bad performance.

Social The throughput is shown in table 7.3, whereas the latency is shown in figure 7.3. When we have more readers, and so a lot of subscriptions, things go bad.

Table 7.1: Chat (Couchbase) benchmarks

N°   N° Writer   N° Reader   Concurrency   RAM [GB]   N° CPU   Throughput [req/s]
1    1           10          200           2          2        310
2    1           10          200           2          3        322
3    1           10          200           4          2        311
4    1           10          200           4          3        327
5    1           100         40            2          2        440
6    1           100         40            2          3        615
7    1           100         40            4          2        443
8    1           100         40            4          3        734
9    10          100         15            2          2        39
10   10          100         15            2          3        47
11   10          100         15            4          2        40
12   10          100         15            4          3        49

Figure 7.1: Chat (Couchbase) benchmarks
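As a reading aid for table 7.1, the following sketch computes the relative throughput gain obtained by going from 2 to 3 CPUs with 4 GB of RAM, for each client configuration. The throughput values are copied from the table; the function name relativeGain and the choice of the 4 GB rows are only illustrative.

```javascript
// Relative gain between two throughput values, e.g. 0.1 means +10%.
function relativeGain(before, after) {
  return (after - before) / before;
}

// Rows 3-4, 7-8 and 11-12 of table 7.1 (4 GB of RAM, 2 CPUs versus 3 CPUs),
// keyed by #writers-#readers.
const cpuGains = {
  "1-10": relativeGain(311, 327),
  "1-100": relativeGain(443, 734),
  "10-100": relativeGain(40, 49),
};

for (const [clients, gain] of Object.entries(cpuGains)) {
  console.log(clients, (gain * 100).toFixed(1) + "%");
}
```

The gain is largest for the configuration with one writer and 100 readers, which is consistent with the dependence on the CPU noted in the general properties.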

Table 7.2: Collaborative (Couchbase) benchmarks

N°   N° Writer   N° Reader   Concurrency   RAM [GB]   N° CPU   Throughput [req/s]
1    1           1           350           2          2        94
2    1           1           350           2          3        129
3    1           1           350           4          2        98
4    1           1           350           4          3        137
5    10          10          15            2          2        80
6    10          10          15            2          3        87
7    10          10          15            4          2        81
8    10          10          15            4          3        89
9    100         100         1             2          2        1
10   100         100         1             2          3        1
11   100         100         1             4          2        1
12   100         100         1             4          3        1

Figure 7.2: Collaborative (Couchbase) benchmarks

Table 7.3: Social (Couchbase) benchmarks

N°   N° Writer   N° Reader   Concurrency   RAM [GB]   N° CPU   Throughput [req/s]
1    1           10          250           2          2        620
2    1           10          250           2          3        622
3    1           10          250           4          2        620
4    1           10          250           4          3        623
5    1           100         80            2          2        401
6    1           100         80            2          3        405
7    1           100         80            4          2        402
8    1           100         80            4          3        406
9    10          100         20            2          2        49
10   10          100         20            2          3        50
11   10          100         20            4          2        50
12   10          100         20            4          3        51

Figure 7.3: Social (Couchbase) benchmarks

7.1.2 Pouchdb

General properties

Scalability Pouchdb depends on both CPU and RAM: all tests show that increasing both yields a significant increase in performance. For a large data volume, however, adding resources does not improve performance.

Data volume adaptability Pouchdb does not seem to depend on the volume of data, although more specific tests would be needed to confirm this.

Latency stability The latency is not stable during the entire process. This is easy to observe in the first executions for each class of application, where both the concurrency level and the throughput are high.

Write performances Writes are very slow, which increases the latency. Concurrent writes also seem to be a problem (we had to skip some tests).
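The scalability property can be checked directly against the throughput values reported in table 7.4. The following minimal sketch computes, for each chat workload, the ratio between the throughput on the largest machine (4 GB, 3 CPUs) and on the smallest one (2 GB, 2 CPUs). The function name `scaling_gain` and the dictionary layout are illustrative choices made here, not part of the benchmark tooling.

```python
# Throughput [req/s] from table 7.4 (chat, pouchdb), keyed by
# (writers, readers) and then by (RAM in GB, number of CPUs).
CHAT_POUCHDB = {
    (1, 10): {(2, 2): 745, (2, 3): 830, (4, 2): 820, (4, 3): 1030},
    (1, 100): {(2, 2): 70, (2, 3): 70, (4, 2): 88, (4, 3): 100},
    (10, 100): {(2, 2): 15, (2, 3): 20, (4, 2): 16, (4, 3): 25},
}


def scaling_gain(results, small=(2, 2), large=(4, 3)):
    """Return throughput(large) / throughput(small) for one workload."""
    return results[large] / results[small]


if __name__ == "__main__":
    for (writers, readers), results in CHAT_POUCHDB.items():
        gain = scaling_gain(results)
        print(f"{writers} writer(s), {readers} readers: x{gain:.2f}")
```

With the values of table 7.4, the gain lies between roughly 1.4x and 1.7x, which is consistent with the observation that increasing both CPU and RAM helps.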









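Chat The throughput is shown in table 7.4, whereas the latency is shown in figure 7.4. Latency is high when there are more writers, but a large number of readers is well managed (latency is not influenced).

Collaborative The throughput is shown in table 7.5, whereas the latency is shown in figure 7.5. Concurrent writes are not well managed: the latency increases and the throughput decreases.

Social The throughput is shown in table 7.6, whereas the latency is shown in figure 7.6. When there are more readers, and therefore many subscriptions, performance degrades.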


Table 7.4: Chat (pouchdb) benchmarks

N°  N° Writer  N° Reader  Concurrency  RAM [GB]  N° CPU  Throughput [req/s]
1   1          10         100          2         2       745
2   1          10         100          2         3       830
3   1          10         100          4         2       820
4   1          10         100          4         3       1030
5   1          100        1            2         2       70
6   1          100        1            2         3       70
7   1          100        1            4         2       88
8   1          100        1            4         3       100
9   10         100        1            2         2       15
10  10         100        1            2         3       20
11  10         100        1            4         2       16
12  10         100        1            4         3       25

Figure 7.4: Chat (pouchdb) benchmarks

Table 7.5: Collaborative (pouchdb) benchmarks

N°  N° Writer  N° Reader  Concurrency  RAM [GB]  N° CPU  Throughput [req/s]
1   1          1          100          2         2       390
2   1          1          100          2         3       390
3   1          1          100          4         2       420
4   1          1          100          4         3       420
5   10         10         10           2         2       63
6   10         10         10           2         3       70
7   10         10         10           4         2       65
8   10         10         10           4         3       74
9   100        100        1            2         2       NONE
10  100        100        1            2         3       NONE
11  100        100        1            4         2       NONE
12  100        100        1            4         3       NONE


Figure 7.5: Collaborative (pouchdb) benchmarks

Table 7.6: Social (pouchdb) benchmarks

N°  N° Writer  N° Reader  Concurrency  RAM [GB]  N° CPU  Throughput [req/s]
1   1          10         100          2         2       650
2   1          10         100          2         3       664
3   1          10         100          4         2       670
4   1          10         100          4         3       725
5   1          100        1            2         2       4
6   1          100        1            2         3       4
7   1          100        1            4         2       4
8   1          100        1            4         3       5
9   10         100        1            2         2       3
10  10         100        1            2         3       3
11  10         100        1            4         2       3
12  10         100        1            4         3       3

Figure 7.6: Social (pouchdb) benchmarks

7.1.3 Gun.js

Recall that the gun.js server uses only 1 CPU.


General properties

Scalability Gun.js strongly depends on the RAM. It often dies with an out-of-memory error (after some seconds of testing, so tests lasting only a few seconds are not affected), but at the same time we observed that it saturates the only core it uses. The core therefore becomes the bottleneck and the system cannot be scaled. More RAM does not increase performance, but it makes it possible to run some tests that would die with less RAM.

Data volume adaptability Gun.js seems to depend strongly on the volume of data. In all tests, but particularly in the social one (where the data grow over time), the latency increases over time as the data grow.

Latency stability The latency is stable during the entire process; it is influenced by the data volume (but in a regular way), as noted above. None of the tests shows significant oscillations.

Write performances Writes are very slow, which increases the latency. Concurrent writes also seem to be a problem (we had to skip some tests).










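Chat The throughput is shown in table 7.7, whereas the latency is shown in figure 7.7. A large number of readers is well managed (latency is not influenced).

Collaborative The throughput is shown in table 7.8, whereas the latency is shown in figure 7.8. Concurrent writes are not well managed: the latency increases and the throughput decreases.

Social The throughput is shown in table 7.9, whereas the latency is shown in figure 7.9. When there are more readers, and therefore many subscriptions, performance does not degrade as much: multiple subscriptions are managed well (not perfectly, but better than in the other systems).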


Table 7.7: Chat (gun.js) benchmarks

N°  N° Writer  N° Reader  Concurrency  RAM [GB]  Throughput [req/s]
1   1          10         100          2         270
2   1          10         100          4         277
3   1          100        90           2         NONE
4   1          100        90           4         190
5   10         100        1            2         NONE
6   10         100        1            4         NONE

Figure 7.7: Chat (gun.js) benchmarks

Table 7.8: Collaborative (gun.js) benchmarks

N°  N° Writer  N° Reader  Concurrency  RAM [GB]  Throughput [req/s]
1   1          1          100          2         165
2   1          1          100          4         168
3   10         10         10           2         20
4   10         10         10           4         20
5   100        100        1            2         NONE
6   100        100        1            4         NONE


Figure 7.8: Collaborative (gun.js) benchmarks

Table 7.9: Social (gun.js) benchmarks

N°  N° Writer  N° Reader  Concurrency  RAM [GB]  Throughput [req/s]
1   1          10         160          2         330
2   1          10         160          4         333
3   1          100        15           2         NONE
4   1          100        15           4         114
5   10         100        1            2         NONE
6   10         100        1            4         NONE

Figure 7.9: Social (gun.js) benchmarks

7.1.4 Traditional

Recall that the socket.io server uses only 1 CPU. Moreover, after some tests we observed that PostgreSQL is not the bottleneck, so we can avoid testing different environments for it.


General properties

Scalability It does not depend on the RAM. Analyzing the machine, we saw that it saturates the only core it uses, so the core becomes the bottleneck and the system cannot be scaled. There is no difference between the RAM configurations, because the limit imposed by the CPU is reached first; RAM issues should become visible only with very little RAM.

Data volume adaptability It does not seem to depend on the volume of data, although more specific tests would be needed to confirm this.

Latency stability The latency is not very stable during the entire process: for all classes of applications and for all executions there are large oscillations. We should also consider, however, that the concurrency level is high, so many tasks are in execution and collisions among them are statistically more likely (tasks that try to write at the same time are queued).

Write performances Writes are very quick. Concurrent writes also do not seem to be a problem: performance remains the same in all tests.
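The latency stability property above is judged visually from the latency figures. One possible way to make such a judgment quantitative, offered here only as a sketch, is the coefficient of variation (standard deviation divided by mean) of the latency samples of a run. The metric is an assumption of this example rather than the measure used in the tests, and the sample values below are invented purely for illustration.

```python
from statistics import mean, pstdev


def coefficient_of_variation(latencies_ms):
    """Return stdev / mean of a latency series (0 means perfectly stable)."""
    avg = mean(latencies_ms)
    if avg == 0:
        raise ValueError("mean latency is zero")
    return pstdev(latencies_ms) / avg


# Hypothetical samples in milliseconds, invented for illustration only.
stable = [20, 21, 19, 20, 20]
oscillating = [5, 60, 8, 75, 6]

if __name__ == "__main__":
    print(f"stable:      {coefficient_of_variation(stable):.2f}")
    print(f"oscillating: {coefficient_of_variation(oscillating):.2f}")
```

A run with large oscillations yields a much higher value than a steady one, so the same threshold could be applied to every system and class of application.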










Chat The throughput is shown in table 7.10, whereas the latency is shown in figure 7.10. A large number of readers is well managed (latency is not influenced). Likewise, the latency is not influenced by the number of writers.

Collaborative The throughput is shown in table 7.11, whereas the latency is shown in figure 7.11. Concurrent writes seem to be managed reasonably well: the increase in latency is modest (compared with the other systems), and so is the decrease in throughput.

Social The throughput is shown in table 7.12, whereas the latency is shown in figure 7.12. When there are more readers, and therefore many subscriptions, things go well: multiple subscriptions are well managed.


Table 7.10: Chat (traditional) benchmarks

N°  N° Writer  N° Reader  Concurrency  RAM [GB] socket.io  Throughput [req/s]
1   1          10         150          2                   425
2   1          10         150          4                   409
3   1          100        15           2                   2345
4   1          100        15           4                   1982
5   10         100        3            2                   630
6   10         100        3            4                   635

Figure 7.10: Chat (traditional) benchmarks

Table 7.11: Collaborative (traditional) benchmarks

N°  N° Writer  N° Reader  Concurrency  RAM [GB] socket.io  Throughput [req/s]
1   1          1          70           2                   62
2   1          1          70           4                   70
3   10         10         15           2                   50
4   10         10         15           4                   57
5   100        100        2            2                   35
6   100        100        2            4                   34

Figure 7.11: Collaborative (traditional) benchmarks


Table 7.12: Social (traditional) benchmarks

N°  N° Writer  N° Reader  Concurrency  RAM [GB] socket.io  Throughput [req/s]
1   1          10         150          2                   1908
2   1          10         150          4                   1828
3   1          100        20           2                   6314
4   1          100        20           4                   5919
5   10         100        3            2                   670
6   10         100        3            4                   642

Figure 7.12: Social (traditional) benchmarks

7.2 Analysis of the Results and Lesson Learned

We can observe that there is no real advantage in using CS-NoSQL. The proposed traditional solution is often better. It has:

- The highest throughput, except for the first two configurations of the collaborative tests.

- Low latency, except for the collaborative tests.

- Efficient subscription delivery: latency does not increase when the number of subscriptions increases, as shown in the chat test.

- Efficient delivery of multiple subscriptions: even if the number of subscriptions increases over time during the test, the latency remains the same, as shown in the social test.

- Better performance when there are many concurrent writes, as shown in the collaborative tests.

- No dependence on data volume.

Of course, there are some points where it is not as good as CS-NoSQL:


- Latency stability.

- Concurrent writes for few data: in the first configurations of the collaborative tests, latency and throughput are not so good. If we increase the data, however, we obtain better results than the other systems.

- Scalability: as we have seen, it depends on the processor. However, as discussed in 4.2, we can use a load balancer. Moreover, since we used standard technologies, we can use standard tools [53] to implement load balancing between local cores.

No CS-NoSQL system offers as many advantages as the proposed traditional solution.

The only advantage of CS-NoSQL, apart from the easy configuration that comes from being a full-stack environment, seems to be the performance when there are few concurrent writes, that is, when #reads ≃ #writes. This was the main expectation we had when analyzing the classes of applications in 5.

So we showed that, except for the collaborative tests, a traditional approach gives better results. Moreover, the proposed traditional approach seems more stable. Of course, NoSQL systems are designed to partition themselves better, but this is of little use if the basic performance is not good.
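The comparison above can be made concrete with a few values read from tables 7.1 to 7.12. The sketch below takes, for three configurations, the best throughput each system reached over its RAM/CPU variants and computes the ratio between the traditional solution and the best CS-NoSQL system. The helper name `traditional_vs_best_cs_nosql` is introduced here for illustration. The ratios are only indicative, because the concurrency levels and the number of server CPUs differ between systems.

```python
# Best throughput [req/s] per system for three configurations, read from
# tables 7.1-7.12 (maximum over the RAM/CPU variants of each configuration).
BEST_THROUGHPUT = {
    "chat, 1 writer / 100 readers": {
        "couchbase": 734, "pouchdb": 100, "gun.js": 190, "traditional": 2345,
    },
    "collaborative, 1 writer / 1 reader": {
        "couchbase": 137, "pouchdb": 420, "gun.js": 168, "traditional": 70,
    },
    "social, 1 writer / 100 readers": {
        "couchbase": 406, "pouchdb": 5, "gun.js": 114, "traditional": 6314,
    },
}


def traditional_vs_best_cs_nosql(row):
    """Ratio between the traditional throughput and the best CS-NoSQL one."""
    best_cs = max(v for name, v in row.items() if name != "traditional")
    return row["traditional"] / best_cs


if __name__ == "__main__":
    for config, row in BEST_THROUGHPUT.items():
        print(f"{config}: x{traditional_vs_best_cs_nosql(row):.2f}")
```

A ratio above 1 means the traditional solution is faster; for the first collaborative configuration the ratio is below 1, matching the exception noted in the list above.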


Chapter 8

Conclusions

Analyzing performance tests of the classes of applications typical of CS-NoSQL, we discovered important results: CS-NoSQL systems are not as efficient as we expected, given that they are dedicated systems based on NoSQL databases, which are known to be very efficient in many situations [98]. Often, the traditional approach we proposed for comparison is more efficient, in some cases even 10x faster. So, with the current technologies, we would suggest a custom solution based on traditional technology, even for realtime applications.

Moreover, the results of our tests highlighted that CS-NoSQL systems are not very stable over time and, judging from the benchmark tests, not very scalable. They are often poorly documented: we sometimes had trouble finding theoretical information (such as information about the CAP theorem) and using some less common features. This is because these systems are not widely used.

They do not have a standard (there is no literature about them), which means that there are many problems if we want to use them commercially. There are no standard ways of testing, additional development skills are required, and support for the system could be dropped at any time. So, if we consider these elements (which are a cost) and the unimpressive performance, it seems that this kind of system is only a higher cost for companies. Of course, more dedicated studies should be done to derive more conclusive results.

In fact, proprietary cloud-hosted solutions could be very good, also for the cost model, especially if the alternative solutions are also based on the cloud. As we said, cloud solutions have many advantages, such as easy configuration, a full-stack environment and zero code required. But since they are not open and are not based on standard technologies (like other cloud solutions), if support for them is dropped the entire application is lost. Of course, a study of this topic should be done, but it is not easy since, as said previously, we can control only a few elements.

Future work includes testing consistency (in the way explained in appendix B) and partitioning (NoSQL databases should be very efficient with partitioning).


Another important piece of future work is running finer-grained tests that look inside the infrastructure of CS-NoSQL, so that we can measure the individual time of every action: the time required to update the local database, to deliver realtime notifications, to process a write, to publish new data, and so on.
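A minimal sketch of what such per-action instrumentation could look like is given below. It assumes that each action can be wrapped inside a single process; the class `PhaseTimer` and the phase names are hypothetical, and the `time.sleep` calls stand in for the real work. Measuring actions that span client and server would additionally require synchronized clocks, which this sketch does not address.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulate wall-clock time per named phase of a request."""

    def __init__(self):
        self.totals = defaultdict(float)  # seconds spent in each phase
        self.counts = defaultdict(int)    # number of times each phase ran

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the wrapped action raises.
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def mean(self, name):
        return self.totals[name] / self.counts[name]


if __name__ == "__main__":
    timer = PhaseTimer()
    for _ in range(3):
        with timer.phase("process_write"):
            time.sleep(0.01)   # placeholder for the real write path
        with timer.phase("deliver_notification"):
            time.sleep(0.005)  # placeholder for the real delivery path
    for name in timer.totals:
        print(f"{name}: {timer.mean(name) * 1000:.1f} ms on average")
```

Collecting one total per phase makes it possible to see which action dominates the end-to-end latency instead of observing only the aggregate value.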

However, since NoSQL has been an important topic in recent years, since the target applications of these systems have been growing in the same period, and since big companies (like Google) are investing in it, we expect an improvement in the future.


Bibliography

[1] Adobe DPS case study. https://www.pubnub.com/customers/adobe/.

[2] AWS Lambda. https://aws.amazon.com/lambda/.

[3] AWS Lambda pricing. https://aws.amazon.com/lambda/pricing/.

[4] AWS S3. https://aws.amazon.com/s3/.

[5] CornerJob case study. https://www.pubnub.com/customers/pubnub-the-perfect-chat-solution-for-cornerjob/.

[6] Couchbase CAP theorem. http://developer.couchbase.com/documentation/server/current/concepts/data-management.html.

[7] Couchbase cluster. http://developer.couchbase.com/documentation/server/current/clustersetup/manage-cluster-intro.html.

[8] Couchbase filters. https://developer.couchbase.com/documentation/mobile/1.4/guides/sync-gateway/server-integration/index.html?language=ios.

[9] Couchbase full text search. http://blog.couchbase.com/2016/february/couchbase-4.5-developer-preview-couchbase-fts.

[10] Couchbase MapReduce. http://developer.couchbase.com/documentation/server/current/architecture/incremental-map-reduce-views.html.

[11] Couchbase N1QL. http://www.couchbase.com/n1ql.

[12] CouchBase replication. http://docs.couchbase.com/admin/admin/Tasks/tasks-manage-replication.html.

[13] Couchbase users. http://developer.couchbase.com/documentation/mobile/current/develop/guides/sync-gateway/authorizing-users/index.html.

[14] Couchbase websocket. https://github.com/couchbase/sync_gateway/wiki/WebSocket-Based-Changes-Feed.


[15] CouchDB changes notifications. http://guide.couchdb.org/draft/notifications.html.

[16] CouchDB Clustering (partitioning). http://guide.couchdb.org/draft/clustering.html.

[17] CouchDB eventual consistency. http://docs.couchdb.org/en/2.0.0/intro/consistency.html.

[18] CouchDB MapReduce. https://wiki.apache.org/couchdb/Introduction_to_CouchDB_views.

[19] CouchDB replication. http://docs.couchdb.org/en/2.0.0/replication/.

[20] CouchDB RESTful. http://docs.couchdb.org/en/2.0.0/api/.

[21] CouchDB Scaling introduction. http://guide.couchdb.org/draft/scaling.html.

[22] CouchDB security. http://guide.couchdb.org/draft/security.html.

[23] CouchDB views. http://guide.couchdb.org/draft/views.html.

[24] Database triangle. http://blog.scottlogic.com/dgorst/assets/mongodb-vs-couchdb/nosql-triangle.png.

[25] Firebase pricing. https://firebase.google.com/pricing/.

[26] Firechat. https://github.com/firebase/firechat.

[27] Firechat data Structure. https://github.com/firebase/firechat/blob/master/rules.json.

[28] Firepad. https://github.com/firebase/firepad.

[29] Friendly Pix. https://github.com/firebase/friendlypix.

[30] Friendly Pix data structure. https://github.com/firebase/friendlypix/blob/master/web/database-rules.json.

[31] Google apps for work. https://gsuite.google.com/index.html.

[32] Google apps for work pricing. https://gsuite.google.com/pricing.html.

[33] Google docs. https://docs.google.com/document.

[34] Gun.js CAP theorem. https://github.com/amark/gun/wiki/CAP-Theorem.

[35] Gun.js data format. https://github.com/amark/gun/wiki/GUN%E2%80%99s-Data-Format-%28JSON%29.


[36] Gun.js distributed structure. https://github.com/amark/gun/wiki/Getting-Started-%28v0.3.x%29#distributed.

[37] Gun.js graphs. https://github.com/amark/gun/wiki/Graphs.

[38] Gun.js realtime. https://github.com/amark/gun/wiki/Getting-Started-%28v0.3.x%29#real-time-sync.

[39] Gun.js security. https://github.com/amark/gun/wiki/Security%2C-Authentication%2C-Authorization.

[40] Gun.js storage. https://github.com/amark/gun/wiki/Using-Amazon-S3-for-Storage.

[41] Heroku pricing. https://www.heroku.com/pricing.

[42] HTTP persistent connections. https://www.safaribooksonline.com/library/view/http-the-definitive/1565925092/ch04s05.html.

[43] HTTP2 Browser support. http://caniuse.com/#feat=http2.

[44] HTTP2.0 Multiplex. https://assets.wp.nginx.com/wp-content/uploads/2015/10/HTTP2.png.

[45] Jepsen postgreSQL explanation. https://aphyr.com/posts/282-jepsen-postgres.

[46] Jepsen postgreSQL test tool. https://github.com/jepsen-io/jepsen/tree/master/postgres-rds.

[47] Jepsen tool. https://github.com/jepsen-io/jepsen.

[48] Logitech Harmony Ultimate Home case study. https://www.pubnub.com/customers/logitech/.

[49] Master-Slave replication. https://en.wikipedia.org/wiki/Master/slave_(technology).

[50] Mongo DB. https://www.mongodb.com/.

[51] Multi Master replication. https://en.wikipedia.org/wiki/Multi-master_replication.

[52] Optimistic UI. http://info.meteor.com/blog/optimistic-ui-with-meteor-latency-compensation.

[53] PM2. https://github.com/Unitech/pm2.


[54] PostgreSQL Functions. https://www.postgresql.org/docs/9.5/static/sql-createfunction.html.

[55] PostgreSQL JSON. https://www.postgresql.org/docs/9.5/static/datatype-json.html.

[56] PostgreSQL Listen. https://www.postgresql.org/docs/9.5/static/sql-listen.html.

[57] PostgreSQL Notify. https://www.postgresql.org/docs/9.5/static/sql-notify.html.

[58] PostgreSQL Partitioning. https://www.postgresql.org/docs/9.5/static/ddl-partitioning.html.

[59] PostgreSQL Procedural Languages. https://www.postgresql.org/docs/9.5/static/xplang.html.

[60] PostgreSQL Replication. https://www.postgresql.org/docs/9.5/static/different-replication-solutions.html.

[61] PostgreSQL Triggers. https://www.postgresql.org/docs/9.5/static/sql-createtrigger.html.

[62] Postgres's publish-subscribe features made better with JSON. https://blog.andyet.com/2015/04/06/postgres-pubsub-with-json/.

[63] PouchDB filter. https://pouchdb.com/2015/04/05/filtered-replication.html.

[64] PouchDB live replication. https://pouchdb.com/guides/replication.html#live-replication.

[65] PouchDB local database. https://pouchdb.com/guides/databases.html.

[66] Promise pattern. https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Promise.

[67] PubNub chat. https://www.pubnub.com/solutions/chat/.

[68] PubNub home automation. https://www.pubnub.com/solutions/home-automation-and-machine-signaling/.

[69] PubNub live blogging. https://www.pubnub.com/solutions/live-blogging/.

[70] Redis cache. http://redis.io/topics/lru-cache.


[71] Redis data types. http://redis.io/topics/data-types.

[72] Redis distributed lock. http://redis.io/topics/distlock.

[73] Redis partitioning. http://redis.io/topics/partitioning.

[74] Redis persistence. https://redis.io/topics/persistence#aof-advantages.

[75] Redis Publish/Subscribe. http://redis.io/topics/pubsub.

[76] Redis queue. http://redis.io/commands/rpoplpush#pattern-reliable-queue.

[77] Redis replication. http://redis.io/topics/replication.

[78] Salesforce.com pricing. http://www.salesforce.com/eu/platform/pricing/.

[79] Socket.io. http://socket.io/.

[80] socket.io channels. http://socket.io/docs/rooms-and-namespaces/.

[81] socket.io emit event. https://socket.io/docs/server-api/#namespace-emit-eventname-args.

[82] socket.io-emitter. https://github.com/socketio/socket.io-emitter.

[83] socket.io P2P. https://github.com/socketio/socket.io-p2p.

[84] socket.io protocol. https://github.com/socketio/socket.io-protocol.

[85] socket.io redis. https://github.com/socketio/socket.io-redis.

[86] socket.io users. https://www.npmjs.com/package/socket.io.users.

[87] Websocket. http://www.ibm.com/developerworks/library/wa-reverseajax2/fig01.gif.

[88] Websocket servers comparison. https://medium.com/@denizozger/finding-the-right-node-js-websocket-implementation-b63bfca0539#.isyo3prtn.

[89] YCSB. https://github.com/brianfrankcooper/YCSB.

[90] Daniel Abadi. Consistency tradeoffs in modern distributed database system design: Cap is only part of the story. Computer, 45(2):37-42, 2012. http://dx.doi.org/10.1109/MC.2012.33.


[91] Guruduth Banavar, Tushar Chandra, Bodhi Mukherjee, Jay Nagarajarao, Robert E Strom, and Daniel C Sturman. An efficient multicast protocol for content-based publish-subscribe systems. In Distributed Computing Systems, 1999. Proceedings. 19th IEEE International Conference on, pages 262-272. IEEE, 1999. http://dx.doi.org/10.1109/ICDCS.1999.776528.

[92] Mike Belshe, Martin Thomson, and Roberto Peon. Hypertext Transfer Protocol Version 2 (HTTP/2). RFC Editor, 2015. http://dx.doi.org/10.17487/RFC7540.

[93] Tim Berners-Lee, Roy Fielding, and Henrik Frystyk. Hypertext Transfer Protocol -- HTTP/1.0. RFC Editor, 1996. http://dx.doi.org/10.17487/RFC1945.

[94] Ken Birman and Thomas Joseph. Exploiting virtual synchrony in distributed systems. SOSP '87 Proceedings of the eleventh ACM Symposium on Operating systems principles, pages 123-138, 1987. http://dx.doi.org/10.1145/37499.37515.

[95] William J Bolosky, John R Douceur, David Ely, and Marvin Theimer. Feasibility of a serverless distributed file system deployed on an existing set of desktop PCs. 28(1):34-43, 2000. http://dx.doi.org/10.1145/345063.339345.

[96] Eric Brewer. Cap twelve years later: How the "rules" have changed. Computer, 45(2):23-29, 2012. http://dx.doi.org/10.1109/MC.2012.37.

[97] Brad Cain, Abbie Barbir, Raj Nair, and Oliver Spatscheck. Known content network (cn) request-routing mechanisms. RFC Editor, 2003. http://dx.doi.org/10.17487/RFC3568.

[98] Rick Cattell. Scalable SQL and NoSQL data stores. ACM SIGMOD Record, 39:12-27, 2010. http://dx.doi.org/10.1145/1978915.1978919.

[99] Min Chen, Shiwen Mao, and Yunhao Liu. Big data: A survey. Mobile Networks and Applications, 19(2):171-209, 2014. http://dx.doi.org/10.1007/s11036-013-0489-0.

[100] Shaiful Alam Chowdhury, Varun Sapra, and Abram Hindle. Client-Side Energy Efficiency of HTTP/2 for Web and Mobile App Developers. Software Analysis, Evolution, and Reengineering (SANER), 2016 IEEE 23rd International Conference on, 5, 2016. http://dx.doi.org/10.1109/SANER.2016.77.

[101] Flaviu Cristian. Synchronous and asynchronous. Communications of the ACM, 39(4):88-97, 1996. http://dx.doi.org/10.1145/227210.227231.


[102] Douglas Crockford. The application/json Media Type for JavaScript Object Notation (JSON). RFC Editor, 2006. http://dx.doi.org/10.17487/RFC4627.

[103] Michael Cusumano. Cloud computing and SaaS as new computing platforms. Communications of the ACM, 53(4):27-29, 2010. http://dx.doi.org/10.1145/1721654.1721667.

[104] Frank Dabek, Nickolai Zeldovich, Frans Kaashoek, David Mazières, and Robert Morris. Event-driven programming for robust software. In Proceedings of the 10th workshop on ACM SIGOPS European workshop, pages 186-189. ACM, 2002. http://dx.doi.org/10.1145/1133373.1133410.

[105] Jeffrey Dean and Sanjay Ghemawat. Mapreduce: simplified data processing on large clusters. Communications of the ACM, 51(1):107-113, 2008. http://dx.doi.org/10.1145/1327452.1327492.

[106] Ivan Fette and Alexey Melnikov. The WebSocket Protocol. RFC Editor, 2011. http://dx.doi.org/10.17487/RFC6455.

[107] Armando Fox and Eric A Brewer. Harvest, yield, and scalable tolerant systems. Hot Topics in Operating Systems, 1999. Proceedings of the Seventh Workshop on, pages 174-178, 1999. http://dx.doi.org/10.1109/HOTOS.1999.798396.

[108] John Gantz and David Reinsel. The digital universe in 2020: Big data, bigger digital shadows, and biggest growth in the far east. IDC iView: IDC Analyze the future, 2007(2012):1-16, 2012.

[109] Jesse James Garrett et al. Ajax: A new approach to web applications. 2005.

[110] Charles David Graziano. A performance analysis of xen and kvm hypervisors for hosting the xen worlds project. 2011.

[111] Katarina Grolinger, Wilson A Higashino, Abhinav Tiwari, and Miriam AM Capretz. Data management in cloud environments: Nosql and newsql data stores. Journal of Cloud Computing: Advances, Systems and Applications, 2(1):22, 2013. http://dx.doi.org/10.1186/2192-113X-2-22.

[112] Carl A Gutwin, Michael Lippold, and TC Graham. Real-time groupware in the browser: testing the performance of web-based networking. In Proceedings of the ACM 2011 conference on Computer supported cooperative work, pages 167-176. ACM, 2011. http://dx.doi.org/10.1145/1958824.1958850.

[113] Jing Han, E Haihong, Guan Le, and Jian Du. Survey on NoSQL database. Pervasive Computing and Applications (ICPCA), 2011 6th International Conference on, pages 363-366, 2011. http://dx.doi.org/10.1109/ICPCA.2011.6106531.

[114] Robin Hecht and Stefan Jablonski. Nosql evaluation: A use case oriented survey. In Cloud and Service Computing (CSC), 2011 International Conference on, pages 336-341. IEEE, 2011. http://dx.doi.org/10.1109/CSC.2011.6138544.

[115] Markus Hofmann and Leland R Beaumont. Content networking: architecture, protocols, and practice. Elsevier, 2005.

[116] Michael Jones, John Bradley, and Nat Sakimura. Json web token (jwt). RFC Editor, 2015. http://dx.doi.org/10.17487/RFC7519.

[117] George Lawton. Developing software online with platform-as-a-service technology. Computer, 41(6):13-15, 2008. http://dx.doi.org/10.1109/MC.2008.185.

[118] Paul J Leach, Michael Mealling, and Rich Salz. A universally unique identifier (uuid) urn namespace. 2005. http://dx.doi.org/10.17487/rfc4122.

[119] Georg JP Link, Dominik Siemon, Gert-Jan de Vreede, and Susanne Robra-Bissantz. Evaluating anchored discussion to foster creativity in online collaboration. In CYTED-RITOS International Workshop on Groupware, pages 28-44. Springer, 2015. http://dx.doi.org/10.1007/978-3-319-22747-4_3.

[120] Salvatore Loreto, P Saint-Andre, S Salsano, and G Wilkins. Known Issues and Best Practices for the Use of Long Polling and Streaming in Bidirectional HTTP. RFC Editor, 2011. http://dx.doi.org/10.17487/RFC6202.

[121] James Martin and Savant Institute. Managing the data-base environment. Prentice-Hall Englewood Cliffs (NJ), 1983.

[122] Maged Michael, Jose E Moreira, Doron Shiloach, and Robert W Wisniewski. Scale-up x scale-out: A case study using nutch/lucene. In Parallel and Distributed Processing Symposium, 2007. IPDPS 2007. IEEE International, pages 1-8. IEEE, 2007. http://dx.doi.org/10.1109/IPDPS.2007.370631.

[123] Neil Middleton, Richard Schneeman, et al. Heroku: Up and Running. O'Reilly Media, Inc., 2013.

[124] Jeffrey C Mogul, Jim Gettys, Henrik Frystyk, Tim Berners-Lee, and Roy T Fielding. Hypertext Transfer Protocol -- HTTP/1.1. RFC Editor, 1997. http://dx.doi.org/10.17487/RFC2068.


[125] Gavin Mulligan, Denis Gračanin, et al. A comparison of SOAP and REST implementations of a service based interaction independence middleware framework. Proceedings of the 2009 Winter Simulation Conference (WSC), pages 1423-1432, 2009. http://dx.doi.org/10.1109/WSC.2009.5429290.

[126] Nikos Ntarmos, Ioannis Patlakas, and Peter Triantafillou. Rank join queries in nosql databases. Proceedings of the VLDB Endowment, 7(7):493-504, 2014. http://dx.doi.org/10.14778/2732286.2732287.

[127] Nurzhan Nurseitov, Michael Paulson, Randall Reynolds, and Clemente Izurieta. Comparison of JSON and XML Data Interchange Formats: A Case Study. Scenario, pages 157-162, 2012. https://pdfs.semanticscholar.org/8432/1e662b24363e032d680901627aa1bfd6088f.pdf.

[128] Victoria Pimentel and Bradford G Nickerson. Communicating and Displaying Real-Time Data with WebSocket. IEEE Internet Computing, 16:45-53, 2012. http://dx.doi.org/10.1109/MIC.2012.64.

[129] Paul Prescod. Roots of the REST/SOAP Debate. In Extreme Markup Languages®. Citeseer, 2002.

[130] Ansar Rafique. Evaluating nosql technologies for historical financial data, 2013.

[131] Rüdiger Schollmeier. A definition of peer-to-peer networking for the classification of peer-to-peer architectures and applications. Peer-to-Peer Computing, 2001. Proceedings. First International Conference on, pages 101-102, 2001. http://dx.doi.org/10.1109/P2P.2001.990434.

[132] Rami Sellami, Sami Bhiri, and Bruno Defude. ODBAPI: a unified REST API for relational and NoSQL data stores. 2014 IEEE International Congress on Big Data, pages 653-660, 2014. http://dx.doi.org/10.1109/BigData.Congress.2014.98.

[133] Marc Shapiro. Optimistic replication and resolution. In Encyclopedia of Database Systems, pages 1995-1995. Springer, 2009. http://dx.doi.org/10.1007/978-0-387-39940-9_258.

[134] Sang Shin. Introduction to json (javascript object notation). Presentation www.javapassion.com, 2010.

[135] Konstantin Shvachko, Hairong Kuang, Sanjay Radia, and Robert Chansler. The hadoop distributed file system. Mass Storage Systems and Technologies (MSST), 2010 IEEE 26th Symposium on, pages 1-10, 2010. http://dx.doi.org/10.1109/MSST.2010.5496972.


[136] Feng Tian, Berthold Reinwald, Hamid Pirahesh, Tobias Mayr, and Jussi Myllymaki. Implementing a scalable XML publish/subscribe system using relational database systems. SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data, pages 479-490, 2004. http://dx.doi.org/10.1145/1007568.1007623.

[137] Stefan Tilkov and Steve Vinoski. Node.js: Using JavaScript to build high-performance network programs. IEEE Internet Computing, 14(6):80, 2010. http://dx.doi.org/10.1109/MIC.2010.145.

[138] Devesh Tiwari and Yan Solihin. Architectural characterization and similarity analysis of sunspider and Google's V8 Javascript benchmarks. Performance Analysis of Systems and Software (ISPASS), 2012 IEEE International Symposium on, pages 221-232, 2012. http://dx.doi.org/10.1109/ISPASS.2012.6189228.

[139] Can Türker and Michael Gertz. Semantic integrity support in sql: 1999 and commercial (object-) relational database management systems. The VLDB Journal - The International Journal on Very Large Data Bases, 10(4):241-269, 2001. http://dx.doi.org/10.1007/s007780100050.

[140] Matteo Varvello, Kyle Schomp, David Naylor, Jeremy Blackburn, Alessandro Finamore, and Konstantina Papagiannaki. Is the Web HTTP/2 Yet? Lecture Notes in Computer Science, 9631:218-232, 2016. http://dx.doi.org/10.1007/978-3-319-30505-9_17.

[141] Werner Vogels. Eventually consistent. Communications of the ACM, 52(1):40-44, 2009. http://dx.doi.org/10.1145/1435417.1435432.

[142] Erik Wilde. Putting things to REST. School of Information, 2007.


Appendix A

Snippets

A.1 PostgreSQL realtime retrieve trigger

We call a command to send updated data (notify changes) to the publish/subscribe system, as described in 4.2.1. To do that, we create a PostgreSQL function [54], shown in Listing A.1, that calls our external program; we then create a PostgreSQL AFTER trigger [61] that calls this function (so we receive insert/delete/update events), as shown in Listing A.2.

This approach is standard SQL with triggers, plus some PostgreSQL-specific functions. The language details change between database systems and needs, but the idea remains valid.

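CREATE OR REPLACE FUNCTION notify(text, text, text, text) RETURNS text AS '
#!/bin/bash
# Run the external notifier in the background, detached from the session.
# Arguments are quoted so that empty or JSON values stay in their positions.
node notify "$1" "$2" "$3" "$4" </dev/null >/dev/null 2>&1 &
' LANGUAGE plsh;

Listing A.1: Notify function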

CREATE OR REPLACE FUNCTION notify_trigger_proc() RETURNS trigger AS $notify_trigger_proc$
DECLARE
  row_id TEXT;
  old_v TEXT;
  new_v TEXT;
BEGIN
  -- OLD does not exist on INSERT and NEW does not exist on DELETE.
  IF TG_OP = 'INSERT' THEN
    old_v := '';
  ELSE
    old_v := (SELECT ('[' || row_to_json(OLD) || ']')::json ->> 0);
    row_id := OLD.id;
  END IF;
  IF TG_OP = 'DELETE' THEN
    new_v := '';
  ELSE
    new_v := (SELECT ('[' || row_to_json(NEW) || ']')::json ->> 0);
    row_id := NEW.id;
  END IF;
  PERFORM notify(row_id, TG_OP, old_v, new_v);
  RETURN NULL; -- the return value of an AFTER row trigger is ignored
END;
$notify_trigger_proc$ LANGUAGE plpgsql;

CREATE TRIGGER notify_trigger
AFTER INSERT OR UPDATE OR DELETE ON DATA_TABLE
FOR EACH ROW
EXECUTE PROCEDURE notify_trigger_proc();

Listing A.2: Notify trigger
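
The external `notify` program invoked by Listing A.1 is not reproduced in this appendix. As a minimal sketch, assuming it receives the four positional arguments passed by the trigger (row id, operation, old row as JSON text, new row as JSON text), it could first turn them into a single change message before handing it to the publish/subscribe system. The function name `buildChangeMessage` and the message field names are hypothetical, and the publishing step itself is left out.

```javascript
// Hypothetical helper: NOT the thesis' actual notify program.
// Converts the four trigger arguments into a single change message.
function buildChangeMessage(args) {
  const [id, op, oldText, newText] = args;
  // The trigger passes an empty string when a row image does not exist
  // (no old row on INSERT, no new row on DELETE).
  const parse = (text) =>
    text === '' || text === undefined ? null : JSON.parse(text);
  return {
    id: id,
    op: op, // 'INSERT', 'UPDATE' or 'DELETE' (the value of TG_OP)
    old: parse(oldText),
    new: parse(newText),
  };
}

// Example with illustrative values, as they would arrive for an INSERT.
const message = buildChangeMessage(['7', 'INSERT', '', '{"id":7,"name":"a"}']);
console.log(JSON.stringify(message));
```

In a real script the argument array would come from `process.argv.slice(2)`, and the resulting message would be sent to whichever publish/subscribe channel the application uses.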

A.2 Socket.io custom logic

// Reverse every incoming chat message before broadcasting it to all clients.
socket.on('chat message', function(msg) {
  io.emit('chat message', msg.split('').reverse().join(''));
});

Listing A.3: Socket.io customization


Appendix B

Jepsen

We executed the Jepsen framework to test a PostgreSQL server deployed in a virtual machine with 4 GB of RAM and 1 CPU (on the same physical machine as the other tests).

The test used was the standard Jepsen PostgreSQL test [45, 46], which emulates simultaneous bank transfers and raw reads of accounts. It counts a failure when a transfer cannot be executed because it would leave a negative amount (i.e. other transfers were executed in the meantime).
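
The rule that the test exercises can be illustrated with a small in-memory model. This is only a sketch in JavaScript, not the Jepsen test itself (which runs against a real PostgreSQL server); the `transfer` function, the account names and the balances are assumptions made for illustration.

```javascript
// In-memory sketch of the bank-transfer rule; names and amounts are illustrative.
function transfer(accounts, from, to, amount) {
  // A transfer fails if it would leave the source account negative.
  if (accounts[from] - amount < 0) {
    return { ok: false, reason: 'negative balance' };
  }
  accounts[from] -= amount;
  accounts[to] += amount;
  return { ok: true };
}

const accounts = { a: 10, b: 10 };
const results = [
  transfer(accounts, 'a', 'b', 7), // succeeds: a = 3, b = 17
  transfer(accounts, 'a', 'b', 5), // fails: a would become -2
];
// The sum of all balances must be the same before and after the transfers.
const total = Object.values(accounts).reduce((sum, v) => sum + v, 0);
console.log(results.map((r) => r.ok), accounts, total);
```

In a model like this, the sum of the balances stays constant and no account goes negative, whatever the order of the transfers; a read that observed otherwise would point to a consistency problem.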

The results are shown as charts in Figures B.1, B.2 and B.3. The charts also show the read failures, even though there are none; "transfer info" is the time required to check the amount before doing the transfer.

Figure B.1: Jepsen latency raw

We can observe that the tests perform many transfers and reads for some seconds. The charts report measures for every operation (or some of them) for every second of the test (or chunks of seconds). The measures are: the time (latency) needed to process the operation, shown in Figure B.1; the time (latency) needed to process the operation with quantiles, shown in Figure B.2; and the throughput of every operation, shown in Figure B.3.
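
As a rough sketch of how such measures can be derived, the snippet below groups operation records into one-second windows and computes latency quantiles and the throughput for each window. The record shape (`time` in seconds, `latency` in milliseconds), the nearest-rank quantile definition and the sample data are assumptions for illustration; they are not taken from Jepsen's own plotting code.

```javascript
// Nearest-rank quantile of an array of numbers (q in (0, 1]).
function quantile(values, q) {
  const sorted = [...values].sort((x, y) => x - y);
  const rank = Math.max(1, Math.ceil(q * sorted.length));
  return sorted[rank - 1];
}

// Groups operations by whole second and summarizes each window.
function summarize(ops) {
  const windows = new Map();
  for (const op of ops) {
    const second = Math.floor(op.time);
    if (!windows.has(second)) windows.set(second, []);
    windows.get(second).push(op.latency);
  }
  return [...windows.entries()]
    .sort(([a], [b]) => a - b)
    .map(([second, latencies]) => ({
      second,
      throughput: latencies.length, // operations in this one-second window
      median: quantile(latencies, 0.5),
      p95: quantile(latencies, 0.95),
    }));
}

// Illustrative records only: time in seconds, latency in milliseconds.
const ops = [
  { time: 0.1, latency: 4 },
  { time: 0.5, latency: 10 },
  { time: 0.9, latency: 6 },
  { time: 1.2, latency: 8 },
];
const summary = summarize(ops);
console.log(summary);
```

Each summary row corresponds to one point on the charts: the raw or quantile latency on one axis and the window's position in time on the other, with `throughput` playing the role of the rate.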


Figure B.2: Jepsen latency quantiles

Figure B.3: Jepsen rate
