Project: Distributed Databases
Access Control Security vs. Performance
Yosi Barad, Ilia Oshmiansky, Ainat Chervin
Abstract
Over the last few years, distributed storage systems have found widespread adoption, the Cassandra
database being one of them. Cassandra is a highly scalable, high-performance distributed storage
system for managing large amounts of structured data. In terms of security, however, Cassandra
supports access control lists (ACLs) only at the column family1 level. In our work, we extended
Cassandra so that it supports ACLs at the cell level, which corresponds to the column level in the
Cassandra data model. Furthermore, we benchmarked the performance of Cassandra with our
implementation of ACLs against both the original Cassandra database and the Accumulo table store,
which natively supports cell-level ACLs.
In contrast to traditional storage systems, Cassandra sacrifices consistency in favor of availability
and partition tolerance, so it guarantees only eventual consistency among its nodes and clusters2.
According to the CAP theorem, a distributed system can satisfy at most two of the three guarantees
(Consistency, Availability and Partition tolerance) at the same time; it is impossible to provide all
three. We therefore wanted to measure the inconsistency window created by eventual consistency,
which might undermine the ACL mechanism in our Cassandra implementation. This paper describes
these tests and our findings.
Introduction
Many large organizations today, such as Google, Facebook and Amazon, manage and store massive
amounts of data and handle large scale applications that require high performance, availability and
scalability. And as more and more applications like these are launched in environments that have
immense workloads, such as web services, their scalability requirements change very quickly and grow
very large. Relational databases, which were customarily used for storage and data management, scale
well, but usually only when that scaling happens on a single server node. Once the capacity of that
single node is reached, the load must be scaled out and distributed across multiple server nodes, and
this task becomes very complicated with these kinds of databases. Distributed database systems arose
to resolve this need: they use an architecture that distributes storage and processing across multiple
servers, and are therefore well suited to these growing data management demands.
However, scalability comes at a price. Having multiple servers handling the data may cause lower
consistency and expose security holes. It is this issue of security and consistency versus performance
that our project wishes to research.
This project focuses on two popular table stores, Cassandra and Accumulo, for analyzing this matter.
While the access control of Cassandra is at the level of column family, Accumulo has a higher level of
security and allows defining cell-level access control. The main goals of this project are to add support
for cell-level ACLs (Access Control Lists) to Cassandra and compare the resulting system to
Accumulo, evaluate the performance and measure the security holes. The project will attempt to
improve the security by increasing the consistency, while measuring the performance penalty.
Our project's testing tool and scenarios were based on and influenced by the paper "YCSB++:
Benchmarking and Performance Debugging Advanced Features in Scalable Table Stores". We aspired
to extend the tests performed there and therefore used similar testing configurations.
1A column family, similar to a table, is a container for rows which are containers for columns.
2A Cluster contains one or more nodes of Cassandra servers working together, e.g. sharing data, load balancing, etc.
This document is divided into two parts:
1) Methodological part – this part gives the details as to how we implemented our solution for
ACL support and how we analyzed that solution and the consistency issues as well. It is
separated into three sections:
Code Implementation - this section describes our implementation of ACL support in
Cassandra. It gives information on the Cassandra terminology and structure, code
analysis and finally the different versions that were created.
ACL Benchmarking - this section describes the practical details of how we tested
our implementation and an analysis of our results.
Consistency Tests - this section describes the consistency tests that were performed
and their analysis.
2) Plan vs Actual – this part describes the plan we first devised and how well we followed it.
Methodological part
This part is divided into three main sections:
1) Description of our implementation
2) Description of the ACL benchmarking tests and results
3) Description of the consistency tests and results
Code implementation
Background
In Cassandra terminology, a "Cell" is actually called a "Column". It is the lowest/smallest increment of
data and is represented as a tuple (triplet) that contains a name, a value and a timestamp. Next there is
the "Column Family", which is a container for rows and can be thought of as a table in a relational
system. Each row in a column family can be referenced by its key.
Finally, a "Keyspace" is a container for column families. It is roughly the same as a schema or database,
that is, a logical collection of tables. Figure 3 illustrates this structure.
Cassandra Code Analysis
We found that the Cassandra code has a very modular structure and that the security logic is separate
from the rest of the code. Basically, there are two security related interfaces:
IAuthenticator, which is responsible for authenticating the user who logs in (responsible for the
login process). This interface contains the "authenticate" function.
IAuthority, which is responsible for authorizing the access of an authenticated user to a specific
resource. This interface contains the "authorize" function.
Thanks to this structure it is possible to write your own code which implements these functions.
In the initial code there were two classes that implemented the security interfaces:
"AllowAllAuthentication.java" and "AllowAllAuthorization.java". The first class implements the
IAuthenticator interface, and the second class implements the IAuthority interface.
Since these classes don't supply any restrictions (as their names imply) their implementation allows
anyone to connect and execute any operation.
In addition to these, we found a more advanced implementation that contains two more classes:
"SimpleAuthentication.java" and "SimpleAuthorization.java". These classes provide very basic
user/password authentication and access control at the Column Family level.
The "SimpleAuthentication.java" and "SimpleAuthorization.java" implementations use two additional
configuration files:
1) password.properties – This file contains the list of users and their passwords. The passwords
can be either in clear text or MD5SUMs of the password.
2) access.properties – This file contains entries in the format:
KEYSPACE[.COLUMNFAMILY].PERMISSION=USERS where
KEYSPACE is the keyspace name.
COLUMNFAMILY is the column family name.
PERMISSION is one of <ro> or <rw> for read-only or read-write respectively.
USERS is a comma delimited list of users from password.properties.
For example the entry "ks1.cf1.<rw>=yosi", means that 'yosi' is allowed to read and write to
the ColumnFamily 'cf1' in Keyspace 'ks1'.
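As an illustration, entries in this format are straightforward to parse. The following Python sketch is ours, not Cassandra's actual Java code, and simply models the grammar described above:

```python
# Sketch: parsing an access.properties entry of the form
# KEYSPACE[.COLUMNFAMILY].<permission>=user1,user2,...
import re

ENTRY_RE = re.compile(
    r"^(?P<keyspace>[^.<]+)(?:\.(?P<cf>[^.<]+))?\.<(?P<perm>ro|rw)>=(?P<users>.*)$"
)

def parse_entry(line):
    """Return (keyspace, column_family, permission, users), or None if malformed.

    column_family is None for keyspace-level entries such as 'ks1.<ro>=a,b'.
    """
    m = ENTRY_RE.match(line.strip())
    if not m:
        return None
    users = [u.strip() for u in m.group("users").split(",") if u.strip()]
    return m.group("keyspace"), m.group("cf"), m.group("perm"), users

print(parse_entry("ks1.cf1.<rw>=yosi"))
```

Given the example from the text, `parse_entry("ks1.cf1.<rw>=yosi")` yields keyspace "ks1", column family "cf1", permission "rw" and the user list ["yosi"].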
Our implementations
We devised our implementations based on the research we did on Cassandra's structure and the analysis
of the existing code. Our implementations involve only server-side extensions, so we can support as
many clients as we want. Since no changes were made on the client side, the regular client shell can be
used while working with our Cassandra implementation.
Within the scope of our project we implemented three different solutions for cell-level ACLs. These
implementations evolved one from another:
Simple authentication based solution where the ACL is saved to a configuration file (Will not
be described due to its inefficiency)
Saving the ACL as part of the values within the Cassandra database.
Creating a new column for the ACL to be stored in the Cassandra database.
All of the solutions let the client decide which users are exposed to their resources – columns or
super columns – and determine each user's set of permissions. This is done using the following syntax,
supported in our implementations:
When creating a new column you can specify the users you want to grant permissions on this new
column. The format is: <column value>[:<user1,user2,...> <permission>] [: ...].
For example: set Cities[utf8('1234')][utf8('City')] = utf8('Haifa:scott rw:andrew ro');
This command creates a new column named "City" with value "Haifa" in column family "Cities"
under row key "1234", giving Scott read/write permissions and Andrew read-only permissions.
Given a string of value and permissions using this syntax, we can easily parse it and determine whether
a user is authorized to execute a certain command or not.
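As a rough model of this parsing step (written in Python rather than the project's Java, and assuming the value itself contains no ':' characters), the syntax above could be handled like this:

```python
# Sketch: split a stored string such as "Haifa:scott rw:andrew ro" into the
# real value and its ACL, then make an authorization decision from the ACL.
# Assumes the value itself contains no ':' characters.

def parse_value(stored):
    """Split 'value:user1,user2 perm:...' into (value, {user: perm})."""
    parts = stored.split(":")
    value, acl = parts[0], {}
    for entry in parts[1:]:
        users, _, perm = entry.rpartition(" ")   # e.g. 'ainat,yosi,ilia rw'
        for user in users.split(","):
            acl[user.strip()] = perm             # perm is 'rw' or 'ro'
    return value, acl

def is_authorized(stored, user, write=False):
    """Read requires 'ro' or 'rw'; write requires 'rw'."""
    _, acl = parse_value(stored)
    perm = acl.get(user)
    return perm == "rw" or (perm == "ro" and not write)
```

Using the example from the text, `parse_value("038151098:eran rw:nir ro")` returns the bare value '038151098' together with an ACL granting eran read/write and nir read-only access.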
Though these solutions have a common goal, they all achieve it in a different manner.
The following is a more detailed description of the two relevant versions we came up with:
1) Version 1.1 - Saving the ACL as part of the values within the Cassandra database:
This version included the following modifications:
a) We removed the "SimpleAuthority" related functions, namely HasColumnFamilyAccess and
HasKeyspaceAccess, so that we do not depend on reading/writing values from/to the
access.properties file.
b) We changed the implementation of the "authorize" function in the SimpleAuthority class to
read the ACL from the value within the Cassandra DB rather than from the access.properties
file.
c) We modified the insert operation – when a user tries to insert a new column, we first check
whether this column already exists. If it exists, we check the ACL attached to it and permit
the operation only if the user has write permission in that ACL. If the column does not exist,
we permit the creation of a new one and allow the user to set the ACL of this new column.
d) We added new functionality for parsing the value returned from Cassandra, so that after
verifying that the user has read permission we return only the original value, without the ACL.
We added a new "get" function to Cassandra, used internally to separate the ACL from the
actual value of a column. For example, if the column family "Students" had the key "yosi"
containing the column 'id' with the value '038151098:eran rw:nir ro', and the user ran the
command "get Students['yosi']['id']", we would return only the value '038151098', provided
the user is nir or eran. If someone wanted to change the value of this field with
"set Students['yosi']['id'] = '432432'", we would first have to fetch the value and parse the
ACL, and only then allow the change, should the user be in the ACL with write permission.
e) We changed the way a column removal takes place. When a user tries to remove a column
from Cassandra, we first retrieve the column value with the ACL from the database and
perform access control verification; only if the user has write permission on that cell is the
operation allowed.
f) We edited the batch mutate and get slice operations in the Cassandra server. These functions
perform bulk insertion and retrieval, respectively, to and from the database. They may
accumulate many columns and super columns from a variety of row keys and insert them
into the database at once. In this implementation, these functions were modified to verify
access control for every column or super column the user wants to insert into or retrieve
from the database, allowing only operations the user is permitted to perform - write and read
respectively.
The following scenario illustrates the above implementation:
Consider a two-node Cassandra cluster where a column is inserted into node 1. Afterwards another
user, Dani, tries to retrieve this column from node 2. Node 2 will query node 1 for the column; this
triggers a value parse. The ACL on that value is checked, and since Dani is not included in the ACL,
the operation is denied.
2) Version 1.2 - Creating a new column for the ACL to be stored in Cassandra database:
This version included the following modifications:
a) We modified the insert operation – for each column insertion we saved another column which
maintained the ACL of the given value. These two columns are identified by their column
names. For example if one tries to insert the column: Set student['yosi']['email']=
'[email protected]:yosi rw:odelia ro'; We insert the following two columns into the database:
An exception to this behavior happens if the column we would like to insert already exists in
the database. In that case we first check the user's permission to write to that cell by
consulting the ACL column (which must also exist, as it was created together with the original
column). If the user has write permission, the insert operation is allowed and both the column
and its ACL column are updated according to the user's request. Otherwise the operation is
denied and a message is displayed to the user.
b) We modified the get operation – once a user tries to retrieve a column from the database, we
first consult the ACL column that corresponds to it. If the user has write or read permission
in that ACL, the operation is approved and the original column, containing only the value,
is returned to the user. Otherwise the operation is denied and a message is displayed to the user.
c) We modified the remove operation – similarly to the above, when a remove operation is
processed the ACL column is consulted, and only if the user has write permission in that ACL
is the operation approved, in which case both the column and its ACL column are removed from
the database. Otherwise the operation is denied and a message is displayed to the user.
d) We modified the batch mutate and get slice operations in the Cassandra server. These
functions were modified as before, only in this version the write and read permissions are
verified by consulting the ACL column for each of the columns in the mutation.
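A minimal sketch of version 1.2's logic, written here as a Python model rather than the actual Java code; the '_acl' column-name suffix is our own illustrative convention, not necessarily the one used in the implementation:

```python
# Toy model of version 1.2: each column gets a companion ACL column.
# The "_acl" name suffix and the dict-based ACL are illustrative assumptions.

class Row:
    def __init__(self):
        self.columns = {}

    def insert(self, user, name, value, acl=None):
        acl_name = name + "_acl"
        if name in self.columns:  # column exists: require write permission
            if self.columns.get(acl_name, {}).get(user) != "rw":
                return False      # denied
        self.columns[name] = value
        if acl is not None:       # e.g. {'yosi': 'rw', 'odelia': 'ro'}
            self.columns[acl_name] = acl
        return True

    def get(self, user, name):
        acl = self.columns.get(name + "_acl", {})
        if acl.get(user) in ("rw", "ro"):
            return self.columns[name]  # only the value, never the ACL
        return None                    # denied

    def remove(self, user, name):
        acl = self.columns.get(name + "_acl", {})
        if acl.get(user) != "rw":
            return False
        del self.columns[name]         # remove column + ACL column together
        del self.columns[name + "_acl"]
        return True
```

The model mirrors the rules above: an existing column can only be overwritten or removed by a user with 'rw' in the companion ACL column, and a get returns the bare value only to users listed in the ACL.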
The following section will describe the tests that were run on version 1.1.
ACL Benchmarking
As mentioned in the Code Implementation section, we implemented code that supports cell-level access
control lists in Cassandra. We tested implementation version 1.1, where the ACLs are saved as part
of the values in the database. The ACL benchmarking tests were performed in order to compare the
performance of Cassandra with the new implementation to Accumulo, and also to Cassandra without
the implementation.
First we will describe the installation and configurations needed to run the tests. Next we will describe
the test scenarios along with how to perform the tests and finally we will analyze the results.
The benchmark tests of the ACLs consist of the following:
A comparison of a three node Cassandra cluster with and without ACLs in different YCSB
client and thread configurations
A comparison of Cassandra versus Accumulo using only a one node cluster with and without
ACLs
Installation and Configuration
Installations
To run the ACL benchmarking tests we installed the following components:
Cassandra database – This includes both with and without the added implementation. The
original Cassandra was installed according to the documentation found on the Apache
Cassandra website. To be able to add ACL support to the code, we downloaded the source
code using the documentation found here:
http://wiki.apache.org/cassandra/RunningCassandraInEclipse (Instructions for how to install
Cassandra with the ACL implementation already added can be found on our website).
Accumulo database – this includes installing the Hadoop file system and Zookeeper, a server
that provides distributed coordination (installation according to instructions found on their
websites)
YCSB++ – this is the benchmarking tool used to test both of the databases. Installation and
further instructions were done according to this site:
https://github.com/MiloPolte/YCSB/wiki/Using-YCSB--
Cassandra configurations
In order to compare Cassandra's performance with ACL implementation to that of Accumulo's, we
configured a cluster containing one node -
Configuring a one node cluster
To configure a Cassandra node without our implementation, we simply followed the instructions found
on the Cassandra site. In order to use the Cassandra containing the ACL support, we made the
following configurations:
Set the environment variables so that CASSANDRA_HOME is the path to the folder containing the
Cassandra code and JAVA_HOME is the path to the java folder.
Edited the passwd.properties file by adding a username and password in the format
<username>=<password>, with which we will log in to Cassandra in the later tests.
Finally, we edited the log4j-server.properties file by changing the log4j.appender.R.File line to point to
the system log file to be created in the Cassandra folder.
For comparing Cassandra with and without the ACL implementation, we configured a cluster
containing three nodes -
Configuring a three node cluster
To get three nodes to work together in a cluster we needed to modify the Cassandra
"cassandra.yaml" file on all three workstations comprising the cluster, making the following
changes:
First we defined the "seed" in the cluster. In the Cassandra database, the node defined as the seed
provides new nodes with information about the ring, such as which other nodes are included in it, their
locations, their token ranges and so on. After a node joins the ring, it shares ring information
through the gossip protocol and does not make any further contact with the seed node.
Second, we set the 'Listen' and 'RPC' addresses for each of the nodes to their own IP addresses.
These specify the interfaces on which each node listens for inter-cluster (gossip) traffic and for
client traffic via Thrift, respectively.
Third we set the Token value for all of the nodes. Each Cassandra server [node] is assigned a unique
Token that determines what keys it is the first replica for. In order to maintain a load balance in the
cluster it is recommended to set the nodes with the following token values:
Node1- initial token: 0
Node2- initial token: 56713727820156410577229101238628035242
Node3- initial token: 113427455640312821154458202477256070485
The above configuration allows 33% of the keys to be stored on each node and thus enables a load
balance between the nodes in the cluster.
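These recommended token values come from splitting the RandomPartitioner's 0..2^127 token space evenly among the nodes; a quick sketch of the standard formula token_i = i * 2^127 / N reproduces them:

```python
# Compute evenly spaced initial tokens for an N-node Cassandra cluster
# using the RandomPartitioner's 0 .. 2**127 token space.

def initial_tokens(n):
    return [i * 2**127 // n for i in range(n)]

for node, token in enumerate(initial_tokens(3), start=1):
    print(f"Node{node} - initial token: {token}")
```

For n=3 this yields exactly the three token values listed above, giving each node responsibility for a third of the key space.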
Once everything was configured and the nodes were running, we used the Cassandra nodetool ring
utility to verify that the cluster was connected properly:
Finally, for both configurations, we created the keyspace "usertable" and column family "data" which
are mandatory for performing the desired benchmark tests carried out by the YCSB tool.
For further details on installing and configuring Cassandra turn to the "Cassandra Acl – Getting
Started" guide found on our website: http://course.cs.tau.ac.il/secws12/?q=node/21
Accumulo configurations
Accumulo was configured only as a one node cluster in order to compare its performance to that of
Cassandra's with ACL support. The configuration of the database was done according to the
instructions found on the Accumulo website.
YCSB++ configurations
In order to use YCSB++ for connecting to any server, you must have a binding for it, meaning a java
class that adapts YCSB's client interface to that of your server's API. It is called the DB interface layer,
which as mentioned, is a java class that executes read, insert, update, delete and scan calls generated by
the YCSB client into calls against a database's API. When using the YCSB client you need to specify
the class name of the layer on the command line, and the client will dynamically load the appropriate
interface class. YCSB++ already contains such a binding for Cassandra, built in with the
installation; for Accumulo, on the other hand, the binding can be found at the following separate
link: https://github.com/MiloPolte/YCSB/tree/master/db/accumulo, and you need to rebuild the project
using Ant commands in order to use YCSB++ with this binding.
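As a hedged illustration only (YCSB's real DB interface layer is a Java class; the five method names below follow the operations listed in the text, while the in-memory backing store and return conventions are our invention), the shape of such a binding can be modeled as:

```python
# Toy model of a YCSB-style DB interface layer: a class exposing the five
# operations the YCSB client generates, backed here by an in-memory dict.
# A real binding would translate each call into the database's own API.

class InMemoryDB:
    def __init__(self):
        self.table = {}

    def insert(self, key, values):
        self.table[key] = dict(values)
        return 0                              # 0 signals success

    def read(self, key, fields=None):
        row = self.table.get(key)
        if row is None:
            return None
        return {f: row[f] for f in fields} if fields else dict(row)

    def update(self, key, values):
        if key not in self.table:
            return 1
        self.table[key].update(values)
        return 0

    def delete(self, key):
        return 0 if self.table.pop(key, None) is not None else 1

    def scan(self, start_key, count):
        keys = sorted(k for k in self.table if k >= start_key)
        return [(k, self.table[k]) for k in keys[:count]]
```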
Scenario Details
As mentioned above, we ran two main benchmarks:
A comparison of a three node Cassandra cluster with and without ACLs
A comparison of Cassandra versus Accumulo using only a one node cluster with and without
ACLs
Set-up:
We ran tests on two different set-ups in accordance with the two main benchmarks defined. One set-up
was of Cassandra and Accumulo with one node in the cluster, and the other was a set-up of Cassandra
with three nodes in the cluster (Cassandra meaning both the database with the ACL support and the
original database for comparison).
YCSB++:
The YCSB++ client was also configured to run in two ways -
1) Run 1 client with 100 threads
2) Run 6 clients with 16 threads each
Workloads:
We used the core workloads supplied in the YCSB installation along with two additional ones we
configured ourselves:
Workload A: Update heavy workload - This workload has a mix of 50/50 reads and writes.
Workload B: Read mostly workload - This workload has a 95/5 reads/write mix.
Workload C: Read only - This workload is 100% read.
Workload D: Read latest workload - In this workload, new records are inserted, and the most
recently inserted records are the most popular.
Workload F: Read-modify-write - In this workload, the client will read a record, modify it, and write
back the changes
Insert Workload – This workload writes 100 thousand single-cell rows in an empty table.
Read Workload – This workload reads 100 thousand rows.
Test steps:
Using the one-node cluster set-up, we ran each workload on each database (Cassandra and Accumulo)
with the following numbers of ACL entries: 1, 4, 5, 6, 7, 8, 9, 10 and 11, three times each. To get
performance results with 0 ACL entries, we ran the original Cassandra and Accumulo without security,
with the same configurations.
Next, we repeated the test of increasing ACLs in the three-node set-up, for Cassandra only.
The final results were calculated as an average of all three runs.
Description of running the tests with YCSB++ with increasing ACLs:
To use YCSB++ for running ACL benchmarks on Cassandra, you need to run the client with four main
specifications. The first is the database binding you wish to use, which is a java class that operates as
an interface between YCSB and the database you are running. The second is the workload class used to
execute the workloads, in this case "CoreWorkload". The third is the specific workload you want to
run, which is defined by a parameters file that the client reads, containing properties such as the
number of operations to execute, the number of threads and so on. We altered the Cassandra client so
that it also reads an additional property, "cassandra.acl", from this file, which is the ACL string
containing the different entries. The fourth specification is the username and password with which to
connect to the database.
Regarding Accumulo, in order to run it with security attributes using YCSB++, you need to use the
Accumulo client class called "AccumuloClientSecurity" and change the workload class to
"CoreWorkloadSecurity". This allows you to provide the additional ACL-related properties such
as: accumulo.username, accumulo.password, security.credential, security.cell.enable,
security.cell.entries etc. This means that, unlike with Cassandra, here you don't need to specify the
username and password in the run command, as they are already contained in the workload parameters
file.
Having all the installations and configurations ready, we ran the benchmarks using a simple script and
the results we got and their analysis will be described in the next section.
Results
All the graphs shown below consist of columns ranging from 0 to 11. The 0 column represents the
results of the test running on the "old" Cassandra – meaning the initial code we began working on,
which does not support cell-level ACLs (official Cassandra GIT repository files). The other columns
represent the results running on our implementation, where each number indicates the number of ACL
entries added to each value. For example, a column labeled 3 means we added something like
":ainat,yosi,ilia rw" to the value. For each column we show the result of the test in operations/sec.
Cassandra 1 client 3 nodes:
In this scenario we use one YCSB client PC and a configuration of three Cassandra nodes with a load
balance configuration (each node responsible for a third of the values).
The results we got are the following:
[Figure: Workload A – throughput (ops/sec) vs. number of entries in the ACL (0, 1, 4-11). Workload A is a mix of 50/50 read and write operations, out of a total of 100,000 operations.]
[Figure: Workload C – throughput (ops/sec) vs. number of entries in the ACL. Workload C is a read-only workload; 100,000 operations in total.]
These results show no significant difference in performance between the original Cassandra (the 0-acl
column) and our implementation (1-11 acl columns in these graphs).
Looking more in-depth into the reason for the lack of change, we found that while performance looked
the same throughput-wise, the CPU usage with the original Cassandra was generally significantly
lower (roughly half). Attempting to quantify these differences did not prove fruitful, since the
computers we used for these tests were not exclusively ours and the results varied significantly from
run to run (as did the actual throughput results, though less so).
To reach a point where the difference in performance would also be evident in the results, we had to
increase the load on the server, and so we moved to the next testing phase.
Cassandra 6 clients 3 nodes:
Here we used the same Cassandra configuration (load-balance configuration with three nodes), but this
time we used six YCSB++ clients with sixteen threads each, running the workload simultaneously.
We ran the same exact tests as with the one client, summed up the results from the clients and got the
following:
[Figure: Workload F – throughput (ops/sec) vs. number of entries in the ACL. Workload F is a read-modify-write workload: the client reads a record, modifies it and writes it back; 100,000 operations in total.]
[Figure: Workload A – throughput (ops/sec) vs. number of entries in the ACL. Workload A is a mix of 50/50 read and write operations, out of a total of 100,000 operations.]
These results show a significant decrease in performance when comparing the no-acl version to our
ACL-enabled version. Looking at the CPU indicators we noticed a significant increase in CPU usage
with the no-ACL version (original Cassandra) and in both cases CPU usage peaked and seemed to be
the limiting factor in the performance of these systems.
Now that we have actually found the peak-performance stats of these systems we are able to quantify
the actual loss in performance with the introduction of the fine-grained ACLs.
Workload A – the 50/50 read/write benchmark – shows the lowest decrease. When comparing the
results with no ACLs (45.6k ops/sec) to the results with only one ACL entry (27.4k ops/sec), we get
[Figure: Workload C – throughput (ops/sec) vs. number of entries in the ACL. Workload C is a read-only workload; 100,000 operations in total.]
[Figure: Workload F – throughput (ops/sec) vs. number of entries in the ACL. Workload F is a read-modify-write workload: the client reads a record, modifies it and writes it back; 100,000 operations in total.]
a total of 40% decrease in performance. This is the smallest decrease in performance we measured
and it sits well with our expectations.
In our implementation of the column level ACL on the insert/update operation (in Cassandra these two
are the same operation) we first fetch the old column, parse its ACL part and then decide whether the
user is allowed to change the column. When the column is new the fetching part will return null and we
will just insert the column and skip the CPU heavy string-parsing phase. This is why the "insert"
operation throughput is hardly affected by our implementation.
The Read operation on the other hand is affected and this is where most of this 40% loss comes from.
Workload C – the read-only benchmark – shows a 61% decrease in performance.
Workload F – the update (read-modify-write) benchmark – showed about the same decrease as the
read benchmark: 61.5%.
Comparing the results of our implementation with different numbers of ACL entries, we get the
following (throughput in k ops/sec):

Test             | 1 ACL | 11 ACLs | Total decrease
50/50 read/write | 27.4  | 20.5    | 25%
Read only        | 21.3  | 15.1    | 29%
Update           | 15.2  | 11.1    | 27%
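The percentage figures in these tables are plain relative throughput drops; a quick check of the arithmetic (values in k ops/sec):

```python
# Relative throughput decrease between two measurements, as used in the
# comparison tables.

def decrease(before, after):
    return round(100 * (before - after) / before)

print(decrease(27.4, 20.5))  # 50/50 read/write, 1 -> 11 ACL entries: 25
print(decrease(45.6, 27.4))  # 50/50 read/write, no ACL -> 1 ACL entry: 40
```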
Accumulo 1 client 1 node:
Accumulo has a native built-in cell-level ACL so both the 0 ACL test and the 1-11 ACL tests were
done on the same system (as opposed to Cassandra).
The tests were run on a single node of Accumulo with a single YCSB++ client running 100 threads.
The ultimate goal of these tests was to establish a baseline against which we could measure the
performance of our ACL implementation in Cassandra. This is what we got:
[Figure: Workload A - Accumulo – throughput (ops/sec) vs. number of entries in the ACL. Workload A is a mix of 50/50 read and write operations, out of a total of 100,000 operations.]
These tests gave us extremely low throughputs compared to Cassandra (even with a one node
configuration) so there was no point in comparing the throughput head-to-head but we could still use
the data to determine the relative percentage loss in performance with the introduction of cell-level
ACLs.
We gathered all the data from the Cassandra and Accumulo tests and created the following comparison
tables:
[Figure: Workload C - Accumulo – throughput (ops/sec) vs. number of entries in the ACL. Workload C is a read-only workload; 100,000 operations in total.]
[Figure: Workload F - Accumulo – throughput (ops/sec) vs. number of entries in the ACL. Workload F is a read-modify-write workload: the client reads a record, modifies it and writes it back; 100,000 operations in total.]
Test             | no-ACL Cassandra | 1-ACL Cassandra | Total decrease | no-ACL Accumulo | 1-ACL Accumulo | Total decrease
50/50 read/write | 45.6             | 27.4            | 40%            | 6.4             | 5.6            | 13%
Read only        | 54               | 21.3            | 61%            | 1.6             | 0.8            | 50%
Update           | 39               | 15.2            | 61%            | 0.8             | 0.73           | 9%
(throughput in k ops/sec)
In the table above we can see that the largest gap in the decrease in performance is in the update
operation, showing a 61% decrease in Cassandra compared to only 9% in Accumulo. Next is the 50/50
read/write with 40% compared to 13%. And lastly read, with 61% compared to 50% in Accumulo. The
read operation seems to have given the worst results in Accumulo, which in turn gave the best relative
performance to Cassandra with our ACL implementation.
Test              1 ACL      11 ACLs    Total      1 ACL     11 ACLs   Total
                  Cassandra  Cassandra  decrease   Accumulo  Accumulo  decrease
50/50 read/write  27.4       20.5       25%        5.6       1.2       79%
Read only         21.3       15.1       29%        0.8       0.51      36%
Update            15.2       11.2       26%        0.73      0.51      30%
(Throughput values are in thousands of ops/sec.)
The table above compares the decrease in performance when the number of ACL entries grows from 1
to 11. In this comparison Cassandra with our implementation generally handles multiple ACL entries
better than Accumulo. That said, Accumulo's ACL performance is still far superior to our system
overall: the total decrease from no ACL to 11 ACL entries in Cassandra is much greater in every
workload except the 50/50 read/write, which, as noted above, was the benchmark least affected by our
implementation.
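The "total decrease" columns in the two tables above can be reproduced directly from the throughput numbers. A minimal sketch (values copied from the tables, in thousands of ops/sec):

```python
def percent_decrease(before, after):
    """Relative throughput loss, as reported in the comparison tables."""
    return (before - after) / before * 100

# Cassandra, no ACL -> 1 ACL entry (values from the first table)
assert abs(percent_decrease(45.6, 27.4) - 40) < 1   # 50/50 read/write
assert abs(percent_decrease(54.0, 21.3) - 61) < 1   # read only
# Accumulo, 1 ACL -> 11 ACL entries (values from the second table)
assert abs(percent_decrease(5.6, 1.2) - 79) < 1     # 50/50 read/write
```

The same check reproduces every other cell in the two tables to within rounding.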
To conclude, here is a graphical representation of this analysis:
The chart above shows the percentage decrease in performance of the two systems in two scenarios:
- The system with no ACLs compared to the same system with one ACL entry.
- The system with one ACL entry compared to the same system with eleven ACL entries.
Note that the chart shows the decrease in performance, so smaller values are better.
[Figure: Cassandra with cell-level ACL vs. Accumulo. Percent decrease in throughput for the 50/50 read/write, read-only and update workloads, with four series: Accumulo 1 to 11 ACLs, Cassandra 1 to 11 ACLs, Accumulo 0 to 1 ACLs, Cassandra 0 to 1 ACLs.]
Consistency Tests
Weak Consistency in Cassandra
In the first part of this section we briefly explain what consistency means for databases, the different
types of consistency, and how it relates to our project. We then show examples of how consistency
plays out in real-life situations and how it relates to security. Afterwards we discuss the settings and
properties of consistency (and replication in general) in Cassandra, and lastly we present our work: the
testing environment, specs, results and conclusions of our tests, and an evaluation of the effect of our
implementation on consistency.
Part 1: Introduction
Consistency describes how and whether a system is left in a consistent state after an operation. In
distributed data systems like Cassandra, this usually means that once a writer has written, all readers
will see that write. The "how and whether" of this system are described by a "consistency model". To
understand the term "consistency model" consider the following example:
- The row X is replicated on nodes M and N
- The client A writes row X to node N
- After a period of time t, client B reads row X from node M
The consistency model has to determine whether client B sees the write from client A or not. In
practice there are various factors that determine whether he will see the change, from the way the
database is set, to the parameters of the insertion and the parameters of the "read". But in general we
can distinguish between two types of consistency models that would determine the answer to this
question:
1) The "strong" consistency model: in this model consistency is a must. Every replica of the data
will be exactly the same at all times and no two clients will ever get different data from
different nodes.
2) The "weak" (or eventual) consistency model: in this model the data is allowed not to be
100% consistent across all replicas, and in this case client B (from the example above) might
not see the write.
The CAP theorem (Consistency, Availability, Partition tolerance) describes a trade-off in distributed
computer systems such as Cassandra. It states that of three fundamental guarantees in these systems –
consistency, availability and partition tolerance – only two can be satisfied at any one time. In our case
partition tolerance is a given, and we can trade off between the two remaining guarantees by choosing
an appropriate consistency model. With strong consistency we choose consistency over availability:
while a value is being updated on all the nodes, the resource is not available. With weak consistency it
is the other way around.
In our project we had to implement an extra layer of security and we chose to implement it by saving
the Access Control List (ACL) in the same way we save values, so the same rules that apply to the
values will also apply to the ACL. This means that if you choose to use our implementation of ACL
and wish to have a system with good "availability", then the ACLs will also be subject to the rules of
eventual consistency and there might be inconsistencies between the ACLs of the replicas of the same
data on different nodes.
To understand better what this all means we will show several examples.
Part 2: Examples of eventual consistency and how it relates to security
Example one:
Assume you want to change your Facebook profile picture. As soon as you apply the change you would
like to see the effect take place, but you don't mind if your friend in the US sees the update 10 seconds
later. In this scenario we would want "strong" consistency for the local change (if, for example, we can
contact three different nodes, we would like all three to have the latest copy of the profile picture) and
"weak" consistency for the rest of the network.
This example also demonstrates the trade-off between availability and consistency: if you set "strong"
consistency for this change, the change takes much longer to perform, and while it is being propagated
through the network no one can see the new profile picture, so the user will either think something went
wrong or will have to wait. We pay with availability for the added consistency.
Example two:
A company uses a distributed database to store its sensitive data. When it fires an employee, it wants to
revoke his access to the database. Assuming the database is configured with weak consistency, it will
take a while until this change propagates through the network to the other replicas of the data. During
this period the employee can still log in to a node that has not yet been updated and copy the
organization's sensitive information.
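The inconsistency window in this example can be sketched with a toy two-replica model (the class and function names are ours, not Cassandra's):

```python
class Replica:
    """Toy replica holding a per-user ACL, standing in for one node."""
    def __init__(self):
        self.acl = {"employee": "rw"}

def revoke(replicas, reached, user):
    """Weakly consistent revoke: the change initially lands only on the
    replicas it has already propagated to."""
    for i in reached:
        replicas[i].acl.pop(user, None)

nodes = [Replica(), Replica()]
revoke(nodes, [0], "employee")      # the revoke has reached node 0 only
print("employee" in nodes[1].acl)   # True: node 1 still grants access
```

Until the change reaches node 1, the fired employee can still read through it; that gap is exactly what our consistency tests measure.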
Part 3: The consistency and replication settings of Cassandra
Cassandra is designed as a distributed system, for deployment of large numbers of nodes across
multiple data centers. Cassandra is robust and flexible enough that you can configure the cluster for
optimal geographical distribution, for redundancy, for failover and disaster recovery.
Cassandra stores copies, called replicas, of each row based on the row key. You set the number of
replicas when you create a keyspace using the replica placement strategy. In addition to setting the
number of replicas, this strategy sets the distribution of the replicas across the nodes in the cluster
depending on the cluster's topology.
The following is a technical explanation of replication and consistency settings in Cassandra:
The total number of replicas across the cluster is referred to as the replication factor. A replication
factor of one means that there is only one copy of each row on one node. A replication factor of two
means two copies of each row, where each copy is on a different node. All replicas are equally
important; there is no primary or master replica.
In Cassandra there are two default available replication strategies:
SimpleStrategy: This strategy is the default replica placement strategy. SimpleStrategy places the first
replica on a node determined by the partitioner (which we talked about in an earlier chapter).
Additional replicas are placed on the next nodes clockwise in the ring without considering rack or data
center location.
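SimpleStrategy's placement rule can be sketched as a walk around the ring (a simplified model; in real Cassandra the starting index comes from the partitioner's token for the row key):

```python
def simple_strategy_placement(ring, first_index, replication_factor):
    """Place the first replica on the node chosen by the partitioner, and
    each additional replica on the next node clockwise in the ring,
    ignoring rack and datacenter location."""
    return [ring[(first_index + i) % len(ring)]
            for i in range(replication_factor)]

# 4-node ring, partitioner picks node C, replication factor 3
print(simple_strategy_placement(["A", "B", "C", "D"], 2, 3))  # ['C', 'D', 'A']
```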
NetworkTopologyStrategy: if your cluster is deployed across multiple data centers, this strategy
specifies how many replicas you want in each data center. The NetworkTopologyStrategy determines
replica placement independently within each data center as follows:
The first replica is placed according to the partitioner (same as with SimpleStrategy).
Additional replicas are placed by walking the ring clockwise until a node in a different rack is found. If
no such node exists, additional replicas are placed in different nodes in the same rack.
"Rack" and "Datacenter" are properties that are given to nodes by the Snitch – A Component that maps
IPs to racks and data centers. It defines how the nodes are grouped together within the overall network
topology.
There are many different "snitches" such as:
- SimpleSnitch which does not recognize data center or rack information.
- RackInferringSnitch which infers (assumes) the topology of the network by the octet of the
node's IP address.
- PropertyFileSnitch - determines the location of nodes by rack and data center. This snitch
uses a user-defined description of the network details located in the property file cassandra-
topology.properties.
The snitch is configured beforehand in the cassandra.yaml file, which stores miscellaneous properties
of the Cassandra server.
The replication strategy is configured upon creation of a keyspace, as one of its parameters.
Examples:
1. To define a simple cluster with several nodes and three replicas of each row, we just run the
node, connect with the CLI and run:
`create keyspace usertable with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
and strategy_options = {replication_factor:3};`
2. Suppose we have three datacenters named DC1, DC2 and DC3 (some of them with several racks),
and we would like at least one replica in each of these datacenters, and perhaps more than one in the
major DC with several racks. To accomplish this we use NetworkTopologyStrategy and define the
proper snitch. In our case the IPs are not indicative of the layout, so we use the PropertyFileSnitch: in
cassandra.yaml we set the endpoint_snitch property to PropertyFileSnitch.
Next, we open cassandra-topology.properties and configure it according to our network setup.
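For this layout the property file might look something like the following sketch. The IP addresses are hypothetical; only the datacenter and rack labels come from the example:

```properties
# cassandra-topology.properties (hypothetical IPs)
# node IP = datacenter:rack
192.168.1.1=DC1:RAC1
192.168.2.1=DC2:RAC1
192.168.2.2=DC2:RAC2
192.168.3.1=DC3:RAC1
# fallback for nodes not listed above
default=DC1:RAC1
```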
Finally, we would run the nodes, connect to them with our CLI and run:
`create keyspace usertable with placement_strategy =
'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options
= {DC1:1,DC2:2,DC3:1};`
This would mean that each row would be replicated exactly once in DC1, twice in DC2 – one replica
on RAC1 and another in RAC2 and once in DC3.
Consistency settings:
In Cassandra, consistency refers to how up-to-date and synchronized a row of data is on all of its
replicas. Cassandra extends the concept of eventual consistency by offering tunable consistency. For
any given read or write operation, the client application decides how consistent the requested data
should be.
Consistency levels in Cassandra can be set on any read or write query. This allows application
developers to tune consistency on a per-query basis depending on their requirements for response time
versus data accuracy. Cassandra offers a number of consistency levels for both reads and writes.
Write Consistency:
When you do a write in Cassandra, the consistency level specifies on how many replicas the write must
succeed before returning an acknowledgement to the client application.
The following consistency levels are available, with ANY being the lowest consistency (but highest
availability), and ALL being the highest consistency (but lowest availability). QUORUM is a good
middle-ground ensuring strong consistency, yet still tolerating some level of failure.
A quorum is calculated as (rounded down to a whole number): (replication_factor / 2) + 1
For example, with a replication factor of 3, a quorum is 2 (can tolerate 1 replica down). With a
replication factor of 6, a quorum is 4 (can tolerate 2 replicas down).
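The quorum formula above can be written out directly; integer division gives the rounding down:

```python
def quorum(replication_factor):
    """Quorum size: (replication_factor / 2) + 1, rounded down."""
    return replication_factor // 2 + 1

def tolerated_failures(replication_factor):
    """Replicas that can be down while a quorum is still reachable."""
    return replication_factor - quorum(replication_factor)

print(quorum(3), tolerated_failures(3))  # 2 1
print(quorum(6), tolerated_failures(6))  # 4 2
```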
Cassandra 1.0 supports the following write consistency settings:
Level Behavior
ANY Ensure that the write has been written to at least 1 node, including
HintedHandoff recipients (we will not get into what this means).
ONE Ensure that the write has been written to at least 1 replica's commit log and
memory table before responding to the client.
TWO Ensure that the write has been written to at least 2 replicas before responding
to the client.
THREE Ensure that the write has been written to at least 3 replicas before responding
to the client.
QUORUM Ensure that the write has been written to N/2 + 1 replicas before responding
to the client.
LOCAL_QUORUM Ensure that the write has been written to <ReplicationFactor>/2 + 1 nodes
within the local datacenter (requires NetworkTopologyStrategy).
EACH_QUORUM Ensure that the write has been written to <ReplicationFactor>/2 + 1 nodes in
each datacenter (requires NetworkTopologyStrategy).
ALL Ensure that the write is written to all N replicas before responding to the client.
Read Consistency:
When you do a read in Cassandra, the consistency level specifies how many replicas must respond
before a result is returned to the client application.
Cassandra checks the specified number of replicas for the most recent data to satisfy the read request
(based on the timestamp).
The following consistency levels are available, with ONE being the lowest consistency (but highest
availability), and ALL being the highest consistency (but lowest availability).
Cassandra 1.0 supports the following read consistency settings:
Level Behavior
ONE
Will return the record returned by the first replica to respond. A consistency check
is always done in a background thread to fix any consistency issues when
ConsistencyLevel.ONE is used. This means subsequent calls will have correct
data even if the initial read gets an older value. (This is called *ReadRepair)
TWO Will query 2 replicas and return the record with the most recent timestamp. Again,
the remaining replicas will be checked in the background.
THREE Will query 3 replicas and return the record with the most recent timestamp.
QUORUM
Will query all replicas and return the record with the most recent timestamp once it
has at least a majority of replicas (N / 2 + 1) reported. Again, the remaining
replicas will be checked in the background.
LOCAL_QUORUM Returns the record with the most recent timestamp once a majority of replicas
within the local datacenter have replied.
EACH_QUORUM Returns the record with the most recent timestamp once a majority of replicas
within each datacenter have replied.
ALL Will query all replicas and return the record with the most recent timestamp once all
replicas have replied. Any unresponsive replicas will fail the operation.
*Read Repair - For reads, there are two types of read requests that a coordinator can send to a replica;
a direct read request and a background read repair request. The number of replicas contacted by a direct
read request is determined by the consistency level specified by the client. Background read repair
requests are sent to any additional replicas that did not receive a direct request. To ensure that
frequently-read data remains consistent, the coordinator compares the data from all the remaining
replicas that own the row in the background, and if they are inconsistent, issues writes to the out-of-
date replicas to update the row to reflect the most recently written values. Read repair can be
configured per column family (using read_repair_chance), and is enabled by default.
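The read path described above – return the value with the most recent timestamp from the contacted replicas, then repair stale replicas in the background – can be sketched with a toy model (our simplification, not Cassandra's actual code):

```python
def coordinator_read(replicas, key, direct_count):
    """Toy coordinator. 'replicas' is a list of dicts mapping key ->
    (value, timestamp). Send a direct read to the first direct_count
    replicas, return the newest value, then read-repair stale replicas."""
    contacted = replicas[:direct_count]
    value, ts = max((r[key] for r in contacted), key=lambda vt: vt[1])
    for r in replicas:                       # background read repair
        if r.get(key, (None, -1))[1] < ts:
            r[key] = (value, ts)             # write back the newest version
    return value

nodes = [{"x": ("new", 2)}, {"x": ("old", 1)}, {"x": ("old", 1)}]
print(coordinator_read(nodes, "x", 2))       # 'new', and all replicas repaired
```

After the call, even the replica that was never contacted directly holds the newest version, which is why subsequent ONE-level reads return correct data.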
Part 4 – Our Testing environment
In our tests we aimed for a dynamically configurable environment in which we could set the number of
replicas and control which nodes receive a replica. So we chose to use NetworkTopologyStrategy with
the PropertyFileSnitch.
We created a cluster consisting of 6 nodes, and configured the Cassandra-topology.properties as
follows:
132.67.104.156=DC1:RAC1
132.67.104.154=DC2:RAC1
132.67.104.152=DC3:RAC1
132.67.104.153=DC4:RAC1
132.67.104.151=DC5:RAC1
172.17.136.200=DC6:RAC1
Each node was a "Datacenter" on its own. Now, if we wanted to have a setup with 3 active nodes all we
had to do was create the appropriate keyspace:
`create keyspace usertable with placement_strategy =
'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options
= {DC1:1,DC2:1,DC3:1};`
To expand this to 5 nodes we would just create another keyspace and use it instead of usertable:
`create keyspace usertable1 with placement_strategy =
'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options
= {DC1:1,DC2:1,DC3:1,DC4:1,DC5:1};`
Our testing platform was a slightly modified version of YCSB++. YCSB++ includes, by default, a
read-after-write delay test, which works as follows: there are two YCSB++ clients with different roles;
one (the producer) inserts values to node A and the other (the consumer) reads them from node B. The
consumer and producer synchronize the keys and insertion times via a third-party database
(ZooKeeper). The exact algorithm is:
1) The producer keeps inserting keys with 10 values each (with our unique ACL pattern suffixes).
2) After finishing an insert (getting an OK from the Cassandra node) it inserts the name of the
key into ZooKeeper and continues to the next key.
3) Meanwhile, the consumer attempts to read (pop) keys from the shared ZooKeeper database.
As soon as it sees a new key it attempts to read it from Cassandra.
4) If it manages to read the key on the first attempt it ignores the result, reports a time lag of 0
and waits for the next key.
5) If it fails to read the key from Cassandra on the first attempt it keeps trying until it succeeds.
The output is the delta between the key's timestamp in ZooKeeper and the current time (which
is exactly the time between the end of the insertion and the time of the first successful read).
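The consumer side of this algorithm can be simulated deterministically. Here a toy replica with a fixed propagation delay stands in for node B (the class and delay are our invention; the retry logic follows steps 3-5 above):

```python
class LaggyReplica:
    """Toy consumer-side node: a key inserted at time t becomes readable
    only after a fixed propagation delay (our stand-in for replication)."""
    def __init__(self, delay):
        self.delay = delay
        self.inserted_at = {}

    def insert(self, key, ts):
        self.inserted_at[key] = ts

    def readable(self, key, now):
        ts = self.inserted_at.get(key)
        return ts is not None and now - ts >= self.delay

def time_lag(replica, key, insert_ts, step=1):
    """Consumer logic: report 0 on a first-try read, otherwise retry and
    report the delta from the insertion timestamp."""
    now = insert_ts
    if replica.readable(key, now):
        return 0
    while not replica.readable(key, now):
        now += step                # keep trying until the read succeeds
    return now - insert_ts

node_b = LaggyReplica(delay=120)   # 120 ms propagation delay
node_b.insert("user1", 0)
print(time_lag(node_b, "user1", 0))  # 120
```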
The results we got from this were a list of "time lags" – the delay between write and read. To generate
more meaningful results from this list we created a Python script which takes the data and outputs
various statistics such as averages, median, percentiles and different ranges (the script is available on
our site).
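A minimal version of such a post-processing script might look like this (a sketch using nearest-rank percentiles; the statistics our actual script computes may differ in detail):

```python
import statistics

def timelag_stats(lags_ms, percentiles=(95, 99)):
    """Summarize a list of read-after-write time lags in milliseconds."""
    s = sorted(lags_ms)
    def pct(p):
        # nearest-rank percentile on the sorted list
        k = max(0, round(p / 100 * len(s)) - 1)
        return s[min(k, len(s) - 1)]
    zero = sum(1 for x in lags_ms if x == 0)   # reads that succeeded first try
    return {
        "count": len(lags_ms),
        "first_try_pct": 100.0 * zero / len(lags_ms),
        "average": statistics.mean(lags_ms),
        "median": statistics.median(lags_ms),
        **{f"p{p}": pct(p) for p in percentiles},
    }

print(timelag_stats([0, 0, 0, 100, 200]))
```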
Running this test we produced the following result:
This graph shows the average of multiple tests (since the lab we are using is highly unstable we have to
run each test several times to iron out any instability). The y axis is latency in milliseconds; the x axis
is the number of replicas. With 2 nodes we got an average delay of 118 ms, and the 95th percentile
(call it worst-case latency) was 542 ms.
In these results you can notice a general increase in latency as more replicas are used, which is to be
expected since more replicas mean more network traffic. With 6 replicas, the main node we insert to
has to send all the values to 5 other nodes. Since we run the test at maximum throughput, any extra
operation means reduced performance. So in addition to the increased latency we also see a general
decrease in throughput as the number of replicas grows: with 2 nodes we had an insert throughput of
around 11k ops/sec, but with 6 it dropped to about 7k.
Our original plan was to isolate any limiting factors and run the tests with varying network conditions,
but due to the limitations of our lab we had to run Cassandra on our personal laptop and connect it to
the cluster via WIFI to perform these tests.
We also found trial software which allowed us to increase the latency to a specific IP. So the mentioned
delays are to the other Cassandra nodes only and not to the YCSB++ clients.
From these tests we got the following results:
As you can see, with WiFi we got an average delay of 14 seconds(!) even without added latency. When
increasing the latency to 100 ms it skyrocketed to an average of 35 seconds (and a worst case of over 2
minutes).
Read-after-write latency (ms) by number of replicas:

Replicas                         2       3        4        5        6
Average insert latency           118     113      250      325      384
95th percentile insert latency   542     457.15   1193.5   1284.2   1361
Average insert latency with WiFi (ms):

Added latency    2 nodes    3 nodes
none             14065      12324
10 ms            17660      19124
100 ms           35616      -
This goes to show how fragile these systems actually are: no matter how good your system is, eventual
consistency will still be mostly a factor of your networking specs, and the system is only as strong as
its weakest link. Again, these tests were performed on a slightly different platform (the laptop rather
than the lab PCs), so it is difficult to say what exact toll limited bandwidth, high latency and packet
loss take on the performance of Cassandra – but this still gives a fairly decent idea.
Note that these results reflect the situation in a real-world network far better than the results in the lab
configuration where the latency between machines is under 1ms and the bandwidth is over 100Mb.
Next we moved to testing our read-after-update latency. Since our implementation of ACLs uses the
actual values of the cells to store the ACLs the read-after-update latency is equivalent to "update-ACL"
latency – and here is where the security risks come into play.
To perform these tests we had to modify YCSB++, which did not come with such a test, so we
improvised. We added to YCSB a flag stating which keys it should insert, and another flag that lets us
add a suffix to the values inserted into Cassandra. With these changes we performed the test in three
steps:
1) Run a regular read-after-insert latency test (we can run only the producer this time) with a
specific set of keys (say, 1-100k) and add the following ACL suffix: ":ilia,yosi rw". Let it
finish inserting the keys and move to step 2.
2) Run the consumer with the user "ainat", which does not yet have permission to read the
values.
3) Run the producer on the same set of keys (so it updates the keys with new values), this time
with the ACL suffix ":yosi,ainat rw", giving ainat access to these keys.
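The ":users perms" suffix convention used in these steps can be checked with a small parser. The exact on-cell format is our assumption, inferred from the suffixes above:

```python
def parse_acl_suffix(stored_value):
    """Split '<value>:<user1,user2> <perms>' into (value, users, perms)."""
    value, _, acl = stored_value.rpartition(":")
    users, _, perms = acl.partition(" ")
    return value, users.split(","), perms

def can_read(stored_value, user):
    """A user may read if listed in the ACL and the perms include 'r'."""
    _, users, perms = parse_acl_suffix(stored_value)
    return user in users and "r" in perms

print(can_read("field0data:ilia,yosi rw", "ainat"))   # False (step 2)
print(can_read("field0data:yosi,ainat rw", "ainat"))  # True (after step 3)
```

Until the update of step 3 reaches a replica, a read through that replica still sees the old suffix and denies ainat: this is the inconsistency window the test measures.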
Running these tests we got the following results:
This graph is the update counterpart of the insert graph. Here too we notice a clear deterioration in
performance with each extra replica, but compared to the read-after-insert results these look better:
the latency has dropped significantly (about tenfold).
Looking closer into what the difference was between the two tests, we found something interesting –
the throughput of the update was always around 1.8k but the throughput of the insert was around 10k.
So we decided to check whether that had something to do with it.
Playing with the thread count in the read-after-insert test, we managed to get it down to about 2k
ops/sec; looking at the measurements we noticed that the delays were almost all close to 0, and we got
even better results than with the update. When running the tests at 4k ops/sec we got worse results, and
the higher we went the worse the results became (unfortunately we do not have the results of these
tests).
The conclusion is, again, the same as with the WiFi tests: the most crucial factor in eventual
consistency is the network. Fewer operations mean less traffic, less traffic means faster propagation of
data among nodes, and this results in reduced latency.
We ran the same tests with our WIFI configuration and got:
Read-after-update latency (ms) by number of replicas:

Replicas                         2       3       4       5       6
Average update latency           8.8     9       12      15      32
95th percentile update latency   22.2    20.75   45.75   55.25   111.75
These results are also better than the insert. The throughput on these tests was around 1.6k (compared
to 6k in the insert) and the results reflect it.
For our final test we wanted to measure the number of values that Cassandra returned on the first read
attempt (let's call them latency=0). We added this to YCSB++ and got:
As you can see, the vast majority of the values are updated "right away" (or too fast for us to measure).
The averages shown in all the previous graphs cover only the values which did not return on the first
attempt; that is why the update results showed greater instability – the sampling base was only 5% of
the values. These results give us more insight into the performance and display data which was not
initially considered by the YCSB++ consistency test. Yet there is a reason it was not considered: the
test has its limitations, and saying we were able to read a value on the first attempt only gives an upper
bound on the possible delays of these values. We also ran these tests with WiFi and got these terrible
results:
Average update latency with WiFi (ms):

Added latency    2 nodes    3 nodes
none             7268       12093
10 ms            9963       18919
100 ms           27168      -
Percent of values retrieved on the first attempt, by number of replicas:

Replicas    2      3     4      5     6
Insert      80.5   50    75.5   66.5  61
Update      95     93    85     90    74
These graphs compare the lab-setup where all the machines are on the same LAN segment, to the WIFI
setup where one node is connected to the cluster via WIFI.
With a delay of 100ms close to none of the values was consistent right away, which is to be expected
since the read has no delay but the write has a delay of 100ms.
Part 5- Improving the consistency
Improving the consistency can be achieved in various ways. As part of our conclusions we will discuss
several ways that we have tested to be working which are:
1) Increase the read/write consistency level from ONE to higher:
By increasing the consistency level of an operation we can fine-tune the desired consistency on
every read/write operation.
- In our fired-employee example we can solve the problem by increasing the consistency level
of the "remove permissions" operation (in our case an insert operation) to ALL. This way the
data will not be available to him as soon as we perform the change.
The trade-off is that while the data is being updated across the network there is a period of
time when no one can access it, so this solution does not scale very well.
- With the Facebook profile picture update example we can set the user's "read" consistency
level of his own profile picture to "LOCAL_QUORUM" which would significantly reduce the
chance for receiving the wrong picture.
We have run the read-after-write and read-after-update tests each with ConsistencyLevel.ALL
once on read and once on write and in both cases the result was the same – 0 time lag.
Percent of values returned on the first read (insert):

Setup                 2 replicas   3 replicas
Lab (no WiFi)         80           50
WiFi                  14           8
WiFi + 10 ms delay    13           5
WiFi + 100 ms delay   0            -
Percent of values updated on the first read (update):

Setup                 2 replicas   3 replicas
Lab (no WiFi)         95           93
WiFi                  44           42
WiFi + 10 ms delay    32           30
WiFi + 100 ms delay   3            -
2) Reduce the load on the server through better load balancing:
During our testing we noticed that latency has everything to do with throughput. Running the
test several times with different thread counts (which translate to different throughputs)
confirmed this theory. Monitoring the load on nodes and balancing it can therefore make a
real difference to consistency.
3) Use a third-party synchronization mechanism:
Using third-party software that works at a higher level than Cassandra and creates a middle
layer between the user and the DB can also do the trick. Software that manages the reads
could guarantee that the data is consistent by reading the values from multiple nodes in the
cluster and comparing timestamps. We tested this method by having our "consumer" client
read the data from multiple sources; the results were significantly better.
4) Monitor and improve the quality of your network:
As our tests clearly demonstrated, for the data to be consistent there is nothing more important
than a good and stable network. Tracking and monitoring your network is a must if
consistency is of the essence.
In conclusion, distributed systems are highly complex and have many points of failure. When planning
such a system you will need to consider a variety of factors, ranging from the performance of your
servers to the stability of the network and the bandwidth and latency between nodes. We hope the
results above contribute to making the appropriate decisions when it comes to consistency.
Plan vs Actual
The following description goes through the plan we first devised and details of how well we followed it
at each stage:
1. System set-up:
This stage included the installation and configuration of Cassandra, Accumulo and YCSB++. Our
milestone-one goal was to complete the installations and run initial performance tests. In practice, we
managed to install Cassandra, Accumulo and YCSB++, and we ran the initial tests on Cassandra using
YCSB. All in all, this stage was completed successfully according to plan.
2. Implementation of cell-level ACLs and performance comparison:
These stages included coming up with ideas for the implementation and testing them in comparison to
the original Cassandra and to Accumulo. Our milestone-two goal was to finish the implementation of
the cell-level ACL and to complete the performance evaluation on a more advanced configuration of
Cassandra and Accumulo. We wanted to create a final version of the implementation, based on the test
results, by milestone three. In practice, by the presentation of milestone two we had two
implementations ready: the first was the simple authentication-based solution, and the second simply
saved the ACL as part of the values within the Cassandra database. We also tested both
implementations.
Later on we added a third solution for cell-level ACLs in Cassandra, which saves two columns: the
value column plus a corresponding ACL column. Unfortunately, for lack of time we only got to
perform very basic tests on it (a throughput benchmark with insert and read); these tests showed an
approximately 50% decrease in performance compared to v1.1.
3. Analysis of the security holes and improving the security through stronger consistency:
This stage included measuring the security holes that may exist due to the inconsistency of the
ACLs configuration and attempting to improve the security of ACLs by providing a solution with
higher consistency guarantees. Our third and final milestone goal for this stage was to analyze the
security holes, finish implementing the security improvement and measure the performance
penalty.
In addition to these goals, we also had to finish up the benchmarking tests that we didn't complete
in the previous milestone. This included testing Accumulo for the first time. We did manage to test
Accumulo with YCSB++ on a one node configuration but when we tried to extend Accumulo into
three nodes in a cluster, we couldn’t manage to do so mainly due to synchronization problems that
occurred between the Accumulo nodes. Since our time for exploring this issue was limited, we
decided to settle for the one node tests and focus on the consistency analysis that was more
important for this stage.
We used YCSB++ in order to test and measure the inconsistency window that might occur
between the nodes regarding the access control lists saved in the database. Ultimately, we were
able to complete these inconsistency tests as planned. Regarding our wish to improve the security
of both databases through stronger consistency, we came up with several solutions that use
mechanisms in Cassandra.
In Cassandra you can configure the consistency level of the read/write. Since tradeoffs between
consistency and latency are tunable in Cassandra, we can achieve stronger consistency in
Cassandra with an increased latency as a penalty. Therefore in order to minimize the inconsistency
windows between the ACLs in the different nodes we can set a write and/or read consistency level
to ALL in our Cassandra implementation depending on the circumstances as was explained in the
Consistency Tests section.
In conclusion, the majority of this stage was completed successfully.
More information, tutorials and examples may be found on our site: http://infosecdd.yolasite.com