Scylla Summit 2017: Performance Evaluation of Scylla as a Database Backend for JanusGraph


Performance Study of JanusGraph Storage Backends: Scylla, Cassandra, and HBase

Chin Huang, Software Engineer, IBM

Ted Chang, Performance Engineer, IBM



Chin Huang

Chin Huang is a software engineer at IBM Open Technologies. He has worked on software development, solutions integration, and performance evaluation for open source projects such as OpenStack, JanusGraph, and various databases.


Ted Chang

Ted Chang is a software engineer at IBM Open Technologies and Design for Performance. He has worked on various enterprise and open source cloud solutions. His current focus is JanusGraph performance and characterization.



Agenda

▪ Overview – Graph database storage backends

▪ Performance evaluation scenarios and results

o Insert vertices (inserts)

o Insert edges (= search + update)

o Graph traversal (query)

▪ Lessons learned

▪ Q&A




Overview – Graph database storage backends



Overview

▪ JanusGraph is a highly scalable graph database optimized for storing and querying large graphs.

▪ JanusGraph stores graphs in adjacency list format, which means that a graph is stored as a collection of vertices, each with its own adjacency list (its incident edges and properties).

▪ The data storage layer is pluggable. The most common storage backends are Cassandra and HBase. We want to add Scylla to the mix!

▪ Test workloads:

o Insert vertices - writes

o Insert edges - reads and writes

o Queries – reads

▪ Test environments: database clusters
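The adjacency-list layout described above can be pictured with a small model. This is a minimal sketch, assuming a simplified storage model in which each vertex is one row keyed by its id and each property or incident edge is a cell in that row; the `AdjacencyStore` class and its column tuples are hypothetical and do not reproduce JanusGraph's actual binary encoding. Writing each edge to both endpoint rows follows JanusGraph's documented practice of storing an edge with both of its vertices.

```python
from collections import defaultdict


class AdjacencyStore:
    """Hypothetical, simplified adjacency-list store:
    one row per vertex; properties and incident edges are cells in that row."""

    def __init__(self):
        # row key (vertex id) -> {column key: value}
        self.rows = defaultdict(dict)

    def add_vertex(self, vid, **props):
        for key, value in props.items():
            self.rows[vid][("prop", key)] = value

    def add_edge(self, out_vid, label, in_vid, **props):
        # Each edge is written twice: as an OUT cell on the source row
        # and as an IN cell on the target row, so both ends can traverse it.
        self.rows[out_vid][("edge", "OUT", label, in_vid)] = props
        self.rows[in_vid][("edge", "IN", label, out_vid)] = props

    def neighbors(self, vid, direction, label):
        # Reading a single row is enough to find a vertex's neighbors.
        return sorted(
            col[3]
            for col in self.rows[vid]
            if col[0] == "edge" and col[1] == direction and col[2] == label
        )


store = AdjacencyStore()
store.add_vertex(1, name="alice")
store.add_vertex(2, name="bob")
store.add_edge(2, "Follows", 1)              # bob follows alice
print(store.neighbors(1, "IN", "Follows"))   # [2]
print(store.neighbors(2, "OUT", "Follows"))  # [1]
```

Because a vertex's edges live in its own row, a one-hop traversal is a single row read, which is why this layout suits wide-row stores such as Cassandra, Scylla, and HBase.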




Performance test environment

▪ Server spec

o Physical servers: x3650 M5, 2 sockets x 14 cores, 384 GB (12 x 32 GB) memory

o CPU: Intel Xeon Processor E5-2690 v4 14C 2.6GHz 35MB Cache 2400MHz

o Network interface: Emulex VFA5.2 ML2 Dual Port 10GbE SFP+ Adapter

o Disk: 720 GB SSD, RAID 5

o Operating system: Ubuntu 16.04.2 LTS

▪ Public tools

o JMeter - load testing tool

o nmon, nmon analyser - system performance monitoring and analysis tools

o VisualVM - all-in-one Java troubleshooting/profiling tool

o GCeasy - garbage collection log analysis tool

o Prometheus and Grafana - monitoring dashboard

▪ Home-grown tools

o Graph schema loader, data generator, batch importer
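Load tools such as JMeter report throughput and latency percentiles for each run. The sketch below shows one common way to derive those figures from raw per-request latency samples; the `summarize` and `percentile` functions and the nearest-rank percentile method are assumptions for illustration, not JMeter's exact reporting algorithm.

```python
import math


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted list (pct in (0, 100])."""
    rank = math.ceil(pct * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def summarize(latencies_ms, duration_s):
    """Summarize one run from per-request latencies (ms) and wall-clock time (s)."""
    ordered = sorted(latencies_ms)
    return {
        "requests": len(ordered),
        "throughput_per_s": len(ordered) / duration_s,
        "mean_ms": sum(ordered) / len(ordered),
        "p95_ms": percentile(ordered, 95),
        "max_ms": ordered[-1],
    }


# Hypothetical samples: 10 requests completed in 2 seconds.
s = summarize([10, 12, 11, 50, 13, 9, 14, 12, 11, 10], duration_s=2.0)
print(s)  # throughput_per_s = 5.0, p95_ms = 50
```

Reporting a high percentile alongside the mean matters here: a single slow request (50 ms above) barely moves the mean but dominates the p95.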




Performance Test Topology

[Figure: Load injector (insert, update, query) → JanusGraph → Database cluster of Cassandra, HBase + HDFS + ZooKeeper, or Scylla nodes]
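In this topology only the database cluster changes beneath JanusGraph; the backend is selected through JanusGraph's configuration. A minimal sketch of that idea, assuming the CQL adapter (`storage.backend=cql`) for both Cassandra and Scylla, since Scylla speaks the Cassandra protocol; the slides do not say which adapter the study used. The helper `render_properties` and the host names are hypothetical.

```python
# Hypothetical helper: render a minimal JanusGraph properties file per backend.
# Scylla is CQL-compatible, so it reuses the Cassandra CQL adapter (assumed here).
BACKENDS = {
    "cassandra": "cql",
    "scylla": "cql",
    "hbase": "hbase",
}


def render_properties(db, hosts):
    """Return properties-file text for the given database and host list."""
    if db not in BACKENDS:
        raise ValueError(f"unknown database: {db}")
    lines = [
        f"storage.backend={BACKENDS[db]}",
        # For HBase, these hosts are the ZooKeeper quorum.
        f"storage.hostname={','.join(hosts)}",
    ]
    return "\n".join(lines) + "\n"


print(render_properties("scylla", ["db1", "db2", "db3"]))
# storage.backend=cql
# storage.hostname=db1,db2,db3
```

Since Cassandra and Scylla map to the same adapter, switching between them needs only a host change, which is what makes Scylla a drop-in candidate in this comparison.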



Performance evaluation scenarios and results



Performance Evaluation: Insert Vertices


▪ 40 million vertices in total

▪ 2 properties per vertex

▪ Insert scenario

▪ Injectors are fully utilized to generate load against the databases
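The home-grown data generator and batch importer are not shown in the slides. A minimal sketch of the idea, assuming vertices with two properties each are grouped into fixed-size batches that an importer would commit one transaction at a time; the property names (`name`, `created`), the batch size, and the functions `generate_vertices` and `batches` are hypothetical.

```python
from itertools import islice


def generate_vertices(count, start_id=0):
    """Yield vertex records with two properties each (hypothetical schema)."""
    for i in range(start_id, start_id + count):
        yield {"id": i, "name": f"usr_{i}", "created": 1_500_000_000 + i}


def batches(records, batch_size):
    """Group an iterable into lists of at most batch_size items."""
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Each batch would be written and committed in one transaction by an importer.
sizes = [len(b) for b in batches(generate_vertices(10_000), batch_size=3_000)]
print(sizes)  # [3000, 3000, 3000, 1000]
```

Giving each injector a disjoint `start_id` range lets several injectors generate vertices in parallel without duplicates, which is one way to keep all of them fully busy.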



Performance Evaluation: Insert Edges


▪ 30 million edges in total

▪ 1 property per edge

▪ Query and update scenario
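Inserting an edge is a read followed by a write: both endpoint vertices must first be found (typically through an index on a property such as `name`), and only then can the edge be written. A minimal in-memory sketch that counts those operations; the `EdgeLoader` class, its dictionary index, and the `since` edge property are hypothetical stand-ins for the graph's index lookup and write path.

```python
class EdgeLoader:
    """Hypothetical in-memory model of 'insert edge = search + update'."""

    def __init__(self, names):
        # Stand-in for a property index: name -> vertex id.
        self.index = {name: vid for vid, name in enumerate(names)}
        self.edges = []
        self.reads = 0
        self.writes = 0

    def insert_edge(self, out_name, label, in_name, **props):
        # Search: resolve both endpoints through the index (two reads).
        self.reads += 2
        out_vid = self.index.get(out_name)
        in_vid = self.index.get(in_name)
        if out_vid is None or in_vid is None:
            return False  # an endpoint is missing; nothing is written
        # Update: record the edge (one logical write in this model).
        self.edges.append((out_vid, label, in_vid, props))
        self.writes += 1
        return True


loader = EdgeLoader(["alice", "bob", "carol"])
loader.insert_edge("bob", "Follows", "alice", since=2017)
loader.insert_edge("carol", "Follows", "dave")  # unknown vertex: read, no write
print(loader.reads, loader.writes)  # 4 1
```

The read cost is paid even when the insert fails, so this workload stresses the backends' read path as well as their write path.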



Performance Evaluation: Graph Traversal


▪ Query: g.V().has('name', 'usr_name').in('Follows').as('a').out('Retweets').in('Tweets').has('name', 'usr_name').select('a').values('name').dedup()
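Read step by step, the traversal returns the distinct names of followers of `usr_name` who retweeted at least one tweet posted by `usr_name`. The plain-Python sketch below evaluates the same steps over an in-memory edge list, assuming `Follows` points from follower to followed user, `Tweets` from author to tweet, and `Retweets` from user to tweet; the data and the function names are hypothetical.

```python
from collections import defaultdict


def build_index(edges):
    """Index (source, label, target) triples in both directions."""
    out_adj = defaultdict(list)
    in_adj = defaultdict(list)
    for src, label, dst in edges:
        out_adj[(src, label)].append(dst)
        in_adj[(dst, label)].append(src)
    return out_adj, in_adj


def followers_who_retweeted(names, edges, user):
    """Plain-Python equivalent of the Gremlin traversal on this slide."""
    out_adj, in_adj = build_index(edges)
    result = []
    for u in (v for v, n in names.items() if n == user):   # has('name', user)
        for a in in_adj[(u, "Follows")]:                     # in('Follows').as('a')
            for tweet in out_adj[(a, "Retweets")]:           # out('Retweets')
                for author in in_adj[(tweet, "Tweets")]:     # in('Tweets')
                    if names.get(author) == user:            # has('name', user)
                        result.append(names[a])              # select('a').values('name')
    return list(dict.fromkeys(result))                       # dedup(), first-seen order


names = {1: "alice", 2: "bob", 3: "carol"}  # tweet vertices 10-12 carry no name
edges = [
    (2, "Follows", 1),    # bob follows alice
    (3, "Follows", 1),    # carol follows alice
    (1, "Tweets", 10),    # alice posts tweet 10
    (1, "Tweets", 12),    # alice posts tweet 12
    (3, "Tweets", 11),    # carol posts tweet 11
    (2, "Retweets", 10),  # bob retweets two of alice's tweets
    (2, "Retweets", 12),
    (3, "Retweets", 11),  # carol retweets only her own tweet
]
print(followers_who_retweeted(names, edges, "alice"))  # ['bob']
```

Each hop fans out over adjacency lists, so the query issues many small reads per starting vertex; `dedup()` is needed because one follower can reach the same user through several retweets.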



Lessons Learned

Scylla

• Easy clustering - adding multiple nodes at once

• Self-tunes well, but lacks documentation

• Evenly distributed load

• Fully utilizes system resources

• Reported CPU utilization misrepresents the real load

• Good monitoring dashboards (Prometheus + Grafana)

• Works with existing Cassandra utility clients




Lessons Learned - Continued

Cassandra

• Cluster bootstrapping takes more effort

• Smaller memory footprint

HBase

• Uneven CPU% across nodes, caused by hot regions

• Read and write cache settings need careful configuration for better throughput
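Hot regions are a general property of range partitioning: HBase assigns contiguous row-key ranges to regions, so a burst of writes with neighboring keys lands on one server, while Cassandra and Scylla hash the partition key onto a token ring. The sketch below contrasts the two placements for sequential keys on a hypothetical three-node cluster. It is a generic illustration, not a diagnosis of the hot regions seen in these tests, and it uses MD5 as a stand-in for the partitioners' actual hash (Murmur3 by default).

```python
import hashlib
from collections import Counter

NODES = 3
KEY_SPACE = 1_000  # hypothetical key space, split into equal ranges


def range_node(key):
    """Range partitioning (HBase-style): contiguous key ranges per node."""
    return min(key * NODES // KEY_SPACE, NODES - 1)


def hash_node(key):
    """Hash partitioning (Cassandra/Scylla-style); MD5 stands in for Murmur3."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NODES


# A burst of 100 writes with sequential (monotonically increasing) keys.
burst = range(900, 1_000)
print(Counter(range_node(k) for k in burst))  # Counter({2: 100})
print(Counter(hash_node(k) for k in burst))   # spread across nodes 0, 1, 2
```

Under range partitioning one node absorbs the whole burst, which shows up as exactly the kind of uneven CPU% noted above; hashing trades that hotspot for losing key locality in range scans.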




THANK YOU

[email protected]

[email protected]

Please stay in touch

Any questions?


Date post:	22-Jan-2018
Upload:	scylladb