
Scylla Summit 2017: Performance Evaluation of Scylla as a Database Backend for JanusGraph

Date post: 22-Jan-2018
Upload: scylladb

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES

First and last namePosition, company

Performance Study for JanusGraph Storage Backends, Scylla, Cassandra, and HBase

Chin Huang, Software Engineer, IBM

Ted Chang, Performance Engineer, IBM

PRESENTATION TITLE ON ONE LINE AND ON TWO LINES

First and last namePosition, company

Chin Huang

Chin Huang is a software engineer at IBM Open Technologies. He has worked on software development, solutions integration, and performance evaluation for open source projects such as OpenStack, JanusGraph, and various databases.

Ted Chang

Ted Chang is a software engineer at IBM Open Technologies and Design for Performance. He has worked on various enterprise and open source cloud solutions. His current focus is JanusGraph performance and characterization.


Agenda

▪ Overview – Graph database storage backends

▪ Performance evaluation scenarios and results

o Insert vertices (inserts)

o Insert edges (= search + update)

o Graph traversal (query)

▪ Lessons learned

▪ Q&A



Overview – Graph database storage backends


Overview

▪ JanusGraph is a highly scalable graph database optimized for storing and querying large graphs.

▪ JanusGraph stores graphs in adjacency list format, which means that a graph is stored as a collection of vertices, each with its adjacency list.

▪ The data storage layer is pluggable. The most common storage backends are Cassandra and HBase. We want to add Scylla to the mix!

▪ Test workloads:

o Insert vertices - writes

o Insert edges - reads and writes

o Queries – reads

▪ Test environments: database clusters
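The adjacency list format mentioned in the overview can be illustrated with a toy in-memory model. This is a simplified sketch, not JanusGraph's actual on-disk layout: each vertex owns one "row" holding its properties and its incident edges, and the toy store records every edge under both endpoints so a traversal can start from either vertex by reading a single row. The class names, the `name` property, and the `Follows` label are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class VertexRow:
    """One row per vertex: its properties plus its adjacency list of incident edges."""
    vertex_id: int
    properties: dict = field(default_factory=dict)
    # Each entry is (direction, label, other_vertex_id).
    edges: list = field(default_factory=list)


class AdjacencyListStore:
    """Hypothetical toy model of adjacency-list storage (not JanusGraph's real format)."""

    def __init__(self):
        self.rows = {}  # vertex id -> VertexRow

    def add_vertex(self, vertex_id, **properties):
        self.rows[vertex_id] = VertexRow(vertex_id, dict(properties))

    def add_edge(self, src, label, dst):
        # Record the edge in the rows of both endpoints so a traversal can
        # start from either vertex without scanning other rows.
        self.rows[src].edges.append(("OUT", label, dst))
        self.rows[dst].edges.append(("IN", label, src))

    def neighbors(self, vertex_id, direction, label):
        """Return ids adjacent to vertex_id via edges with the given direction and label."""
        return [other for d, lbl, other in self.rows[vertex_id].edges
                if d == direction and lbl == label]


store = AdjacencyListStore()
store.add_vertex(1, name="alice")
store.add_vertex(2, name="bob")
store.add_edge(2, "Follows", 1)  # bob follows alice
print(store.neighbors(1, "IN", "Follows"))   # [2]
print(store.neighbors(2, "OUT", "Follows"))  # [1]
```

Keeping each edge with both endpoints trades extra writes for cheap traversal in either direction, which is one reason edge inserts touch more than one row.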



Performance test environment

▪ Server spec

o Physical servers: x3650 M5, 2 sockets x 14 cores, 384 GB (12 x 32 GB) memory

o CPU: Intel Xeon Processor E5-2690 v4 14C 2.6GHz 35MB Cache 2400MHz

o Network interface: Emulex VFA5.2 ML2 Dual Port 10GbE SFP+ Adapter

o Disk: 720 GB SSD, RAID 5

o Operating system: Ubuntu 16.04.2 LTS

▪ Public tools

o JMeter - load testing tool

o nmon, nmon analyser - system performance monitoring and analysis tools

o VisualVM - all-in-one Java troubleshooting/profiling tool

o GCeasy - garbage collection log analysis tool

o Prometheus and Grafana – monitoring dashboard

▪ Home-grown tools

o Graph schema loader, data generator, batch importer



Performance Test Topology

[Diagram: a load injector sends insert, update, and query requests to JanusGraph, which is backed by a database cluster running Cassandra, HBase + HDFS + ZooKeeper, or Scylla.]


Performance evaluation scenarios and results


Performance Evaluation: Insert Vertices


▪ 40 million vertices in total

▪ 2 properties for each vertex

▪ Insert scenario

▪ Load injectors fully utilized to generate load against the databases
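The insert-vertices scenario (40 million vertices, 2 properties each, injectors kept fully busy) implies some way of partitioning the work across load-injector threads and committing it in batches. The slides do not show the home-grown data generator or batch importer, so the following is only a hypothetical sketch: the property names `name` and `group`, the thread count of 16, and the helper functions are all assumptions for illustration.

```python
from itertools import islice


def generate_vertices(total, start_id=0):
    """Yield (vertex_id, properties) pairs, two properties per vertex.

    The property names 'name' and 'group' are hypothetical placeholders.
    """
    for i in range(start_id, start_id + total):
        yield i, {"name": f"usr_{i}", "group": i % 100}


def batches(iterable, batch_size):
    """Group an iterable into lists of at most batch_size items."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def split_range(total, workers):
    """Split total vertex ids into contiguous (start_id, count) slices, one per injector thread."""
    base, extra = divmod(total, workers)
    slices, start = [], 0
    for w in range(workers):
        count = base + (1 if w < extra else 0)
        slices.append((start, count))
        start += count
    return slices


# 40 million vertices spread over a hypothetical 16 injector threads.
plan = split_range(40_000_000, 16)
print(plan[0], plan[-1])  # (0, 2500000) (37500000, 2500000)

# Each thread would then commit its slice in fixed-size batches.
start, count = split_range(10, 3)[0]  # (0, 4)
for batch in batches(generate_vertices(count, start), 3):
    print([vid for vid, _ in batch])  # [0, 1, 2], then [3]
```

Contiguous, non-overlapping id ranges let each injector thread generate its own vertices without coordination; the batch size is a tunable that trades transaction size against commit overhead.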


Performance Evaluation: Insert Edges


▪ 30 million edges in total

▪ 1 property for each edge

▪ Query and update scenario
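The agenda describes inserting an edge as "search + update": both endpoint vertices must first be found, and only then can the edge (with its one property) be written. That is why this workload mixes reads and writes while vertex insertion is write-only. A minimal sketch of the pattern, assuming a dict stands in for JanusGraph's vertex index lookup; the counters tally logical operations, not storage-level reads and writes, and the `since` property key is hypothetical.

```python
class EdgeLoader:
    """Toy model of the 'insert edge = search + update' pattern.

    A dict stands in for a vertex index lookup; the counters tally logical
    reads and writes, not actual storage-level operations.
    """

    def __init__(self):
        self.name_index = {}  # name -> vertex id
        self.edges = []       # (src_id, label, dst_id, properties)
        self.reads = 0
        self.writes = 0

    def add_vertex(self, vertex_id, name):
        self.name_index[name] = vertex_id
        self.writes += 1

    def lookup(self, name):
        self.reads += 1
        return self.name_index[name]  # raises KeyError if the vertex is missing

    def insert_edge(self, src_name, label, dst_name, **properties):
        src = self.lookup(src_name)  # search: find the out-vertex
        dst = self.lookup(dst_name)  # search: find the in-vertex
        self.edges.append((src, label, dst, dict(properties)))  # update: add the edge
        self.writes += 1


loader = EdgeLoader()
loader.add_vertex(1, "alice")
loader.add_vertex(2, "bob")
# One edge with one property; 'since' is a hypothetical property key.
loader.insert_edge("bob", "Follows", "alice", since=2017)
print(loader.reads, loader.writes)  # 2 3
```

Every edge insert costs two lookups before the write, so backend read latency directly limits edge-loading throughput.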


Performance Evaluation: Graph Traversal


▪ Query: g.V().has('name', 'usr_name').in('Follows').as('a').out('Retweets').in('Tweets').has('name', 'usr_name').select('a').values('name').dedup()
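Read step by step, the traversal starts at the user named `usr_name`, walks to that user's followers (labeled `a`), follows their retweets to the tweets, walks back to each tweet's author, keeps only paths whose author is the starting user, and returns the distinct follower names. In other words, it finds followers who retweeted one of that user's own tweets. The plain-Python sketch below mirrors each step over a tiny in-memory graph. The edge directions (`Follows`: follower to followee, `Retweets` and `Tweets`: user to tweet) are assumptions inferred from the query's `in`/`out` steps, and the sample data is invented.

```python
class Graph:
    """Minimal in-memory graph with a 'name' property and labeled, directed edges."""

    def __init__(self):
        self.names = {}    # vertex id -> 'name' property (None if absent)
        self.out_adj = {}  # (vertex id, label) -> list of head vertex ids
        self.in_adj = {}   # (vertex id, label) -> list of tail vertex ids

    def add_vertex(self, vid, name=None):
        self.names[vid] = name

    def add_edge(self, src, label, dst):
        self.out_adj.setdefault((src, label), []).append(dst)
        self.in_adj.setdefault((dst, label), []).append(src)

    def out(self, vid, label):
        return self.out_adj.get((vid, label), [])

    def in_(self, vid, label):
        return self.in_adj.get((vid, label), [])


def followers_who_retweeted(g, user_name):
    """Plain-Python reading of the slide's Gremlin query (a sketch, not Gremlin semantics in full)."""
    result, seen = [], set()
    for v in [vid for vid, n in g.names.items() if n == user_name]:  # g.V().has('name', user_name)
        for a in g.in_(v, "Follows"):                                  # .in('Follows').as('a')
            for tweet in g.out(a, "Retweets"):                        # .out('Retweets')
                for author in g.in_(tweet, "Tweets"):                 # .in('Tweets')
                    if g.names.get(author) != user_name:              # .has('name', user_name)
                        continue
                    name = g.names.get(a)                             # .select('a').values('name')
                    if name is None or name in seen:                  # .dedup()
                        continue
                    seen.add(name)
                    result.append(name)
    return result


g = Graph()
for vid, name in [(1, "alice"), (2, "bob"), (3, "carol")]:
    g.add_vertex(vid, name)
g.add_vertex(10)                 # tweet posted by alice
g.add_vertex(11)                 # tweet posted by carol
g.add_edge(1, "Tweets", 10)
g.add_edge(3, "Tweets", 11)
g.add_edge(2, "Follows", 1)      # bob follows alice
g.add_edge(3, "Follows", 1)      # carol follows alice
g.add_edge(2, "Retweets", 10)    # bob retweets alice's tweet...
g.add_edge(2, "Retweets", 10)    # ...twice, so dedup matters
g.add_edge(3, "Retweets", 11)    # carol retweets only her own tweet
print(followers_who_retweeted(g, "alice"))  # ['bob']
```

Each nested loop corresponds to one hop, so the query fans out through several adjacency-list reads per starting vertex; this read pattern is what the graph traversal scenario stresses in the storage backends.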


Lessons Learned

Scylla

• Easy clustering - adding multiple nodes at once

• Well self-tuned, but lacks documentation

• Evenly distributed load

• Fully utilizes system resources

• CPU utilization misrepresents the real load

• Nice monitoring dashboard – Prometheus + Grafana

• Works with existing Cassandra client utilities



Lessons Learned - Continued

Cassandra

• Cluster bootstrapping takes more effort

• Smaller memory footprint

HBase

• Uneven CPU% caused by hot regions

• Read and write cache settings need careful configuration for better throughput



THANK YOU

[email protected]

[email protected]

Please stay in touch

Any questions?

