Date posted: 22-Jan-2018
Category: Technology
Uploaded by: scylladb
PRESENTATION TITLE ON ONE LINE AND ON TWO LINES
First and last namePosition, company
Performance Study for JanusGraph Storage Backends: Scylla, Cassandra, and HBase
Chin Huang, Software Engineer, IBM
Ted Chang, Performance Engineer, IBM
Chin Huang
Chin Huang is a software engineer at IBM Open Technologies. He has worked on software development, solutions integration, and performance evaluation for open source projects such as OpenStack, JanusGraph, and various databases.
Ted Chang
Ted Chang is a software engineer at IBM Open Technologies and Design for Performance. He has worked on various enterprise and open source cloud solutions. At the moment, his focus is JanusGraph performance and characterization.
Agenda
▪ Overview – Graph database storage backends
▪ Performance evaluation scenarios and results
o Insert vertices (inserts)
o Insert edges (= search + update)
o Graph traversal (query)
▪ Lessons learned
▪ Q&A
Overview – Graph database storage backends
Overview
▪ JanusGraph is a highly scalable graph database optimized for storing and querying large graphs.
▪ JanusGraph stores graphs in adjacency list format, which means that a graph is stored as a collection of vertices, each with its adjacency list.
▪ The data storage layer is pluggable. The most common storage backends are Cassandra and HBase. We want to add Scylla to the mix!
▪ Test workloads:
o Insert vertices - writes
o Insert edges - reads and writes
o Queries – reads
▪ Test environments: database clusters
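As a rough illustration of the adjacency list format mentioned above, the sketch below keeps every vertex together with its own edge list in memory. The class and method names are hypothetical, and this is a toy model of the idea, not JanusGraph's actual storage code.

```python
from collections import defaultdict


class AdjacencyListGraph:
    """Toy in-memory graph that keeps each vertex together with its edge list."""

    def __init__(self):
        self.properties = {}                # vertex id -> property dict
        self.adjacency = defaultdict(list)  # vertex id -> [(direction, label, other id)]

    def add_vertex(self, vertex_id, **props):
        self.properties[vertex_id] = dict(props)

    def add_edge(self, src, label, dst):
        # Assumption: the edge is recorded once per endpoint so that either
        # vertex can list its neighbors without scanning the whole graph.
        self.adjacency[src].append(("out", label, dst))
        self.adjacency[dst].append(("in", label, src))

    def neighbors(self, vertex_id, direction, label):
        return [other for d, l, other in self.adjacency[vertex_id]
                if d == direction and l == label]


graph = AdjacencyListGraph()
graph.add_vertex("u1", name="alice")
graph.add_vertex("u2", name="bob")
graph.add_edge("u2", "Follows", "u1")  # bob follows alice
print(graph.neighbors("u1", "in", "Follows"))  # prints ['u2']
```

Because each vertex carries its own edge list, finding the neighbors of one vertex touches only that vertex's data, which is what makes this layout a natural fit for row-oriented backends.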
Performance test environment
▪ Server spec
o Physical servers: x3650 M5, 2 sockets x 14 cores, 384 GB (12 x 32 GB) memory
o CPU: Intel Xeon Processor E5-2690 v4 14C 2.6GHz 35MB Cache 2400MHz
o Network interface: Emulex VFA5.2 ML2 Dual Port 10GbE SFP+ Adapter
o Disk: 720 GB SSD, RAID 5
o Operating system: Ubuntu 16.04.2 LTS
▪ Public tools
o JMeter - load testing tool
o nmon, nmon analyser - system performance monitoring and analysis tools
o VisualVM - all-in-one Java troubleshooting/profiling tool
o GCeasy - garbage collection log analysis tool
o Prometheus and Grafana – monitoring dashboard
▪ Home-grown tools
o Graph schema loader, data generator, batch importer
Performance Test Topology
[Figure: a load injector sends insert, update, and query requests to JanusGraph, which is backed by a database cluster. The diagram shows three database nodes, each labeled Cassandra, HBase + HDFS + ZooKeeper, and Scylla.]
Performance evaluation scenarios and results
Performance Evaluation: Insert Vertices
▪ 40 million vertices in total
▪ 2 properties for each vertex
▪ Insert scenario
▪ Fully utilize the injectors to generate load against the databases
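The shape of this insert workload can be sketched with a small data generator that emits vertices with exactly two properties and groups them into batches. The property names (`name`, `user_id`), the `usr_` prefix, and the batch size are assumptions for illustration only. The slides do not describe how the home-grown generator and importer are implemented.

```python
from itertools import islice


def generate_vertices(total):
    """Yield `total` synthetic vertices, each with exactly two properties."""
    for i in range(total):
        # Hypothetical property names; the real schema is not given in the slides.
        yield {"name": f"usr_{i}", "user_id": i}


def batches(items, batch_size):
    """Group any iterable into lists of at most `batch_size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Small run for illustration; the study used 40 million vertices in total.
batch_sizes = [len(batch) for batch in batches(generate_vertices(10), 4)]
print(batch_sizes)  # prints [4, 4, 2]
```

Generating lazily keeps memory flat regardless of the total, so the injectors can spend their resources on driving load.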
Performance Evaluation: Insert Edges
▪ 30 million edges in total
▪ 1 property for each edge
▪ Query and update scenario
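Why edge insertion counts as a query and update scenario can be shown with a minimal in-memory sketch: both endpoint vertices have to be found before the edge can be written. The function, the name-to-id index, and the `since` property are hypothetical stand-ins, not the code used in the study.

```python
def insert_edge(vertex_index, edges, src_name, label, dst_name, **props):
    """Insert one edge: two index lookups (search) followed by one write (update)."""
    src_id = vertex_index.get(src_name)  # search for the source vertex
    dst_id = vertex_index.get(dst_name)  # search for the target vertex
    if src_id is None or dst_id is None:
        raise KeyError("both endpoint vertices must exist before the edge is inserted")
    edge = {"out": src_id, "label": label, "in": dst_id, "properties": props}
    edges.append(edge)  # update: persist the new edge
    return edge


vertex_index = {"usr_0": 0, "usr_1": 1}  # hypothetical name -> vertex id index
edges = []
insert_edge(vertex_index, edges, "usr_1", "Follows", "usr_0", since=2018)
```

Each inserted edge therefore costs two reads and one write, so this workload stresses the read path of the backend as well as the write path.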
Performance Evaluation: Graph Traversal
▪ Query: g.V().has('name', 'usr_name').in('Follows').as('a').out('Retweets').in('Tweets').has('name', 'usr_name').select('a').values('name').dedup()
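To make the traversal easier to read, here is a plain Python walk over a tiny in-memory edge list that mirrors each Gremlin step. The edge directions are assumptions about the schema (a follower points to the followed user, a user points to the tweets they post or retweet), and the sample data is invented. Under those assumptions the query returns the names of followers of the given user who retweeted one of that user's tweets.

```python
def followers_who_retweeted(edges, names, usr_name):
    """edges: list of (src, label, dst) tuples; names: vertex id -> name."""

    def out(vertex, label):
        return [d for s, l, d in edges if s == vertex and l == label]

    def in_(vertex, label):
        return [s for s, l, d in edges if d == vertex and l == label]

    result = []
    starts = [v for v, n in names.items() if n == usr_name]  # g.V().has('name', usr_name)
    for start in starts:
        for a in in_(start, "Follows"):                # .in('Follows').as('a')
            for tweet in out(a, "Retweets"):           # .out('Retweets')
                for author in in_(tweet, "Tweets"):    # .in('Tweets')
                    if names.get(author) != usr_name:  # .has('name', usr_name)
                        continue
                    name = names[a]                    # .select('a').values('name')
                    if name not in result:             # .dedup()
                        result.append(name)
    return result


# Invented sample data: three users and three tweets.
names = {"u1": "alice", "u2": "bob", "u3": "carol"}
edges = [
    ("u2", "Follows", "u1"), ("u3", "Follows", "u1"),
    ("u1", "Tweets", "t1"), ("u1", "Tweets", "t3"), ("u3", "Tweets", "t2"),
    ("u2", "Retweets", "t1"), ("u2", "Retweets", "t3"), ("u3", "Retweets", "t2"),
]
print(followers_who_retweeted(edges, names, "alice"))  # prints ['bob']
```

Bob reaches alice through two different retweets, so the final deduplication step is what keeps his name from appearing twice.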
Lessons Learned
Scylla
• Easy clustering - adding multiple nodes at once
• Well self-tuned, but documentation is lacking
• Evenly distributed load
• Fully utilizes system resources
• CPU utilization misrepresents the real load
• Nice monitoring dashboard – Prometheus + Grafana
• Works with existing Cassandra utility clients
Lessons Learned - Continued
Cassandra
• Cluster bootstrapping takes more effort
• Smaller memory footprint
HBase
• Uneven CPU utilization caused by hot regions
• Need to carefully configure read and write cache settings for better throughput
THANK YOU
Please stay in touch
Any questions?