[Figure: request latency in ms (x-axis 0.1 to 50 ms; y-axis 0.0 to 1.0)]
Distributed NoSQL Storage for Extreme-Scale System Services
Tonglin Li (1), Ioan Raicu (1,2)
(1) Illinois Institute of Technology, (2) Argonne National Laboratory
Abstract
FRIEDA-State: Scalable State Management for Scientific Applications on Cloud
Graph/Z: A Key-Value Store Based Scalable Graph Processing System
Selected Publications
WaggleDB: A Dynamically Scalable Cloud Data Infrastructure for Sensor Networks
Acknowledgement
Journal papers
• Tonglin Li, Xiaobing Zhou, Ioan Raicu, et al., "A Convergence of Distributed Key-Value Storage in Cloud Computing and Supercomputing," Concurrency and Computation: Practice and Experience (CCPE), 2015.
• Iman Sadooghi, Tonglin Li, Kevin Brandstatter, Ioan Raicu, et al., "Understanding the Performance and Potential of Cloud Computing for Scientific Applications," IEEE Transactions on Cloud Computing (TCC), 2015.
• Ke Wang, Kan Qiao, Tonglin Li, Michael Lang, Ioan Raicu, et al., "Load-balanced and Locality-aware Scheduling for Data-intensive Workloads at Extreme Scales," CCPE, 2015.
Conference papers
• Tonglin Li, Ke Wang, Dongfang Zhao, Kan Qiao, Iman Sadooghi, Xiaobing Zhou, Ioan Raicu, "A Flexible QoS Fortified Distributed Key-Value Storage System for the Cloud," IEEE International Conference on Big Data, 2015.
• Tonglin Li, Kate Keahey, Ke Wang, Dongfang Zhao, Ioan Raicu, "A Dynamically Scalable Cloud Data Infrastructure for Sensor Networks," ScienceCloud, 2015.
• Tonglin Li, Ioan Raicu, Lavanya Ramakrishnan, "Scalable State Management for Scientific Applications in the Cloud," IEEE International Congress on Big Data, 2014.
• Tonglin Li, Xiaobing Zhou, Ioan Raicu, et al., "ZHT: A Light-weight Reliable Persistent Dynamic Scalable Zero-hop Distributed Hash Table," IPDPS, 2013.
• Dongfang Zhao, Zhao Zhang, Xiaobing Zhou, Tonglin Li, Ke Wang, Dries Kimpe, Philip Carns, Robert Ross, Ioan Raicu, "FusionFS: Towards Supporting Data-Intensive Scientific Applications on Extreme-Scale High-Performance Computing Systems," IEEE International Conference on Big Data, 2014.
• Ke Wang, Xiaobing Zhou, Tonglin Li, Dongfang Zhao, Michael Lang, Ioan Raicu, "Optimizing Load Balancing and Data-Locality with Data-aware Scheduling," IEEE International Conference on Big Data, 2014.
Posters and extended abstracts
• Tonglin Li, Chaoqi Ma, Jiabao Li, Xiaobing Zhou, Ioan Raicu, et al., "GRAPH/Z: A Key-Value Store Based Scalable Graph Processing System," IEEE Cluster, 2015.
• Tonglin Li, Kate Keahey, Rajesh Sankaran, Pete Beckman, Ioan Raicu, "A Cloud-based Interactive Data Infrastructure for Sensor Networks," SC, 2014.
• Tonglin Li, Raman Verma, Xi Duan, Hui Jin, Ioan Raicu, "Exploring Distributed Hash Tables in High-End Computing," ACM SIGMETRICS Performance Evaluation Review (PER), 2011.
Motivation
• Processing graph queries
• Handling big data sets
• Fault tolerance
Design and Implementation
• Pregel-like processing model (sketched below)
• Uses ZHT as the storage backend
• Partitioning at the master node
Highlighted features
• Data locality
• Load balancing
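The Pregel-style bulk-synchronous model that Graph/Z follows can be sketched roughly as below. This is a conceptual illustration only; the types and the compute() function are hypothetical and are not Graph/Z's actual API.

```cpp
// Minimal sketch of a Pregel-style BSP superstep loop (hypothetical types,
// not the actual Graph/Z code). Each superstep delivers the messages produced
// in the previous superstep, runs a user compute() on the active vertices,
// and ends at a global barrier before the next superstep starts.
#include <algorithm>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Vertex {
    int64_t id;
    double value;
    std::vector<int64_t> out_edges;
};

using MessageBox = std::unordered_map<int64_t, std::vector<double>>;

// User-defined vertex program: consume incoming messages, update the vertex,
// and emit messages for the next superstep (here: min-label propagation).
void compute(Vertex& v, const std::vector<double>& incoming, MessageBox& out) {
    for (double m : incoming) v.value = std::min(v.value, m);
    for (int64_t dst : v.out_edges) out[dst].push_back(v.value);
}

void run_bsp(std::vector<Vertex>& graph, int max_supersteps) {
    MessageBox inbox;
    for (int step = 0; step < max_supersteps; ++step) {
        MessageBox outbox;
        for (Vertex& v : graph) {
            static const std::vector<double> kEmpty;
            auto it = inbox.find(v.id);
            const auto& msgs = (it != inbox.end()) ? it->second : kEmpty;
            // In superstep 0 every vertex runs; afterwards only vertices with
            // incoming messages are active (a common Pregel halting rule).
            if (step == 0 || !msgs.empty()) compute(v, msgs, outbox);
        }
        if (outbox.empty()) break;   // no messages produced: global termination
        inbox = std::move(outbox);   // barrier: messages feed the next superstep
    }
}
```

The key property of the model is the barrier between supersteps: all communication goes through the message box, never through shared mutable state, which is what makes a key-value backend a natural place to keep vertex state.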
Performance
Contribution
• ZHT: a light-weight, reliable, persistent, dynamic, scalable zero-hop distributed hash table
  - Design and implementation of ZHT, optimized for high-end computing
  - Verified scalability at 32K-core scale
  - Achieved latencies of 1.1 ms and throughput of 18M ops/sec on a supercomputer, and 0.8 ms and 1.2M ops/sec on a cloud
  - Simulated ZHT at 1-million-node scale for potential use in extreme-scale systems
• ZHT/Q: a flexible QoS-fortified distributed key-value storage system for the cloud
  - Supports different QoS latency targets on a single deployment for multiple concurrent applications
  - Provides both guaranteed and best-effort services
  - Benchmarked on a real system (16 nodes) and in simulation (512 nodes)
• FRIEDA-State: scalable state management for scientific applications on clouds
  - Design and implementation of FRIEDA-State
  - Lightweight state capturing, storage, and vector-clock-based event ordering
  - Evaluated on multiple platforms at scales of up to 64 VMs
• WaggleDB: a dynamically scalable cloud data infrastructure for sensor networks
  - Design and implementation of WaggleDB
  - Supports high write concurrency, transactional command execution, and tier-independent dynamic scalability
  - Evaluated with up to 128 concurrent clients
• GRAPH/Z: a key-value store based scalable graph processing system
  - Design and implementation of GRAPH/Z, a BSP-model graph processing system on top of ZHT
  - Exploits data locality and minimizes data movement between nodes
  - Benchmarked at up to 16-node scale
ZHT: A Light-weight Reliable Dynamic Scalable Zero-hop Distributed Hash Table
Motivation
• Performance gap between storage and computing resources
• Large storage systems suffer from metadata bottlenecks
• No suitable key-value store solution on HPC platforms
Design and Implementation
• Written in C++, with few dependencies
• Modified consistent hashing (sketched below)
• Persistent backend: NoVoHT
Primitives
• insert, lookup, remove
• append, cswap, callback
Highlighted features
• Persistence
• Dynamic membership
• Fault tolerance via replication
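A zero-hop design implies that every client can resolve a key to its owner locally. Below is a minimal sketch of that idea, assuming a fully replicated membership table and a fixed partition-to-node mapping; the structures are hypothetical and are not ZHT's actual code.

```cpp
// Conceptual sketch of zero-hop key-to-node resolution with a client-side
// membership table (hypothetical structures, not ZHT's implementation).
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

struct NodeInfo {
    std::string ip;
    int port;
};

// The full membership table is replicated on every client. With a fixed
// number of partitions mapped onto nodes, a lookup is a local hash plus a
// single network hop to the owning node.
struct MembershipTable {
    std::vector<NodeInfo> partition_owner;  // index = partition id

    const NodeInfo& locate(const std::string& key) const {
        uint64_t h = std::hash<std::string>{}(key);
        size_t partition = h % partition_owner.size();
        return partition_owner[partition];  // send insert/lookup/remove here
    }
};
```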
Performance
Motivation
• Clouds for scientific applications
• Need for application reproducibility and persistence of state
• Clock drift in dynamic environments
Design and Implementation
• Uses local files to store captured states
• Merges and reorders events with vector clocks (sketched below)
• Key-value store for storage and query support
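Vector clocks are the standard way to order events from nodes whose wall clocks drift, which is the problem the reordering step above addresses. A minimal sketch of the generic technique follows; the types are hypothetical and not FRIEDA-State's actual code.

```cpp
// Minimal vector-clock sketch for ordering events from multiple nodes with
// drifting wall clocks (generic technique; hypothetical types).
#include <algorithm>
#include <cstdint>
#include <map>
#include <string>

using VectorClock = std::map<std::string, uint64_t>;  // node id -> logical counter

// Merge two clocks by taking the element-wise maximum (done when a node
// receives an event or state record from another node).
VectorClock merge(const VectorClock& a, const VectorClock& b) {
    VectorClock out = a;
    for (const auto& [node, count] : b)
        out[node] = std::max(out[node], count);
    return out;
}

// True if the event with clock a happened before the event with clock b:
// every component of a is <= the matching component of b, and at least one
// component is strictly smaller.
bool happened_before(const VectorClock& a, const VectorClock& b) {
    bool strictly_less = false;
    for (const auto& [node, count] : a) {
        auto it = b.find(node);
        uint64_t other = (it == b.end()) ? 0 : it->second;
        if (count > other) return false;   // a is ahead somewhere: not before b
        if (count < other) strictly_less = true;
    }
    for (const auto& [node, count] : b)    // b has seen events a never saw
        if (count > 0 && a.find(node) == a.end()) strictly_less = true;
    return strictly_less;
}
```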
Motivation
• Unreliable networks
• Wide range of request rates
• Administrators need to interact with nodes
• High write concurrency
• Many sensor data types
• Need for a scalable architecture
Design and Implementation
• Multi-tier architecture
• Independent components in each tier
• Each tier organized as a Phantom domain for dynamic scaling
• Message queues as write buffers (sketched below)
• Transactional interaction via the database
• Column-family, semi-structured data model for the various data types
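Using message queues as write buffers decouples bursty sensor writes from the backing store. Below is a minimal sketch of that pattern, assuming a simple in-memory queue and a consumer that drains records in batches; the types are hypothetical and not WaggleDB's actual components.

```cpp
// Conceptual sketch of a message queue used as a write buffer between many
// concurrent sensor clients and a backing data store (hypothetical types).
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <vector>

struct SensorRecord { std::string sensor_id; std::string payload; };

class WriteBuffer {
    std::queue<SensorRecord> q_;
    std::mutex mu_;
    std::condition_variable cv_;
public:
    // Producers (the sensor-facing tier) enqueue and return immediately,
    // absorbing bursts without blocking on the data store.
    void enqueue(SensorRecord r) {
        { std::lock_guard<std::mutex> lk(mu_); q_.push(std::move(r)); }
        cv_.notify_one();
    }
    // The consumer (store-facing tier) drains up to max_batch records at a
    // time and writes them to the database as one batched operation.
    std::vector<SensorRecord> dequeue_batch(size_t max_batch) {
        std::unique_lock<std::mutex> lk(mu_);
        cv_.wait(lk, [&] { return !q_.empty(); });
        std::vector<SensorRecord> batch;
        while (!q_.empty() && batch.size() < max_batch) {
            batch.push_back(std::move(q_.front()));
            q_.pop();
        }
        return batch;
    }
};
```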
Performance
On both HPC systems and clouds, the continuously widening performance gap between storage and computing resources prevents us from building scalable data-intensive systems. Distributed NoSQL storage systems are known for their ease of use and attractive performance, and are increasingly used as building blocks of large-scale applications on clouds and in data centers. However, little work has been done on bridging the performance gap on supercomputers with NoSQL data stores.
This work presents a convergence of distributed NoSQL storage systems for clouds and supercomputers. It first presents ZHT, a dynamically scalable zero-hop distributed key-value store that aims to be a building block of large-scale systems on clouds and supercomputers. It also presents several real systems that have adopted ZHT as well as other NoSQL systems, namely ZHT/Q (a flexible QoS-fortified distributed key-value storage system for the cloud), FRIEDA-State (state management for scientific applications on clouds), WaggleDB (a cloud-based interactive data infrastructure for sensor network applications), and Graph/Z (a key-value store based scalable graph processing system). All of these systems have been significantly simplified by building on NoSQL storage systems, and all have demonstrated scalable performance.
[Figure: ZHT architecture. Physical nodes host ZHT instances and partitions; a ZHT manager broadcasts membership table updates (UUID, IP, port, capacity, workload), and instances respond to client requests.]
Applications
• Distributed storage systems: ZHT/Q, FusionFS, IStore
• Job scheduling/launching systems: MATRIX, Slurm++
• Other systems: Graph/Z, Fabriq
ZHT/Q: A Flexible QoS Fortified Distributed Key-Value Storage System for the Cloud
Motivation
• Need to run multiple applications on a single data store
• Optimizing a single deployment for many different requirements
Design and Implementation
• Request batching proxy
• Dynamic batching strategy (sketched below)
Highlighted features
• Adaptive request batching
• QoS support
• Traffic-aware automatic performance tuning
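The core of a latency-feedback batching proxy is deciding when to flush a batch. A rough sketch follows, assuming each request carries a QoS latency budget and batches are flushed either when full or when the oldest buffered request is about to miss its deadline; the names are hypothetical and not ZHT/Q's actual code.

```cpp
// Conceptual sketch of an adaptive request-batching bucket (hypothetical
// structures). Requests are buffered per destination server and flushed when
// the bucket is full or when waiting longer would push the oldest request
// past its QoS latency budget.
#include <chrono>
#include <string>
#include <vector>

using Clock = std::chrono::steady_clock;

struct Request {
    std::string key, value;
    Clock::time_point arrival;
    std::chrono::milliseconds latency_budget;  // per-request QoS target
};

class BatchBucket {
    std::vector<Request> buf_;
    size_t max_batch_;
public:
    explicit BatchBucket(size_t max_batch) : max_batch_(max_batch) {}

    void push(Request r) { buf_.push_back(std::move(r)); }

    // Flush if the bucket is full, or if the oldest request would exceed its
    // deadline once a margin for the network round trip is accounted for.
    bool should_flush(Clock::time_point now,
                      std::chrono::milliseconds send_margin) const {
        if (buf_.size() >= max_batch_) return true;
        if (buf_.empty()) return false;
        const Request& oldest = buf_.front();
        return now + send_margin >= oldest.arrival + oldest.latency_budget;
    }

    std::vector<Request> take() {
        std::vector<Request> out;
        out.swap(buf_);  // hand the batch to the sender and reset the bucket
        return out;
    }
};
```

A larger batch raises throughput but delays the earliest request in it, which is why the flush condition is driven by the tightest latency budget rather than by a fixed timer.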
Performance
[Figure: ZHT/Q system architecture. A client API wrapper pushes requests into batch buckets (B1 ... Bn); a batching strategy engine with pluggable strategies chooses the batching strategy; a condition monitor and sender checks flush conditions and sends batches to the key-value servers; a result service unpacks returned batch results into the response buffer; latency feedback drives strategy selection.]
0"
2000"
4000"
6000"
8000"
10000"
12000"
14000"
16000"
1" 2" 4" 8" 16"
Single'nod
e'throughp
ut'in'ops/s'
Number'of'nodes'
Pa*ern"1"
Pa*ern"2"
Pa*ern"3"
Pa*ern"4"
Workloads with multiple QoS
Performance
[Figures: storage solution comparison (amortized latency vs. number of clients, 1 to 64, for the file-based solution, 1/2/4/8 Cassandra servers, and DynamoDB) and overhead analysis of the file-based storage solution (amortized merging, amortized moving, and file write latency vs. number of clients, 1 to 128).]
[Figure: average and real-time latency in ms (log scale) over time in seconds as the number of queue servers scales.]
0"
0.5"
1"
1.5"
2"
2.5"
3"
1" 2" 4" 8" 16" 32" 64" 128"
Speedup&
Clients&#&
2"4"8"
Queue&Server&Number&
0"1"2"3"4"5"6"7"8"9"10"11"12"13"14"15"
1" 2" 4" 8" 16" 32" 64" 128"
Latency(in(ms(
Clients(#(
1"2"4"8"
Queue"Server"Number"
Scalable current write Speedup of distributed servers Dynamic tier scalingBatch request latency distributions Throughputs and scalability
System architecture Distributed event ordering