Benchmarking XDCR Ring Topology at PayPal
Dong Wang, Senior MTS, Core Data Platform, PayPal
Cross Datacenter Replication (XDCR)
©2016 Couchbase Inc.
Power of XDCR
• Simple, Powerful Administration
• Consistent High Performance
• Elastic Scalability
• Multi-Data Center, Active-Active Deployment
Continuous Innovation
Benchmarking XDCR Ring Topology at PayPal
AGENDA
©2016 PayPal Inc. Confidential and proprietary. 9
1. Speaker/PayPal Intro
2. Background on Multi-DC Bi-Directional Ring Replication Benchmark
3. Benchmark Methodology
4. Benchmark Results
5. Tuning Efforts
About PayPal
• A leading technology platform company that enables digital and mobile payments on behalf of consumers and merchants worldwide.
• NASDAQ: PYPL
• Active customer accounts of 192 million as of Q3 2016
• 1.5 billion transactions processed in Q3 2016
• $87 billion in total payment volume in Q3 2016
About the Speaker
• Ph.D. in Biochemistry
• Worked on database technologies for the past 18 years
• 10 years at PayPal
• Leading PayPal’s NoSQL Engineering Efforts
A Multi-Data Center Deployment Challenge
• Requirement:
• Multiple DCs (>=4), each with its own Couchbase cluster
• Data replication among multiple clusters
• Active traffic to all clusters
• Deployment Choices (Ring vs Mesh):
A Comparison of Ring and Mesh Replication Topologies
Topology | Data Centers | XDCR Streams | N=3 | N=4 | N=5
Ring     | N            | 2 × N        | 6   | 8   | 10
Mesh     | N            | N × (N−1)    | 6   | 12  | 20
[Chart: XDCR Streams vs Clusters; y-axis 0–35 streams, x-axis 3–6 clusters; series: Ring, Mesh]
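The stream counts in the table can be reproduced with a few lines of Python (a sketch; the function names are mine):

```python
def ring_streams(n: int) -> int:
    # Bi-directional ring: each cluster replicates to its two
    # neighbors, one stream per direction, so 2 * N streams total.
    return 2 * n

def mesh_streams(n: int) -> int:
    # Full mesh: one uni-directional stream per ordered pair of
    # clusters, so N * (N - 1) streams total.
    return n * (n - 1)

for n in range(3, 7):
    print(f"N={n}: ring={ring_streams(n)}, mesh={mesh_streams(n)}")
```

The linear vs quadratic growth is why the ring was chosen: at 4 DCs the ring needs 8 streams instead of 12, and the gap widens with every added cluster.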
Benchmarking a Bi-Directional Ring Deployment in 4 DCs
• Benchmark Environment
• Benchmark Procedure
• Benchmark Data Collection
• Benchmark Data Processing
• Benchmark Results
Benchmark Environment
• Hardware: Bare Metal, Dell PowerEdge FC630
• 20 cores (40 logical processors) at 2.6 GHz
• 384 GB RAM
• 2 TB RAID1 SSD
• 10Gb network
• OS: RHEL 6.6
• Couchbase: 4.1.1, XDCR without SSL
• Couchbase Clusters
• 8 nodes per cluster, 4 clusters in 4 DCs
• Benchmark Tool:
• YCSB 0.9.0
• Custom XDCR monitoring Python program using the Couchbase SDK
Benchmark Environment/Components
[Diagram: four clusters (DCG11, CCG01, DCG12, CCG12) connected in a bi-directional XDCR ring]
Benchmark Procedure
• Cluster Configuration
• YCSB Workload Generation
• Data Collection
• Data Processing
• Replication Latency Monitor
• Result Analysis
Benchmark Procedure – Couchbase Configuration
• Cluster Configuration
  • Standard configuration with OS best practices (THP, swappiness, etc.)
  • XDCR without SSL, configured easily via REST APIs
• Bucket Configuration
  • Bucket Type: Couchbase
  • Memory Allocation: 240 GB/cluster (significant impact on benchmark outcome; PayPal-specific)
  • Replicas: 1
  • Ejection: Value Ejection
  • Disk I/O Priority: Default (low)
  • Auto-Compaction: Default
  • Flush: Enabled
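A minimal sketch of the REST-driven XDCR setup, using only the standard library. The paths (/pools/default/remoteClusters, /controller/createReplication) are Couchbase's documented XDCR REST endpoints; the host names, bucket name, and credentials are placeholders, not values from the benchmark:

```python
import base64
import urllib.parse
import urllib.request

ADMIN, PASSWORD = "Administrator", "password"   # placeholder credentials

def admin_post(host: str, path: str, params: dict) -> urllib.request.Request:
    """Build a form-encoded POST to the Couchbase admin port (8091)."""
    req = urllib.request.Request(
        f"http://{host}:8091{path}",
        data=urllib.parse.urlencode(params).encode(),
    )
    token = base64.b64encode(f"{ADMIN}:{PASSWORD}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    return req

# 1) Register the next cluster in the ring as a remote cluster.
remote = admin_post("dc1-node1", "/pools/default/remoteClusters", {
    "name": "dc2", "hostname": "dc2-node1:8091",
    "username": ADMIN, "password": PASSWORD,
})
# 2) Start a continuous replication to it (repeated per ring direction).
repl = admin_post("dc1-node1", "/controller/createReplication", {
    "fromBucket": "benchmark", "toCluster": "dc2",
    "toBucket": "benchmark", "replicationType": "continuous",
})
# urllib.request.urlopen(remote); urllib.request.urlopen(repl)  # send
```

For a 4-cluster bi-directional ring, each cluster issues this pair of calls toward each of its two neighbors.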
Benchmark Procedure - Workloads
• Workload A: 100% write
• Drop XDCR stream
• Flush bucket
• Create XDCR stream
• Run YCSB workload A, document size at 1KB from 20 client machines
• Run XDCR monitoring traffic in parallel, collect latency to remote DCs
• Workload B: 95% Read + 5% Write (Update)
  • Preload 200 million documents, resulting in 464 GB of data on disk and 184 GB in memory
• Run YCSB workload B, document size at 1 KB from 40 to 80 client machines
• Run XDCR monitoring traffic in parallel, collect latency to remote DCs
Benchmark Data Collection
• YCSB Summarization Data. Sample from one client:
• XDCR Latency Data
• Insert one doc into the local DC, then query the same doc from the remote DCs in parallel; record insert ack time, query success time, and network round-trip time
[OVERALL], Throughput(ops/sec), 9294.932867347366
[INSERT], Operations, 4000000.0
[INSERT], AverageLatency(us), 203.5180525
[INSERT], MinLatency(us), 92.0
[INSERT], MaxLatency(us), 80959.0
[INSERT], 95thPercentileLatency(us), 429.0
[INSERT], 99thPercentileLatency(us), 959.0
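The latency-probe logic can be sketched as follows. In the benchmark this ran against real clusters through the Couchbase Python SDK; here dict-backed stubs stand in for the local and remote buckets (and polling is sequential rather than parallel) so the timing logic is self-contained:

```python
import time
import uuid

def measure_xdcr_latency(local_bucket, remote_buckets,
                         poll_interval=0.001, timeout=5.0):
    """Insert a probe doc into the local DC, then poll each remote DC
    until the doc appears; return replication latency per DC (seconds)."""
    key = f"xdcr-probe-{uuid.uuid4()}"
    start = time.monotonic()
    local_bucket[key] = {"probe": True}         # insert ack
    latencies, pending = {}, dict(remote_buckets)
    while pending and time.monotonic() - start < timeout:
        for dc, bucket in list(pending.items()):
            if key in bucket:                   # query success on remote
                latencies[dc] = time.monotonic() - start
                del pending[dc]
        time.sleep(poll_interval)
    return latencies

# Stub run: the "remote" buckets alias the local dict, so replication is
# instantaneous and the measured latency is just polling overhead.
local = {}
result = measure_xdcr_latency(local, {"dc2": local, "dc3": local, "dc4": local})
print(sorted(result))  # → ['dc2', 'dc3', 'dc4']
```

Subtracting the separately measured network round-trip time from these values isolates the replication component of the delay.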
Benchmark Data Collection
• Couchbase Performance Data
• cpu_utilization_rate
• curr_connections
• ops
• cmd_get
• cmd_set
• ep_cache_miss_rate
• vb_active_resident_items_ratio
• ep_bg_fetched
• replication_changes_left
• xdc_ops
• ep_dcp_replica_items_remaining
• ep_dcp_replica_items_sent
• ep_dcp_xdcr_items_remaining
• ep_dcp_xdcr_items_sent
Benchmark Data Processing
• Sample Data Window
  • A 2-minute sample window in the middle of a minimum 5-minute test period is chosen to represent the steady state of the test
• Data Aggregation
  • Aggregatable metrics: use the sum across all nodes
  • Non-aggregatable metrics: use the average across all nodes
• Data Graphing
  • Use the standard pandas/matplotlib Python libraries
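The sum-vs-average rule can be illustrated with pandas (the library named above); the node names and metric values are made up for illustration:

```python
import pandas as pd

# One row per (node, timestamp). Throughput-style metrics (ops) are
# summed across nodes to get cluster totals; ratio-style metrics are
# averaged, since summing a percentage across nodes is meaningless.
df = pd.DataFrame({
    "node": ["n1", "n2", "n1", "n2"],
    "ts":   [0, 0, 1, 1],
    "ops":  [100, 120, 110, 130],
    "vb_active_resident_items_ratio": [40.0, 44.0, 41.0, 45.0],
})

cluster = df.groupby("ts").agg(
    ops=("ops", "sum"),                                         # aggregatable
    resident_ratio=("vb_active_resident_items_ratio", "mean"),  # not
)
print(cluster)
```

The resulting per-timestamp frame feeds directly into matplotlib for the throughput and backlog charts.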
Benchmark Results - 100% Write To 4 Active DCs
100% Write Workload Throughput
• Client-side errors appear well before max throughput is reached
• Max throughput without errors = 340K/sec
100% Write Workload P99/Max Latency
• Sub-millisecond level P99 insert latency
100% Write Workload XDCR Latency (20 Client Threads)
• XDCR latency is dominated by network latency at light throughput (100K/sec)
• Same region: 4 ms
• Distant region: 32 ms
100% Write Workload XDCR Latency (40 Client Threads)
• As throughput demand increases, the distant DC is impacted first.
100% Write Workload XDCR Latency (80 Client Threads)
• At max throughput, ALL DCs are impacted.
100% Write Workload XDCR Backlog (ep_dcp_xdcr_items_remaining)
• XDCR backlog builds up before max throughput is reached and before client-side errors appear
• Memory and nozzle tuning can affect XDCR performance
Benchmark Results - 95% Read + 5% Write To 4 Active DCs
95% Read Workload Throughput
• Much higher throughput than the 100% write use case
95% Read Workload P99 Latency
• Good Read/Write latency at millisecond level
95% Read Workload Max Latency
• Max latency stays low until throughput reaches 2M/sec
95% Read Workload XDCR Latency (80 Client Threads)
• XDCR latency is dominated by network latency at light throughput (100K/sec)
• Same region: 4 ms
• Distant region: 32 ms
95% Read Workload XDCR Latency (800 Client Threads)
• XDCR latency remains network-latency driven at high throughput (4M/sec)
• Same region: < 10 ms
• Distant region: 30–50 ms
95% Read Workload XDCR Latency (1120 Client Threads)
• Network-latency-driven XDCR latency at max throughput (5M/sec)
• Same region: < 20 ms
• Distant region: 400–600 ms
95% Read Workload XDCR Backlog (ep_dcp_xdcr_items_remaining)
• Backlog happens at much higher overall read/write throughput than the 100% write use case
• The more distant the remote DC, the larger the XDCR backlog
Comparison Among Different Traffic Patterns (4DC-4A vs 4DC-2A vs 4DC-1A)
YCSB Throughput vs Traffic Pattern
• Limited scalability of Active-Active vs Active-Passive:

Workload | 4DC-1A (4 clusters in 4 DCs, 1 active) | 4DC-2A (2 active) | 4DC-4A (4 active) | 4DC-4A / 4DC-1A
100% write | 200 K/sec | 200 K/sec | 200 K/sec | 1×
95% read + 5% write | 2.5 M/sec | 3 M/sec | 4.5 M/sec | 1.8×

• 200K/sec client-facing traffic → 200K × (4 + 1) = 1M/sec total KV traffic in a 4-cluster setup
Application Latency vs Traffic Pattern
• Consistent latency:

Workload | 4DC-1A (4 clusters in 4 DCs, 1 active) | 4DC-2A (2 active) | 4DC-4A (4 active) | 4DC-4A / 4DC-1A
100% write | Avg: 0.25 ms | Avg: 0.2 ms | Avg: 0.25 ms | 1×
 | P99: 1 ms | P99: 1 ms | P99: 1 ms | 1×
95% read + 5% write | Avg: 0.3 ms | Avg: 0.2 ms | Avg: 0.2 ms | 0.7×
 | P99: 1.2 ms | P99: 1.1 ms | P99: 1.3 ms | 1×

• Millisecond (P99) or sub-millisecond (Avg) latency
• Across all traffic patterns
• For both read-intensive and write-intensive use cases
Data Replication Latency vs Traffic Pattern
Workload | 4DC-1A (1 active) | 4DC-2A (2 active) | 4DC-4A (4 active)
100% write | Close: 10 ms (avg); Far: 100 ms (avg) | Close: 10 ms (avg); Far: 200 ms (avg) | Close: 10 ms (avg); Far: 250 ms (avg)
95% read + 5% write | Close: 5 ms (avg); Far: 20 ms (avg) | Close: 5 ms (avg); Far: 20 ms (avg) | Close: 10 ms (avg); Far: 100 ms (avg)

• Amplification by network latency: the geographic distance effect
• Write-intensive (higher latency) vs read-intensive (lower latency)
• Active-Active (higher latency) vs Active-Passive (lower latency)
Tuning Throughput Findings
• Couchbase 2.0 binding in YCSB 0.10.0
• Nozzle increase (effect subject to memory allocation)
  • sourceNozzlePerNode=4 (default 2)
  • targetNozzlePerNode=4 (default 2)
• Batch size
  • workerBatchSize=2000 (default 500)
  • docBatchSizeKb=4096 (default 2048)
• Optimistic replication threshold
  • optimisticReplicationThreshold=10240 (default 256)
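These settings can be pushed over Couchbase's XDCR settings REST endpoint (/settings/replications sets the global defaults). A stdlib-only sketch; the host name and credentials are placeholders, while the setting names and values come from the tuning list above:

```python
import base64
import urllib.parse
import urllib.request

TUNING = {                                    # values from the tuning runs
    "sourceNozzlePerNode": 4,                 # default 2
    "targetNozzlePerNode": 4,                 # default 2
    "workerBatchSize": 2000,                  # default 500
    "docBatchSizeKb": 4096,                   # default 2048
    "optimisticReplicationThreshold": 10240,  # default 256
}

def build_tuning_request(host, user, password):
    """Build a POST of new global XDCR defaults to /settings/replications."""
    req = urllib.request.Request(
        f"http://{host}:8091/settings/replications",
        data=urllib.parse.urlencode(TUNING).encode(),
    )
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    return req

req = build_tuning_request("dc1-node1", "Administrator", "password")
# urllib.request.urlopen(req)  # uncomment to apply to a live cluster
```

Per-replication overrides use the same parameter names against the replication-specific settings path.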
Nozzle Increase
100% Write Workload | 4 Nozzles | 2 Nozzles
Max thread # without insert errors | 80 | 80
Avg throughput | 325 K/sec | 330 K/sec
Avg latency | 0.2 ms | 0.2 ms
XDCR latency | |
Other Considerations for A Higher XDCR Throughput
• Increasing bucket RAM allocation to relieve memory pressure when high water marks are reached
• Using a faster disk subsystem
• Upgrading to Couchbase 4.5.x with DCP cursor enhancements
Summary
• Couchbase multiple data center Active-Active provides higher availability and scalability. This deployment pattern is used in production at PayPal.
• Scalability depends on the specific use case. Read-intensive use cases scale better (1.8× in 4 DCs) than write-intensive use cases.
• XDCR latency is largely affected by the network latency. XDCR latency can be much higher than actual network latency.
• Geographically close DCs tend to have more consistent data than remote DCs.
• XDCR and overall throughput max out independently; in most runs, XDCR was already lagging before a cluster reached its max throughput.
Thank You!