Date posted: 07-Jan-2017
Category: Software
Uploaded by: shalin-shekhar-mangar
Cross Datacenter Replication in Apache Solr 6
Shalin Shekhar Mangar, Lucidworks Inc. (@shalinmangar)
The standard for enterprise search: 90% of Fortune 500 companies use Solr.
Agenda
• Review a typical Solr deployment architecture
• Challenges of running a Solr deployment across data centers
• Cross Data Centre Replication (CDCR) in Solr
• Setup and configuration
• Limitations
• Alternative strategies
• Future work
[Diagram: a typical Solr deployment, with clients talking to a Solr cluster and a ZooKeeper ensemble inside a single datacenter]
CDCR Anti-patterns - Remote Solr instances
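[Diagram: clients (C) in DC 1 and DC 2 sharing one Solr cluster and ZooKeeper ensemble, with some Solr instances running in the remote DC]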
CDCR Anti-patterns - Remote ZK and Solr
[Diagram: clients (C) in DC 1 and DC 2, with both Solr and ZooKeeper nodes split across the two DCs]
CDCR Anti-patterns - Remote ZK and Solr
[Diagram: clients (C) in DC 1 and DC 2, with Solr and ZooKeeper nodes spread across DC 1, DC 2 and DC 3]
Why not a single SolrCloud cluster?
• The same update is transferred across DCs to each remote replica
• Indexing is synchronous, so burst indexing is constrained by cross-DC bandwidth
• Increased latency for indexing operations
• A ZooKeeper node is needed in a third DC to break ties
• Search requests are not DC-aware and may be routed to a remote replica
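The tie-breaking requirement follows from ZooKeeper's majority quorum: an ensemble of n voting members keeps working only while at least floor(n/2) + 1 of them are reachable. The minimal Python sketch below, assuming a hypothetical map from DC name to the number of ZooKeeper voters hosted there, lists which single-DC outages would cost the ensemble its quorum:

```python
from typing import Dict, List


def quorum_size(total_nodes: int) -> int:
    """Smallest majority of a ZooKeeper ensemble with total_nodes voters."""
    return total_nodes // 2 + 1


def fatal_dc_losses(placement: Dict[str, int]) -> List[str]:
    """Return the DCs whose loss alone would leave ZooKeeper without quorum.

    placement maps a (hypothetical) data center name to the number of
    ZooKeeper voting members it hosts.
    """
    total = sum(placement.values())
    needed = quorum_size(total)
    return [dc for dc, nodes in placement.items() if total - nodes < needed]


if __name__ == "__main__":
    # Two DCs, uneven split: the DC holding the majority is a single point of failure.
    print(fatal_dc_losses({"DC1": 2, "DC2": 1}))  # ['DC1']
    # Two DCs, even split: losing either DC loses quorum.
    print(fatal_dc_losses({"DC1": 2, "DC2": 2}))  # ['DC1', 'DC2']
    # A tie-breaking member in a third DC lets any single DC fail.
    print(fatal_dc_losses({"DC1": 2, "DC2": 2, "DC3": 1}))  # []
```

The sketch counts only voting members and treats a DC outage as losing all of that DC's nodes at once; observers and partial network partitions are out of scope.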
Cross Datacenter Replication in Solr
• Let’s call it CDCR for short
• Accommodates two or more data centres
• Active/passive setup for disaster recovery
• Supports limited-bandwidth links
• Eventually consistent passive cluster
Source: http://yonik.com/solr-cross-data-center-replication/
CDCR in Solr 6
• Scalable: no single point of failure (SPoF) or bottleneck
• Peer cluster can have a different replication factor
• Asynchronous updates: no penalty for indexing
• Push operations for low-latency replication
• Low overhead: uses existing transaction logs and indexes
• Leader-to-leader communication ensures each update is sent only once to the peer cluster
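The leader-to-leader point can be made concrete with back-of-the-envelope arithmetic. The Python sketch below is a simplified model under stated assumptions: it counts only leader-to-replica forwarding, assumes the shard leader sits in the local DC, and ignores batching, compression and protocol overhead. The function names and numbers are illustrative, not measurements.

```python
def cross_dc_bytes_single_cloud(updates: int, doc_bytes: int,
                                remote_replicas_per_shard: int) -> int:
    """One SolrCloud spanning DCs: the shard leader forwards every update to
    every replica, so each update crosses the DC link once per remote replica."""
    return updates * doc_bytes * remote_replicas_per_shard


def cross_dc_bytes_cdcr(updates: int, doc_bytes: int, target_clusters: int = 1) -> int:
    """CDCR: the source shard leader sends each update once to the target shard
    leader, which fans it out to its replicas inside the target DC."""
    return updates * doc_bytes * target_clusters


if __name__ == "__main__":
    # Illustrative only: 1M updates of ~1 KiB, 2 replicas per shard in the remote DC.
    print(cross_dc_bytes_single_cloud(1_000_000, 1024, 2))  # 2048000000
    print(cross_dc_bytes_cdcr(1_000_000, 1024))              # 1024000000
```

Under this model the cross-DC traffic for a single spanning cluster grows with the remote replication factor, while CDCR's grows only with the number of target clusters.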
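[Configuration slides: annotated solrconfig.xml excerpts for the source and target clusters, with callouts for Target Cluster, Tune replication, Synchronize logs, CdcrUpdateLog, Enable APIs, Update chains and Update log]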
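The configuration callouts map onto a handful of solrconfig.xml elements. As a minimal sketch, assuming the element and parameter names from the Solr 6 Reference Guide's CDCR page, the Python script below generates the source and target fragments with the standard library's ElementTree. The mapping of callouts to elements (replica for the target cluster, replicator for tuning, updateLogSynchronizer for log synchronization, the /cdcr handler for the APIs) is this sketch's interpretation; the zkHost value, collection names and numeric settings are placeholder examples. The update chain is emitted on both sides because the slides annotate update chains for both; check the guide for your Solr release.

```python
import xml.etree.ElementTree as ET


def _lst(parent: ET.Element, name: str, entries: dict) -> ET.Element:
    """Append a Solr <lst name="..."> holding <str> entries to parent."""
    lst = ET.SubElement(parent, "lst", {"name": name})
    for key, value in entries.items():
        ET.SubElement(lst, "str", {"name": key}).text = str(value)
    return lst


def _cdcr_update_path(root: ET.Element) -> None:
    """"Update chains" and "CdcrUpdateLog": route updates through CDCR."""
    chain = ET.SubElement(root, "updateRequestProcessorChain",
                          {"name": "cdcr-processor-chain"})
    ET.SubElement(chain, "processor", {"class": "solr.CdcrUpdateProcessorFactory"})
    ET.SubElement(chain, "processor", {"class": "solr.RunUpdateProcessorFactory"})
    update = ET.SubElement(root, "requestHandler",
                           {"name": "/update", "class": "solr.UpdateRequestHandler"})
    _lst(update, "defaults", {"update.chain": "cdcr-processor-chain"})
    handler = ET.SubElement(root, "updateHandler", {"class": "solr.DirectUpdateHandler2"})
    ulog = ET.SubElement(handler, "updateLog", {"class": "solr.CdcrUpdateLog"})
    ET.SubElement(ulog, "str", {"name": "dir"}).text = "${solr.ulog.dir:}"


def source_config(zk_host: str, source: str, target: str) -> ET.Element:
    """Source side: /cdcr handler ("Enable APIs") plus replication settings."""
    root = ET.Element("config")  # wrapper so the fragments print together
    cdcr = ET.SubElement(root, "requestHandler",
                         {"name": "/cdcr", "class": "solr.CdcrRequestHandler"})
    _lst(cdcr, "replica", {"zkHost": zk_host, "source": source, "target": target})
    _lst(cdcr, "replicator", {"threadPoolSize": 8, "schedule": 1000, "batchSize": 128})
    _lst(cdcr, "updateLogSynchronizer", {"schedule": 1000})
    _cdcr_update_path(root)
    return root


def target_config() -> ET.Element:
    """Target side: /cdcr handler with buffering off by default."""
    root = ET.Element("config")
    cdcr = ET.SubElement(root, "requestHandler",
                         {"name": "/cdcr", "class": "solr.CdcrRequestHandler"})
    _lst(cdcr, "buffer", {"defaultState": "disabled"})
    _cdcr_update_path(root)
    return root


if __name__ == "__main__":
    cfg = source_config("target-zk:2181/solr", "collection1", "collection1")
    if hasattr(ET, "indent"):  # pretty-printing needs Python 3.9+
        ET.indent(cfg)
    print(ET.tostring(cfg, encoding="unicode"))
```

Generating the XML keeps the source and target variants side by side, which makes the "different configurations on source and target" gotcha easier to review; pasting the printed fragments into each cluster's solrconfig.xml is still a manual step.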
CDCR APIs
• http://host:port/solr/collection_name/cdcr?action=START
• Control APIs: START, STOP, ENABLEBUFFER, DISABLEBUFFER, STATUS
• Monitoring APIs: QUEUES, OPS, ERRORS
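Because every CDCR action is a plain HTTP GET against the /cdcr handler, a small client covers both the control and monitoring APIs. This is a minimal Python sketch using only the standard library; the host, port and collection names are placeholders, and adding wt=json to request a JSON response is this sketch's choice.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CONTROL_ACTIONS = {"START", "STOP", "ENABLEBUFFER", "DISABLEBUFFER", "STATUS"}
MONITORING_ACTIONS = {"QUEUES", "OPS", "ERRORS"}


def cdcr_url(solr_base: str, collection: str, action: str) -> str:
    """Build http://host:port/solr/<collection>/cdcr?action=<ACTION>&wt=json."""
    action = action.upper()
    if action not in CONTROL_ACTIONS | MONITORING_ACTIONS:
        raise ValueError(f"unknown CDCR action: {action}")
    query = urlencode({"action": action, "wt": "json"})
    return f"{solr_base.rstrip('/')}/{collection}/cdcr?{query}"


def cdcr(solr_base: str, collection: str, action: str, timeout: float = 10.0) -> dict:
    """Call a CDCR action and return the decoded JSON response."""
    with urlopen(cdcr_url(solr_base, collection, action), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    base = "http://source-host:8983/solr"  # hypothetical source node
    print(cdcr_url(base, "collection1", "start"))
    # http://source-host:8983/solr/collection1/cdcr?action=START&wt=json
    # cdcr(base, "collection1", "QUEUES")  # needs a running cluster
```

Validating the action name up front turns a typo into a local error instead of a confusing response from the server.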
How to failover?
• Change configuration on target to make it the source
• Point indexers to the new source
• Change configuration on source to make it the new target
• May require stopping indexing during the conversion process, especially if you want to revert the change
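The failover procedure is an ordered runbook, and it helps to make the ordering and the "stop indexing" step explicit. Here is a minimal, hypothetical Python sketch: run_failover and the step callables are illustrative names, not Solr APIs, and each callable stands in for one of the manual configuration changes or indexer switches listed above.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]


def run_failover(steps: List[Step],
                 pause_indexing: Callable[[], None],
                 resume_indexing: Callable[[], None]) -> List[str]:
    """Run failover steps in order while indexing is paused.

    If a step fails, indexing is deliberately left paused so an operator can
    inspect or revert; the error names the failed step and those completed.
    """
    completed: List[str] = []
    pause_indexing()
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            raise RuntimeError(
                f"failover stopped at {name!r}; completed: {completed}") from exc
        completed.append(name)
    resume_indexing()
    return completed


if __name__ == "__main__":
    log: List[str] = []
    steps = [
        ("promote target to source", lambda: log.append("reconfigure DC 2")),
        ("point indexers at new source", lambda: log.append("switch indexers")),
        ("demote old source to target", lambda: log.append("reconfigure DC 1")),
    ]
    print(run_failover(steps, lambda: log.append("pause"), lambda: log.append("resume")))
    print(log)
```

Leaving indexing paused on failure is a design choice: resuming against a half-converted pair of clusters could send updates to the wrong side.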
CDCR support in Solr 6+
• Active/passive setup, either for disaster recovery or for low-latency querying
• Solr clusters with existing data can be converted to a source cluster from Solr 6.2 onwards
• Low-to-medium indexing traffic
CDCR Limitations and gotchas
• CDCR is disabled by default: invoke START to enable it on both source and target
• Soft commits are not replicated to the target: autoSoftCommit must be scheduled explicitly on the target
• Source and target require different configurations
• Daisy-chaining is possible but not well tested: add all targets to the same source cluster instead
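Since soft commits are not replicated, the target needs its own autoSoftCommit schedule. Editing autoSoftCommit in the target's solrconfig.xml works; as an alternative, this minimal Python sketch assumes the Solr Config API's set-property command with the updateHandler.autoSoftCommit.maxTime property is available on your target cluster. The host, collection name and 15-second interval are placeholders.

```python
import json
from urllib.request import Request, urlopen


def soft_commit_payload(max_time_ms: int) -> bytes:
    """Config API command that sets autoSoftCommit.maxTime on the target."""
    if max_time_ms <= 0:
        raise ValueError("max_time_ms must be positive")
    command = {"set-property": {"updateHandler.autoSoftCommit.maxTime": max_time_ms}}
    return json.dumps(command).encode("utf-8")


def set_target_soft_commit(solr_base: str, collection: str, max_time_ms: int,
                           timeout: float = 10.0) -> dict:
    """POST the command to <solr_base>/<collection>/config on the target cluster."""
    req = Request(f"{solr_base.rstrip('/')}/{collection}/config",
                  data=soft_commit_payload(max_time_ms),
                  headers={"Content-Type": "application/json"},
                  method="POST")
    with urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(soft_commit_payload(15000).decode())
    # {"set-property": {"updateHandler.autoSoftCommit.maxTime": 15000}}
    # set_target_soft_commit("http://target-host:8983/solr", "collection1", 15000)
```

The interval controls how stale the passive cluster's search results can be beyond the replication lag itself.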
CDCR Limitations and gotchas (continued)
• Not suitable for applications that require high-throughput indexing, although some knobs exist for tuning replication speed
• Update log buffers can grow indefinitely while target clusters are down: if there is only one target, disabling buffering in the meantime works around this
• No automatic failover between source and target: explicit actions are required to modify configurations and point indexing pipelines to the new source
• No active/active setup
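How fast the buffered update logs grow during a target outage is simple arithmetic. This rough Python sketch assumes every update is retained until the target catches up, and ignores tlog record overhead and compression; the rates and sizes are illustrative, not measurements.

```python
def tlog_growth_bytes(updates_per_sec: float, avg_update_bytes: int,
                      outage_secs: float) -> float:
    """Rough growth of buffered update logs while a target is unreachable.

    Assumes every update is retained until the target catches up and ignores
    tlog record overhead and compression.
    """
    return updates_per_sec * avg_update_bytes * outage_secs


if __name__ == "__main__":
    # Illustrative: 500 updates/s of ~2 KiB each during a one-hour outage.
    grown = tlog_growth_bytes(500, 2048, 3600)
    print(f"{grown / 2**30:.2f} GiB")  # 3.43 GiB
```

Comparing this estimate with the free disk on each source node gives a rough upper bound on how long a target can stay down before buffering becomes a problem.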
Alternative strategy
• Use a proper queue such as Apache Kafka to feed source and target DCs simultaneously
• Use external versions in conjunction with versions generated by Solr: DocBasedVersionConstraintsProcessorFactory
• Watch the video of “Solr Cross-Datacenter Replication and Consistency at Scale” by Oliver Bates, Apple Inc.: http://sched.co/8ArU
• Pros: supports high indexing throughput and active/active replication
• Cons: additional systems are required; managing consistency is difficult and requires in-depth Solr expertise; all atomic updates must go to a single DC; delete-by-query cannot be supported
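External versions are what let two independently fed DCs converge: each document carries a version from the upstream system, and a replica accepts an update only if it is newer than what it already holds. DocBasedVersionConstraintsProcessorFactory enforces this inside Solr using a configured versionField. The toy Python model below only illustrates the technique; it is not that factory's implementation, and Replica is a hypothetical class.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Replica:
    """Toy index that mimics external-version checks.

    Each document carries an external version (e.g. a queue offset or a
    source-system timestamp); an update is applied only if its version is
    newer than the one already stored.
    """
    docs: Dict[str, Optional[dict]] = field(default_factory=dict)
    versions: Dict[str, int] = field(default_factory=dict)

    def apply(self, doc_id: str, version: int, body: Optional[dict]) -> bool:
        """Apply an add (body is a dict) or a delete by id (body is None)."""
        if version <= self.versions.get(doc_id, -1):
            return False  # stale update: drop it
        self.versions[doc_id] = version  # deletes keep their version as a tombstone
        self.docs[doc_id] = body
        return True


if __name__ == "__main__":
    updates = [("doc1", 1, {"title": "v1"}),
               ("doc1", 2, {"title": "v2"}),
               ("doc1", 3, None)]  # delete by id at version 3
    dc1, dc2 = Replica(), Replica()
    for u in updates:  # DC 1 sees updates in order
        dc1.apply(*u)
    for u in reversed(updates):  # DC 2 sees them in reverse
        dc2.apply(*u)
    print(dc1.docs == dc2.docs, dc1.docs)  # True {'doc1': None}
```

The tombstone matters: if a delete forgot the document's version, a late-arriving older add would resurrect it. Each delete therefore has to name a document and carry a version.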
Problems we solved
• Synchronous indexing to replicas: build a separate asynchronous indexing pipeline
• Limited size of the update log: use the update log as the queue
• Tracking replication progress so that target clusters stay consistent if the source leader dies: checkpoints
• Bootstrapping the target cluster with indexes when update logs are incomplete
• New replicas on the source have no logs to replicate: replicate update logs during recovery
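The checkpoint idea can be illustrated with a toy model: the target remembers the highest update version it has applied, and whichever node is the source leader reads the update log only past that checkpoint. This Python sketch assumes versions increase monotonically within a shard's log and that the new leader holds the same log entries (which replicating update logs during recovery is meant to ensure). Target and replicate are hypothetical names, not CDCR internals.

```python
from typing import List, Tuple

LogEntry = Tuple[int, str]  # (update version, operation)


class Target:
    """Toy target shard leader that remembers the last version it applied."""

    def __init__(self) -> None:
        self.applied: List[str] = []
        self.checkpoint = 0  # highest version applied so far

    def receive(self, batch: List[LogEntry]) -> None:
        for version, op in batch:
            if version > self.checkpoint:  # skip anything already applied
                self.applied.append(op)
                self.checkpoint = version


def replicate(update_log: List[LogEntry], target: Target, batch_size: int = 2) -> None:
    """Source leader: ship log entries newer than the target's checkpoint in batches."""
    pending = [entry for entry in update_log if entry[0] > target.checkpoint]
    for i in range(0, len(pending), batch_size):
        target.receive(pending[i:i + batch_size])


if __name__ == "__main__":
    log = [(1, "add doc1"), (2, "add doc2"), (3, "delete doc1"), (4, "add doc3")]
    target = Target()
    replicate(log[:2], target)  # old source leader ships two updates, then dies
    replicate(log, target)      # new leader resumes from checkpoint 2
    print(target.applied)  # ['add doc1', 'add doc2', 'delete doc1', 'add doc3']
```

Because the checkpoint lives on the target, a new source leader needs no state handed over from the old one, and replaying an already-applied batch is harmless.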
Future work
• Move configuration out of solrconfig.xml and into API calls
• Dynamically add/remove/change target cluster information
• Cap the update log at a maximum size and fall back to index replication if necessary
• Refactor and combine CdcrUpdateLog
• Better monitoring: capture transfer rate and latency info
• Add support for rate limiting replication between source and target
• Active/active?
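Rate-limiting replication is listed as future work, not an existing CDCR feature. One common technique for it is a token bucket; the Python sketch below is a generic implementation of that technique, with a hypothetical TokenBucket class that a replicator could consult (for example, in bytes per batch) before shipping each batch to the target.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow `rate` units per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if rate <= 0 or capacity <= 0:
            raise ValueError("rate and capacity must be positive")
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start full so an initial burst is allowed
        self.last = clock()

    def try_acquire(self, amount: float) -> bool:
        """Spend `amount` tokens if available; requests above capacity never pass."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if amount <= self.tokens:
            self.tokens -= amount
            return True
        return False


if __name__ == "__main__":
    fake_now = [0.0]  # injected clock keeps the demo deterministic
    bucket = TokenBucket(rate=1_000_000, capacity=2_000_000, clock=lambda: fake_now[0])
    print(bucket.try_acquire(1_500_000))  # True: burst within capacity
    print(bucket.try_acquire(1_000_000))  # False: only 500000 tokens left
    fake_now[0] = 0.5                     # half a second later: +500000 tokens
    print(bucket.try_acquire(1_000_000))  # True
```

A replicator that gets False would simply wait for its next scheduled run, which caps cross-DC bandwidth without dropping updates.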
Resources
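• CDCR page in the Solr Reference Guide: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62687462
• Yonik Seeley's blog post on Solr cross data center replication: http://yonik.com/solr-cross-data-center-replication/
• Updating Parts of Documents (Solr Reference Guide): https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents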