Cross Datacenter Replication in Apache Solr 6

Shalin Shekhar Mangar Lucidworks Inc. @shalinmangar

The standard for enterprise search.

90% of Fortune 500 uses Solr.

Agenda

• Review a typical Solr deployment architecture

• Challenges of running a Solr deployment across data centers

• Cross Datacenter Replication (CDCR) in Solr

• Setup and configuration

• Limitations

• Alternative strategies

• Future work

[Diagram: a typical deployment, with clients, Solr and ZooKeeper in a single datacenter]

CDCR Anti-patterns - Remote Solr instances

[Diagram: Solr, ZooKeeper and clients (C) spread across DC 1 and DC 2]

CDCR Anti-patterns - Remote ZK and Solr

[Diagram: Solr, ZooKeeper and clients (C) spread across DC 1 and DC 2]

CDCR Anti-patterns - Remote ZK and Solr

[Diagram: Solr, ZooKeeper and clients (C) spread across DC 1, DC 2 and DC 3]

Why not a single SolrCloud cluster?

• Same update is transferred to each replica

• Synchronous indexing means burst-indexing is constrained by cross DC bandwidth

• Increased latency for indexing operations

• Need a ZooKeeper node in a 3rd DC to break ties

• Search requests are not DC-aware, may choose a remote replica
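The tie-breaking point follows from ZooKeeper's majority quorum rule. A minimal sketch of the arithmetic, assuming illustrative node counts per datacenter (the names dc1, dc2 and dc3 are hypothetical):

```python
def quorum_size(ensemble_size):
    """Smallest majority of a ZooKeeper ensemble."""
    return ensemble_size // 2 + 1


def survives_dc_loss(nodes_per_dc, lost_dc):
    """True if the nodes left after one DC fails still form a majority."""
    total = sum(nodes_per_dc.values())
    remaining = total - nodes_per_dc[lost_dc]
    return remaining >= quorum_size(total)


# Hypothetical layouts: three ZooKeeper nodes over two DCs vs. three DCs.
two_dcs = {"dc1": 2, "dc2": 1}
three_dcs = {"dc1": 1, "dc2": 1, "dc3": 1}

print(survives_dc_loss(two_dcs, "dc1"))   # False: the DC holding the majority is gone
print(survives_dc_loss(two_dcs, "dc2"))   # True
print(all(survives_dc_loss(three_dcs, dc) for dc in three_dcs))  # True
```

With only two datacenters, one of them always holds the majority, so losing that one takes the whole ensemble down.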

Cross Datacenter Replication in Solr

• Let’s call it CDCR for short

• Accommodate two or more datacenters

• Active/passive setup for disaster recovery

• Support limited bandwidth links

• Eventually consistent passive cluster

Source: http://yonik.com/solr-cross-data-center-replication/

CDCR in Solr 6

• Scalable: no SPoF and/or bottleneck

• Peer cluster can have a different replication factor

• Asynchronous updates; no penalty for indexing

• Push operations for low latency replication

• Low overhead — uses existing transaction logs and indexes

• Leader-to-leader communication ensures update is sent only once to peer cluster
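A rough back-of-the-envelope comparison of cross-DC traffic, as a sketch only. It assumes a simplified model in which every copy of an update that crosses the link costs the full update size, and it ignores batching, compression and protocol overhead. The sizes and replica counts are illustrative, not measurements.

```python
def cross_dc_bytes_single_cloud(update_bytes, remote_replicas):
    """Single stretched cluster: the update is sent to every remote replica."""
    return update_bytes * remote_replicas


def cross_dc_bytes_cdcr(update_bytes, target_clusters=1):
    """CDCR: the source leader sends the update once per target cluster."""
    return update_bytes * target_clusters


update_bytes = 10_000      # illustrative 10 KB update
remote_replicas = 3        # illustrative replica count in the remote DC

print(cross_dc_bytes_single_cloud(update_bytes, remote_replicas))  # 30000
print(cross_dc_bytes_cdcr(update_bytes))                           # 10000
```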

[Configuration slides; recoverable callouts: target cluster, tune replication, synchronize logs, CdcrUpdateLog, enable APIs, update chains, update log]

CDCR APIs

• http://host:port/solr/collection_name/cdcr?action=START

• Control APIs: START, STOP, ENABLEBUFFER, DISABLEBUFFER, STATUS

• Monitoring APIs: QUEUES, OPS, ERRORS
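A small helper for building these request URLs, as a sketch. The base URL and collection name are placeholders, and appending wt=json is an assumption about the preferred response format. The helper only builds the URL; sending it is left to whatever HTTP client is in use.

```python
from urllib.parse import urlencode

CONTROL_ACTIONS = {"START", "STOP", "ENABLEBUFFER", "DISABLEBUFFER", "STATUS"}
MONITORING_ACTIONS = {"QUEUES", "OPS", "ERRORS"}


def cdcr_url(base_url, collection, action):
    """Build the URL for one CDCR action on a collection."""
    action = action.upper()
    if action not in CONTROL_ACTIONS | MONITORING_ACTIONS:
        raise ValueError(f"unknown CDCR action: {action}")
    query = urlencode({"action": action, "wt": "json"})
    return f"{base_url.rstrip('/')}/{collection}/cdcr?{query}"


# Placeholder host, port and collection name.
print(cdcr_url("http://localhost:8983/solr", "collection_name", "start"))
# http://localhost:8983/solr/collection_name/cdcr?action=START&wt=json
```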

How to failover?

• Change configuration on target to make it the source

• Point indexers to the new source

• Change configuration on source to make it the new target

• May require stopping indexing during the conversion process — especially if you want to revert the change

CDCR support in Solr 6+

• Active/passive setup either for disaster recovery or for low latency querying

• Solr clusters with existing data can be converted to a source cluster from Solr 6.2 onwards

• Low to medium indexing traffic

CDCR Limitations and gotchas

• By default CDCR is disabled — invoke START to enable on both source and target

• Soft commits are not replicated to target — must schedule autoSoftCommit explicitly on target

• Different set of configurations required on source and target

• Daisy-chaining is possible but not well tested — add all targets to the same source cluster
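One way to schedule soft commits on the target is the Solr Config API's set-property command, shown here as a sketch rather than the approach the talk prescribes. The target host, collection name and the 15-second interval are assumptions for illustration. The request is only constructed, not sent.

```python
import json
import urllib.request


def auto_soft_commit_request(base_url, collection, max_time_ms):
    """Build (but do not send) a Config API request setting autoSoftCommit."""
    payload = {"set-property": {"updateHandler.autoSoftCommit.maxTime": max_time_ms}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{collection}/config",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical target cluster address and an illustrative 15 s interval.
request = auto_soft_commit_request("http://target-host:8983/solr", "collection_name", 15000)
print(request.full_url)
print(request.data.decode("utf-8"))
# To apply it: urllib.request.urlopen(request)
```

The same setting can also live in the target's solrconfig.xml; the API route just avoids editing and reloading the file by hand.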

CDCR Limitations and gotchas

• Not suitable for applications requiring high throughput indexing — some knobs exist for tuning replication speeds

• Update log buffers can grow indefinitely when target clusters are down — can work around by disabling buffering for the time being if there is only one target

• No automatic failover between source and target — explicit actions required to modify configurations and point indexing pipelines to the new source

• No Active/active setup

Alternative strategy

• Use a proper queue such as Apache Kafka to feed source and target DCs simultaneously

• Use external versions in conjunction with versions generated by Solr — DocBasedVersionConstraintsProcessorFactory

• Watch the video for “Solr Cross-Datacenter Replication and Consistency at Scale” by Oliver Bates, Apple Inc. — http://sched.co/8ArU

• Pros: Supports high indexing throughputs and active/active replication

• Cons: Additional systems required, managing consistency is difficult and requires in depth Solr expertise, all atomic updates must go to a single DC, cannot support delete-by-query
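Why external versions help here: if each cluster applies an update only when its external version is newer than the stored one, both clusters converge even when the queue delivers updates in different orders. The sketch below models that rule in plain Python. It is not the DocBasedVersionConstraintsProcessorFactory implementation, and the field name ext_version is hypothetical.

```python
def apply_update(index, doc, version_field="ext_version"):
    """Apply doc only if its external version is newer than the stored one."""
    current = index.get(doc["id"])
    if current is not None and doc[version_field] <= current[version_field]:
        return False  # stale or duplicate update is dropped
    index[doc["id"]] = doc
    return True


updates = [
    {"id": "a", "ext_version": 1, "title": "first"},
    {"id": "a", "ext_version": 2, "title": "second"},
    {"id": "a", "ext_version": 3, "title": "third"},
]

dc1, dc2 = {}, {}
for doc in updates:            # DC 1 receives the updates in order
    apply_update(dc1, doc)
for doc in reversed(updates):  # DC 2 receives them in reverse order
    apply_update(dc2, doc)

print(dc1 == dc2)          # True
print(dc1["a"]["title"])   # third
```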

Problems we solved

• Synchronous indexing to replicas — build separate asynchronous indexing pipeline

• Limited size of the update log — use update log as the queue

• How to track replication progress to preserve consistency on target clusters in case the source leader dies — checkpoints

• Bootstrapping target cluster with indexes when update logs are incomplete

• New replicas on source have no logs to replicate — replicate update logs during recovery
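The checkpoint idea can be sketched as follows. This is a deliberately simplified model, not Solr's implementation: it assumes the update log is an ordered list of (version, operation) pairs with increasing versions, and that the checkpoint is the highest version the target has already received. A newly elected source leader can then resume from the checkpoint instead of resending everything.

```python
def pending_updates(update_log, checkpoint):
    """Entries the source leader still has to forward, in log order."""
    return [entry for entry in update_log if entry[0] > checkpoint]


# Illustrative log entries: (version, operation).
update_log = [
    (101, "add doc1"),
    (102, "add doc2"),
    (103, "delete doc1"),
    (104, "add doc3"),
]
target_checkpoint = 102  # assumed to be reported by the target cluster

print(pending_updates(update_log, target_checkpoint))
# [(103, 'delete doc1'), (104, 'add doc3')]
```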

Future work

• Move configuration out of solrconfig.xml and into API calls

• Dynamically add/remove/change target cluster information

• Cap update log to a max size and fall back to index replication if necessary

• Refactor and combine CdcrUpdateLog

• Better monitoring: capture transfer rate and latency info

• Add support for rate limiting replication between source and target

• Active/active?

Resources

• CDCR page on ref guide — https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62687462

• http://yonik.com/solr-cross-data-center-replication/

• https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents

Thank you!

Date post:	07-Jan-2017
Category:	Software
Upload:	shalin-shekhar-mangar