Göteborg Distributed: Eventual Consistency in Apache Cassandra


Description

A brief introduction to Cassandra and an overview of eventual consistency in Cassandra.

Transcript

©2013 DataStax Confidential. Do not distribute without consent.

Jeremy Hanna, Support Engineer

Eventual Consistency in Apache Cassandra

Cassandra Design
•Massive scalability
•High Performance
•Reliability/Availability
•Ease of use

Developer friendly
•CQL3
•Collections (List, Map, Set)
•User defined types (2.1)
•Cassandra native drivers
•Native paging
•Tracing
•DataStax DevCenter tool
•Atomic batches
•Lightweight transactions
•Triggers

CQL3 examples

CREATE KEYSPACE shire WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'eu' : 3, 'us-east' : 2};

SELECT * FROM emp WHERE empID IN (130,104) ORDER BY deptID DESC;

INSERT INTO excelsior.clicks (userid, url, date, name) VALUES (3715e600-2eb0-11e2-81c1-0800200c9a66, 'http://cassandra.apache.org', '2013-10-09', 'Mary') USING TTL 86400;

UPDATE users SET email = 'charlie@wonka.com' WHERE login = 'cbucket64' IF email = 'cbucket@wonka.com';

CREATE USER bombadil WITH PASSWORD 'goldberry4ever' SUPERUSER;

GRANT ALTER ON KEYSPACE shire TO gandalf;
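
The statements above don't show collections or atomic batches from the Developer friendly list. As a rough sketch (the excelsior.users table and its columns are invented here for illustration), they might look like this:

-- A set collection column (hypothetical table for illustration)
CREATE TABLE excelsior.users (
  login text PRIMARY KEY,
  email text,
  visited_urls set<text>
);

-- Add an element to the collection
UPDATE excelsior.users SET visited_urls = visited_urls + {'http://cassandra.apache.org'}
WHERE login = 'cbucket64';

-- Logged (atomic) batch: once accepted, all statements in it will be applied
BEGIN BATCH
  INSERT INTO excelsior.users (login, email) VALUES ('frodo', 'frodo@shire.me');
  UPDATE excelsior.users SET email = 'sam@shire.me' WHERE login = 'samwise';
APPLY BATCH;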

Ops Friendly
•Simple design

•no special role, no single point of failure

•Lots of exposed metrics via JMX
•Nodes and entire datacenters can go down with no loss of service
•Rapid read protection
•DataStax OpsCenter

•Visual monitoring tool
•REST interface to metric data
•Free version
•Hands-off services

Some C* Users

Cassandra Design
•Massive scalability

•Multi-datacenter

•High Performance
•Reliability/Availability

•no SPOF, no special roles

•Ease of Use

Fully Distributed
•Distributed systems introduce complex problems
•What is “down”?

•Individual server is down
•Network link is down
•Long server pause (e.g. GC pause)
•Variable network latency

•What do I do when a server is overloaded?
•How can I stay available/reliable in such circumstances?
•How can I maintain consistency?
•How do I reconcile differences?

CAP Theorem
•Select two:
•Consistency
•Availability
•Partition Tolerance

Eventual Consistency
•Individual server durability

•Write to commitlog (batch or periodic sync)
•Write to memtable (which gets flushed to disk)

•Achieving a consistency level (see the cqlsh sketch after this slide)
•ONE, QUORUM, ALL
•LOCAL_ONE, LOCAL_QUORUM
•ANY, EACH_QUORUM (for writes)

•Important to note:
•All replicas always get a copy of the write; the consistency level only controls how many acknowledgements the coordinator waits for
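
As a minimal sketch of choosing a level in practice (CONSISTENCY is a cqlsh command rather than CQL itself, and the shire.users table here is hypothetical):

-- In cqlsh: apply this consistency level to the requests that follow
CONSISTENCY LOCAL_QUORUM;

-- With RF=3 in the local datacenter, the coordinator waits for 2 local replica
-- acknowledgements, but the write is still sent to every replica
INSERT INTO shire.users (login, email) VALUES ('bilbo', 'bilbo@shire.me');

-- A later LOCAL_QUORUM read overlaps the write on at least one replica
SELECT email FROM shire.users WHERE login = 'bilbo';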

Stuff happens
•Overloaded node
•“Down” node(s)
•Network partition
•Datacenter down
•Outcome: inconsistency among replicas

Continually cleaning
•Hinted handoff

•valid for a window of time
•hints are replayed once the node is restored to service

•Read repair (see the tuning example after this slide)
•after a read, check the data for agreement (digest)
•read_repair_chance defaults to 0.1
•also dclocal_read_repair_chance

•Anti-entropy service (manual repair)
•Checks for agreement across all data in range A-B
•Run a manual repair at least once every gc_grace_seconds
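
The read-repair probabilities above are per-table options; a hedged example of tuning them in CQL (the shire.users table is hypothetical, and anti-entropy repair itself is run with nodetool repair rather than through CQL):

-- Increase the chance that a read triggers a digest check across all replicas,
-- and also check replicas in the local datacenter more often
ALTER TABLE shire.users
  WITH read_repair_chance = 0.2
  AND dclocal_read_repair_chance = 0.1;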

Advanced Repair
•Manual repairs have limited resolution

•“There is something different in these 1000 rows”
•Therefore you have to stream all 1000 rows
•Leads to overstreaming, waste

•You can specify start/end keys
•Get row-level precision
•More complicated to execute
•DataStax has a repair service to help

Safely consistent?
•(LOCAL_)QUORUM reads/writes to be safe?
•With replication factor 3, a QUORUM write and a later QUORUM read each involve 2 replicas, so they always overlap on at least one
•Ultimately depends on your requirements
•Theoretical versus empirical

Netflix Study
•Two datacenters (US-East and US-West)
•Wrote 500,000 records in each datacenter
•50k write operations per second in each DC
•Wrote at consistency level ONE
•All data read back correctly in the other DC
•Tried 5 different runs, introduced failures along the way

See planetcassandra.org/blog/post/a-netflix-experiment-eventual-consistency-hopeful-consistency-by-christos-kalantzis/

Practical Consistency
•ONE is not suitable for all cases
•Review your requirements and SLA
•Do your own testing to get comfortable
•Tunable consistency gives you the flexibility to get the best performance for your use case

Questions?