BHUVAN RAWAL
CASSANDRA - AN OVERVIEW
NOSQL-DATABASE.ORG
> MASSIVELY SCALABLE
> PARTITIONED ROW STORE
> MASTERLESS ARCHITECTURE
> LINEAR SCALABILITY
> NO SINGLE POINT OF FAILURE
> MULTIPLE DC SUPPORT OUT OF THE BOX
2008: Open sourced by Facebook on Google Code.
2009: Became an Apache Incubator project.
2010: Gained top-level status at Apache.
Can be adapted for different
classes of use cases
GENERAL PURPOSE
Can remain available despite the loss of a
Node/Rack/DC
AVAILABLE
KEY FEATURES
Seamless distribution across
datacentres across continents
DISTRIBUTED
JVM Heap & GC Algorithms
Compaction Strategy
Key Cache Size
Row Cache
Compression Chunk Size
Speculative Retries
Throughput vs Latency tuning
KEY TUNABLES
Cassandra is the most popular wide column
store - Wikipedia
Deployed by 400+ Fortune-500 Firms
667 Companies Verified on Siftery
Apple 100,000+ Node Deployment
Netflix - 95% Data on Cassandra
Uber - 20 Cassandra Clusters, soon will be 100
Spotify - 100+ Production Clusters
SOME USERS
Determines how data is distributed across
nodes
Must be the same across the cluster
Ordered Partitioner
Random Partitioner
Murmur3 Partitioner
PARTITIONER
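
The idea of a partitioner can be sketched in a few lines: hash the partition key to a token, then find the node whose token range covers it. This is a minimal illustration, not Cassandra's implementation. The names token_for and TokenRing are hypothetical, and MD5 is used only because it ships with Python's standard library (a stand-in for Murmur3).

```python
import bisect
import hashlib


def token_for(partition_key):
    """Hash a partition key to an integer token (MD5 as a stand-in hash)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


class TokenRing:
    """Hypothetical ring: each node owns the range ending at its token."""

    def __init__(self, node_tokens):
        # node_tokens maps node name -> token; keep the ring sorted by token.
        self._ring = sorted((tok, node) for node, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._ring]

    def owner(self, partition_key):
        tok = token_for(partition_key)
        # First node whose token is >= the key's token owns the key.
        idx = bisect.bisect_left(self._tokens, tok)
        # Past the largest token, wrap around to the first node.
        return self._ring[idx % len(self._ring)][1]


# Three nodes with evenly spaced tokens over the 128-bit MD5 space.
space = 2 ** 128
ring = TokenRing({"node-a": 0, "node-b": space // 3, "node-c": 2 * space // 3})
print(ring.owner("user:42"))
```

Because every node must compute the same owner for the same key, the hash function (the partitioner) has to be identical across the cluster.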
Tells Cassandra where each node sits (rack and
datacenter)
Allows replicas to be spread widely enough to
handle failures
Failure Modes: Node -> Rack -> DC ->
Region
Tries its best not to place replicas of the same
data in the same rack
SNITCH
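
How rack information can steer replica placement is sketched below. This is a simplified illustration under stated assumptions, not Cassandra's actual placement code: Node and pick_replicas are hypothetical names, and the ring is just a list in token order. The walk prefers nodes in racks that hold no replica yet, and reuses a rack only when there are fewer racks than replicas.

```python
from typing import List, NamedTuple


class Node(NamedTuple):
    name: str
    rack: str


def pick_replicas(ring: List[Node], start: int, replication_factor: int) -> List[Node]:
    """Walk the ring from `start`, preferring nodes in racks not used yet."""
    replicas: List[Node] = []
    skipped: List[Node] = []
    seen_racks = set()
    for step in range(len(ring)):
        if len(replicas) == replication_factor:
            break
        node = ring[(start + step) % len(ring)]
        if node.rack in seen_racks:
            skipped.append(node)  # same rack as an existing replica
        else:
            replicas.append(node)
            seen_racks.add(node.rack)
    # Fewer racks than replicas: fall back to the skipped nodes, in ring order.
    for node in skipped:
        if len(replicas) == replication_factor:
            break
        replicas.append(node)
    return replicas


ring = [Node("n1", "rack1"), Node("n2", "rack1"), Node("n3", "rack2"), Node("n4", "rack3")]
print([n.name for n in pick_replicas(ring, 0, 3)])  # n2 is skipped: it shares rack1 with n1
```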
status
health
tokens
schema version
data size
phi_threshold
GOSSIP PROTOCOL
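
The phi threshold listed above belongs to an accrual-style failure detector: instead of a fixed timeout, a suspicion level (phi) grows the longer a peer stays silent relative to its usual heartbeat interval. The sketch below is a simplified model that assumes exponentially distributed heartbeat intervals. PhiAccrualDetector is a hypothetical name, and the threshold of 8.0 is an assumption chosen for illustration.

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Simplified phi accrual failure detector for a single peer."""

    def __init__(self, threshold=8.0, window=1000):
        self.threshold = threshold
        self._intervals = deque(maxlen=window)  # recent heartbeat gaps
        self._last_heartbeat = None

    def heartbeat(self, now):
        """Record a heartbeat that arrived at time `now` (seconds)."""
        if self._last_heartbeat is not None:
            self._intervals.append(now - self._last_heartbeat)
        self._last_heartbeat = now

    def phi(self, now):
        """Suspicion level: -log10 of P(silence lasts at least this long)."""
        if not self._intervals:
            return 0.0
        mean = sum(self._intervals) / len(self._intervals)
        if mean <= 0:
            return 0.0
        elapsed = now - self._last_heartbeat
        # Exponential model: P = exp(-elapsed / mean), so phi = elapsed / (mean * ln 10).
        return elapsed / (mean * math.log(10))

    def is_alive(self, now):
        return self.phi(now) < self.threshold


detector = PhiAccrualDetector()
for t in (0.0, 1.0, 2.0, 3.0):
    detector.heartbeat(t)
print(detector.is_alive(4.0), detector.is_alive(23.0))
```

A higher threshold tolerates longer pauses (for example GC stalls or slow networks) before a node is marked down, at the cost of slower failure detection.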
As with most databases, the data model is the key
to successful deployments & scalability
Test thoroughly in a staging environment
Avoid client-side joins as far as possible
Materialized views - a boon for automated
denormalization
Tune partition size so that oversized partitions
do not degrade the cluster
DATA MODEL
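
One common way to keep partition size in check is to add a time bucket to the partition key, so a busy key is split into many bounded partitions. The sketch below is only an illustration of that idea: bucketed_partition_key and estimate_partition_bytes are hypothetical helpers, and the daily bucket and the example row rate are assumptions.

```python
from datetime import datetime, timezone


def bucketed_partition_key(sensor_id, ts):
    """Compound partition key: (sensor_id, day). One partition per sensor per day."""
    return (sensor_id, ts.strftime("%Y-%m-%d"))


def estimate_partition_bytes(rows_per_second, avg_row_bytes, bucket_seconds):
    """Rough upper bound on the data written to one partition in one bucket."""
    return int(rows_per_second * bucket_seconds * avg_row_bytes)


ts = datetime(2020, 1, 2, 3, 4, tzinfo=timezone.utc)
print(bucketed_partition_key("sensor-1", ts))
# 10 rows/s of ~100 bytes each, bucketed per day (86400 s)
print(estimate_partition_bytes(10, 100, 86400))
```

If the estimate comes out too large, a smaller bucket (hourly instead of daily) shrinks each partition, at the price of more partitions to read for a wide time range.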
DataStax Driver for Spark:
-> Reads localized data off Cassandra nodes
-> Support for Hadoop
-> Pig, Hive, Sqoop, Mahout
-> Solr integration
ANALYTICS SUPPORT
-> Memtable
-> SSTable - Sorted String Table
-> Index
-> Partition Summary
-> Bloom Filter
-> Compression
STORAGE
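
How some of these pieces fit together can be sketched with a toy store: writes go to an in-memory memtable, a full memtable is flushed to an immutable sorted SSTable, and reads consult a Bloom filter before searching each SSTable. This is a minimal in-memory illustration, not Cassandra's storage engine. All class names are hypothetical, and the index, partition summary, and compression are left out.

```python
import bisect
import hashlib


class BloomFilter:
    """Probabilistic set: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class SSTable:
    """Immutable, sorted snapshot of a flushed memtable."""

    def __init__(self, rows):
        self.keys = sorted(rows)
        self.values = [rows[k] for k in self.keys]
        self.bloom = BloomFilter()
        for key in self.keys:
            self.bloom.add(key)

    def get(self, key):
        if not self.bloom.might_contain(key):
            return None  # definitely absent: skip the search
        idx = bisect.bisect_left(self.keys, key)
        if idx < len(self.keys) and self.keys[idx] == key:
            return self.values[idx]
        return None


class TinyStore:
    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.sstables = []
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        if self.memtable:
            self.sstables.append(SSTable(self.memtable))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest SSTable first
            value = table.get(key)
            if value is not None:
                return value
        return None


store = TinyStore(memtable_limit=2)
store.put("a", 1)
store.put("b", 2)   # memtable is full: flushed to an SSTable
store.put("a", 3)   # newer value lives in the memtable
print(store.get("a"), store.get("b"), store.get("missing"))
```

Since SSTables are never modified in place, a newer value simply shadows an older one until compaction merges the files.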