Cassandra 101

Post on 11-May-2015


Introduction to Cassandra (Operational)


Cassandra 101
Introduction to Apache Cassandra

What is Cassandra?
● A distributed, columnar database
● Data model inspired by Google BigTable (2006)
● Distribution model inspired by Amazon Dynamo (2007)
● Open Sourced by Facebook in 2008
● Monolithic Kernel written in Java
● Used by Digg, Facebook, Twitter, Reddit, Rackspace, CloudKick and others

Etymology
● In Greek mythology Cassandra (also known as Alexandra) was the daughter of King Priam and Queen Hecuba of Troy
● Her beauty caused Apollo to grant her the gift of prophecy
● When she did not return his love, Apollo placed a curse on her so that no one would ever believe her predictions

Why Cassandra?

● Minimal Administration
● No Single Point of Failure
● Scale Horizontally
● Writes are durable
● Optimized for writes
● Consistency is flexible, can be updated online
● Schema is flexible, can be updated online
● Handles failure gracefully
● Replication is easy, Rack and DC aware

Commercial Support

Data Model

A Column is the basic unit, consisting of a Key, a Value and a Timestamp


RDBMS vs Cassandra

Map<RowKey, SortedMap<ColumnKey, ColumnValue>>

Cassandra is good at

Reading data from a row in the order it is stored, i.e. by Column Name!
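
The nested-map view above can be sketched in a few lines of Python. This is only an illustration of the shape Map<RowKey, SortedMap<ColumnKey, ColumnValue>>, not Cassandra's storage code; the class name Row and its methods are hypothetical.

```python
from bisect import insort


class Row:
    """One row: column names are kept sorted, as in a SortedMap (hypothetical helper)."""

    def __init__(self):
        self._names = []   # sorted column names
        self._cells = {}   # column name -> (value, timestamp)

    def put(self, name, value, timestamp):
        if name not in self._cells:
            insort(self._names, name)   # keep names in sorted order
        self._cells[name] = (value, timestamp)

    def slice(self, start, end):
        """Return (name, value) pairs with start <= name <= end, in stored order."""
        return [(n, self._cells[n][0]) for n in self._names if start <= n <= end]


table = {}                                   # RowKey -> Row
row = table.setdefault("foo", Row())
row.put("vegetable", "cucumber", 10)         # inserted first ...
row.put("fruit", "apple", 10)                # ... but sorts before "vegetable"
columns_in_order = row.slice("a", "z")
```

Reading a slice returns the columns by column name, regardless of insertion order, which is the access pattern the slide says Cassandra is good at.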

Understand the queries your application requires before building the data model

Consistent Hashing
Load Balancing in a changing world ...

● Evenly map keys to nodes
● Minimize key movement when nodes join or leave

The Partitioner:

● RandomPartitioner transforms Keys to Tokens using MD5

● Since C* 1.2 the default partitioner is Murmur3Partitioner, which hashes keys with the Murmur3 algorithm

Keys and Tokens?

[Diagram: token line running from 0 to 99, with ‘fop’ at token 10 and ‘foo’ at token 90]

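MD5 hash of ‘fop’ is 89de73aaae8c956fb7c9379be7978e5b

MD5 hash of ‘foo’ is d3b07384d113edec49eaa6238ad5ff00 (this is the digest of ‘foo’ plus a trailing newline, as produced by echo foo | md5sum; the bare string ‘foo’ hashes to acbd18db4cc2f85cedef654fccc4a4d8)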
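
A minimal sketch of turning a key into a token with MD5. It assumes RandomPartitioner reads the 16-byte digest as a signed big-endian integer and takes its absolute value; treat that detail as an assumption. The function toy_token is hypothetical and only squeezes the token onto a 0-99 ring like the one on the slides. The slide tokens 10 and 90 are illustrative values, not the output of this code.

```python
import hashlib


def md5_hex(key: bytes) -> str:
    """Hex MD5 digest of the raw key bytes."""
    return hashlib.md5(key).hexdigest()


def random_partitioner_token(key: bytes) -> int:
    """Assumed RandomPartitioner behaviour: abs() of the digest read as a signed integer."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def toy_token(key: bytes, ring_size: int = 100) -> int:
    """Hypothetical helper: map the token onto a small ring of ring_size slots."""
    return random_partitioner_token(key) % ring_size


foo_digest = md5_hex(b"foo")
# Hashing b"foo\n" instead gives the digest printed on the slide.
foo_newline_digest = md5_hex(b"foo\n")
```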

Token Ring.

[Diagram: the token line closed into a ring, so that 99 wraps around to 0; ‘fop’ sits at token 10 and ‘foo’ at token 90]

Token Ranges (Pre 1.2)

Node     Token   Owned range
Node 1   0       76-0
Node 2   25      1-25
Node 3   50      26-50
Node 4   75      51-75

‘foo’ token 90 (falls in range 76-0, owned by Node 1)
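
The range lookup on this slide can be sketched with a sorted token list and a binary search. The node tokens come from the slide; the function name owner is hypothetical, and the sketch assumes each node owns the range ending at its own token.

```python
from bisect import bisect_left

# (token, node) pairs from the slide, sorted by token
RING = [(0, "Node 1"), (25, "Node 2"), (50, "Node 3"), (75, "Node 4")]
TOKENS = [token for token, _ in RING]


def owner(token: int) -> str:
    """Return the first node whose token is >= the key token, wrapping past the end."""
    index = bisect_left(TOKENS, token) % len(RING)
    return RING[index][1]


foo_owner = owner(90)   # 90 is past the last token, so it wraps to Node 1
fop_owner = owner(10)   # 10 falls in range 1-25
```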

Token Ranges With Virtual Nodes in 1.2

[Diagram: the ring divided into many small token ranges, interleaved among Node 1, Node 2 and Node 3]

● Easier to enlarge or shrink the cluster

● The cluster can grow in steps of 1 node

● Node recovery is much faster

Replication Strategy


Selects Replication Factor number of nodes for a row.

SimpleStrategy with RF 3

[Diagram: the same four-node ring; ‘foo’ (token 90) is stored on the owning node, Node 1, and on the next two nodes clockwise, Node 2 and Node 3]
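
A rough sketch of SimpleStrategy placement on the four-node ring from the slides: start at the node that owns the token, then keep walking clockwise until RF nodes are chosen. The function name is hypothetical, and this is not Cassandra's implementation.

```python
from bisect import bisect_left

# (token, node) pairs from the slides, sorted by token
NODE_TOKENS = [(0, "Node 1"), (25, "Node 2"), (50, "Node 3"), (75, "Node 4")]


def simple_strategy_replicas(token: int, rf: int) -> list:
    """Pick the owning node plus the next rf - 1 nodes clockwise on the ring."""
    tokens = [t for t, _ in NODE_TOKENS]
    start = bisect_left(tokens, token) % len(NODE_TOKENS)
    count = min(rf, len(NODE_TOKENS))   # cannot place more replicas than nodes
    return [NODE_TOKENS[(start + i) % len(NODE_TOKENS)][1] for i in range(count)]


foo_replicas = simple_strategy_replicas(90, rf=3)
```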

NetworkTopologyStrategy uses a Replication Factor per Data Center

[Diagram: the same four-node ring, ‘foo’ token 90, with the nodes split between two data centers, EAST and WEST]
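
A simplified sketch of per-data-center placement. The assignment of nodes to EAST and WEST below is an assumption made for illustration (the transcript does not say which node is in which DC), rack awareness is ignored, and the function name is hypothetical.

```python
from bisect import bisect_left

# (token, node, data center); the DC assignment is assumed for illustration
RING = [
    (0, "Node 1", "EAST"),
    (25, "Node 2", "WEST"),
    (50, "Node 3", "EAST"),
    (75, "Node 4", "WEST"),
]


def network_topology_replicas(token: int, rf_per_dc: dict) -> list:
    """Walk the ring clockwise, taking nodes until each DC has its own RF."""
    tokens = [t for t, _, _ in RING]
    start = bisect_left(tokens, token) % len(RING)
    remaining = dict(rf_per_dc)          # replicas still needed per DC
    replicas = []
    for step in range(len(RING)):
        _, node, dc = RING[(start + step) % len(RING)]
        if remaining.get(dc, 0) > 0:
            replicas.append(node)
            remaining[dc] -= 1
    return replicas


foo_replicas = network_topology_replicas(90, {"EAST": 1, "WEST": 1})
```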

SimpleSnitch

Places all nodes in the same DC and rack (default)

Ec2Snitch/Ec2MultiRegionSnitch

DC is set to the AWS Region and rack to the Availability Zone

PropertyFileSnitch

Each node's DC and rack are maintained in a property file

GossipingPropertyFileSnitch

Uses gossip as the first source for node info and, if that is not available, falls back to the property file

The Client and the Coordinator

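[Diagram: the Client sends its request for ‘foo’ (token 90) to one node in the ring of Node 1 to Node 4; that node acts as the coordinator and forwards the request to the replicas]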

Multi DC Client and Coordinator

[Diagram: the Client and the ring of Node 1 to Node 4 as before, plus Node 10 and Node 20 in a second data center; ‘foo’ token 90]

Gossip
Nodes share information with a small number of neighbours, who share information with another small number of neighbours …

● Used for intra-cluster communication
● Routes client requests
● Detects node failure
● Initial peers are listed as seeds in the config file
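
The "tell a few neighbours, who tell a few more" idea can be illustrated with a generic epidemic simulation. This is not Cassandra's gossip protocol; the function name, the fanout parameter and the fixed random seed are assumptions made for the sketch.

```python
import random


def gossip_rounds(node_count: int, fanout: int, seed: int = 0) -> int:
    """Count rounds until every node has heard a rumour that starts at node 0."""
    rng = random.Random(seed)        # seeded so that runs are repeatable
    informed = {0}
    rounds = 0
    while len(informed) < node_count:
        for node in list(informed):  # snapshot: nodes informed before this round
            peers = [n for n in range(node_count) if n != node]
            informed.update(rng.sample(peers, min(fanout, len(peers))))
        rounds += 1
    return rounds


rounds_for_50_nodes = gossip_rounds(node_count=50, fanout=2)
```

Because the number of informed nodes can at most triple each round with a fanout of 2, the rumour needs at least four rounds to reach 50 nodes, and in practice not many more.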

Cassandra Objects

● CommitLog
● MemTable
● SSTable
● Index
● Bloom Filter

Consistency
● CAP theorem
  ○ Trade consistency for availability
  ○ Consistency is a choice

* It doesn't matter if you are good at something, as long as you are consistent.

[Diagram: under a network Partition, choose Consistency OR Availability]

WRITE

Level             Description
ZERO              Cross fingers
ANY               1st to respond (including a Hinted Handoff)
ONE, TWO, THREE   nth to respond
QUORUM            N/2+1 replicas
ALL               All replicas

READ

Level             Description
ZERO              N/A
ANY               N/A
ONE, TWO, THREE   nth to respond
QUORUM*           N/2+1 replicas
ALL               All replicas

Consistency Level

● Specified for each request
● Number of nodes to wait for

* QUORUM, LOCAL_QUORUM, EACH_QUORUM
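
As a small worked example, the number of replicas to wait for at each level can be computed from the replication factor N. The function name is hypothetical; only ONE, TWO, THREE, QUORUM and ALL are modelled, and N/2+1 is taken with integer division.

```python
def required_responses(level: str, replication_factor: int) -> int:
    """Number of replica responses the coordinator waits for (sketch only)."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return fixed[level]
    if level == "QUORUM":
        return replication_factor // 2 + 1   # N/2+1, rounded down before adding 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"level not modelled in this sketch: {level}")


quorum_rf3 = required_responses("QUORUM", 3)
quorum_rf4 = required_responses("QUORUM", 4)
```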

Write ‘foo’ at Quorum with Hinted Handoff

[Diagram: Client writing ‘foo’ (token 90) to the ring of Node 1 to Node 4. Node 3 is down; Node 4 holds ‘foo’ for Node 3]

Read ‘foo’ at Quorum

[Diagram: Client reading ‘foo’ (token 90) from the ring of Node 1 to Node 4. Node 3 is down; Node 4 holds ‘foo’ for Node 3]

Column TimeStamps

Are used to resolve differences
● Stored for each Column Value
● 64bit Integers

Column      Node 1                      Node 2                      Node 3
Vegetable   ‘cucumber’ (timestamp 10)   ‘cucumber’ (timestamp 10)   <missing>
Fruit       ‘Apple’ (timestamp 10)      ‘banana’ (timestamp 15)     ‘Apple’ (timestamp 10)
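
Resolving the differences in the table above amounts to "highest timestamp wins". A minimal sketch using the values from the slide; the function name reconcile is hypothetical and tie-breaking between equal timestamps is not modelled.

```python
def reconcile(replica_cells):
    """Return the (value, timestamp) cell with the highest timestamp, or None."""
    present = [cell for cell in replica_cells if cell is not None]
    if not present:
        return None
    return max(present, key=lambda cell: cell[1])


# One entry per node; None stands for <missing>
vegetable = [("cucumber", 10), ("cucumber", 10), None]
fruit = [("Apple", 10), ("banana", 15), ("Apple", 10)]

resolved_vegetable = reconcile(vegetable)
resolved_fruit = reconcile(fruit)
```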

Strong Consistency

W + R > N

#Write Nodes + #Read Nodes > Replication Factor

● QUORUM Read + QUORUM Write
● ALL Read + ONE Write
● ONE Read + ALL Write
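
The rule W + R > N can be checked with simple arithmetic. The sketch below assumes a replication factor of 3 for illustration; the function name is hypothetical.

```python
def is_strongly_consistent(write_nodes: int, read_nodes: int, replication_factor: int) -> bool:
    """True when every read set must overlap every write set: W + R > N."""
    return write_nodes + read_nodes > replication_factor


N = 3                # assumed replication factor
QUORUM = N // 2 + 1  # 2 when N is 3
ONE, ALL = 1, N

checks = {
    "QUORUM read + QUORUM write": is_strongly_consistent(QUORUM, QUORUM, N),
    "ALL read + ONE write": is_strongly_consistent(ONE, ALL, N),
    "ONE read + ALL write": is_strongly_consistent(ALL, ONE, N),
    "ONE read + ONE write": is_strongly_consistent(ONE, ONE, N),
}
```

With N = 3 the three combinations from the slide give 4 > 3, while ONE + ONE gives 2, which is not greater than 3, so a read may miss the latest write.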

Achieving Consistency

● Consistency Level
● Hinted Handoff
● Read Repair
● Anti Entropy (User triggered Repairs)

Write Path

● Append to Commit Log File
● Merge Columns into Memtable
● Asynchronously flush Memtable to a new file (never update existing files)
● Data is stored in immutable files called SSTables (Sorted String Tables)
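
The write path steps can be sketched as a toy in-memory store. Everything here is a simplification: the class name ToyStore and the flush_threshold parameter are hypothetical, the flush runs synchronously rather than asynchronously, and nothing is written to disk.

```python
class ToyStore:
    """Toy model of the write path: commit log, memtable, immutable SSTables."""

    def __init__(self, flush_threshold: int = 2):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # (row_key, column) -> (value, timestamp)
        self.sstables = []     # each flush adds one immutable, sorted tuple
        self.flush_threshold = flush_threshold

    def write(self, row_key, column, value, timestamp):
        self.commit_log.append((row_key, column, value, timestamp))
        current = self.memtable.get((row_key, column))
        if current is None or timestamp >= current[1]:
            self.memtable[(row_key, column)] = (value, timestamp)
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the memtable out as a new sorted table; never touch old ones."""
        if not self.memtable:
            return
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}


store = ToyStore(flush_threshold=2)
store.write("foo", "vegetable", "cucumber", 10)
store.write("foo", "fruit", "apple", 10)   # second cell triggers a flush
```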

SSTables Files

*-Data.db
*-Index.db
*-Filter.db

(And others)

Read Path

Memory (per SSTable)

● Bloom Filter (cache)
● Index/Key Cache

Disk

SSTable-1-Data.db
  foo:
    fruit (ts:10) apple
    vegetable (ts:15) cucumber
    ….
SSTable-1-Index.db
Bloom Filter

SSTable-2-Data.db
  foo:
    fruit (ts:10) apple
    vegetable (ts:10) Pepper
    ….
SSTable-2-Index.db
Bloom Filter
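
A simplified sketch of the read path: ask each SSTable's Bloom filter whether the row might be present, read only the tables that say yes, and keep the newest cell per column. The memtable, the index and the key cache are left out; the class names, filter size and hash scheme are assumptions made for the sketch.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size: int = 64, hashes: int = 3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, key: str):
        for i in range(self.hashes):
            digest = hashlib.md5(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str):
        for position in self._positions(key):
            self.bits |= 1 << position

    def might_contain(self, key: str) -> bool:
        return all((self.bits >> p) & 1 for p in self._positions(key))


class SSTable:
    def __init__(self, rows: dict):
        self.rows = rows             # row_key -> {column: (value, timestamp)}
        self.bloom = BloomFilter()
        for row_key in rows:
            self.bloom.add(row_key)


def read_row(row_key: str, sstables: list) -> dict:
    """Merge a row across SSTables, keeping the highest timestamp per column."""
    merged = {}
    for table in sstables:
        if not table.bloom.might_contain(row_key):
            continue                 # skip tables that cannot hold the row
        for column, cell in table.rows.get(row_key, {}).items():
            if column not in merged or cell[1] > merged[column][1]:
                merged[column] = cell
    return merged


sstable_1 = SSTable({"foo": {"fruit": ("apple", 10), "vegetable": ("cucumber", 15)}})
sstable_2 = SSTable({"foo": {"fruit": ("apple", 10), "vegetable": ("Pepper", 10)}})
foo_row = read_row("foo", [sstable_1, sstable_2])
```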

Compactions
Compaction merges the truth from multiple SSTables into one SSTable holding the same truth

(Manual and continuous background process)

Column      SSTable 1                   SSTable 2                    New
Vegetable   ‘cucumber’ (timestamp 10)   ‘cucumber’ (timestamp 10)    ‘cucumber’ (timestamp 10)
Fruit       ‘Apple’ (timestamp 10)      <tombstone> (timestamp 15)   <tombstone> (timestamp 15)

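The compaction table can be reproduced with a small merge that keeps the newest cell per column, where a tombstone is just another cell with a timestamp. The function name and the TOMBSTONE marker are hypothetical, and purging old tombstones is not modelled.

```python
TOMBSTONE = "<tombstone>"   # hypothetical marker for a deleted column


def compact(*sstables: dict) -> dict:
    """Merge several column maps into one, keeping the highest timestamp per column."""
    merged = {}
    for table in sstables:
        for column, (value, timestamp) in table.items():
            if column not in merged or timestamp > merged[column][1]:
                merged[column] = (value, timestamp)
    return merged


sstable_1 = {"Vegetable": ("cucumber", 10), "Fruit": ("Apple", 10)}
sstable_2 = {"Vegetable": ("cucumber", 10), "Fruit": (TOMBSTONE, 15)}
new_sstable = compact(sstable_1, sstable_2)
```

The inputs are left untouched and a new table is produced, matching the "never update existing files" rule from the write path.
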
Writes and Reads

Managing Cassandra

● Single configuration file: /etc/cassandra/cassandra.yaml

● Single control command /usr/bin/nodetool

● Monitoring done by DataStax OpsCenter

Troubleshooting Cassandra
Always inspect these files:

● /var/log/cassandra/cassandra.log (Startup)
● /var/log/cassandra/system.log (Normal work)

Backup

Use Cassandra snapshots...

And God said to Noah, Noah make me a backup ... 'cause I shall format

Client (API) Choices
● Thrift, original and still fully supported API:
  ○ Java: Thrift, Hector, Astyanax, DataStax Driver, Kundera…
  ○ Python: Pycassa, Telephus, …
  ○ Ruby: Fauna
  ○ PHP: PHP Client Library
  ○ C#
  ○ Node.JS
  ○ Go
  ○ Simba ODBC
  ○ C++: LibQtCassandra
  ○ ORM
  ○ ….

● CQL3: a table oriented, schema driven data model, similar to SQL

CQL3 Create KeySpace

● Using CQL3 via the cqlsh command tool ($CASSANDRA_HOME/bin/cqlsh):
● Create a new Keyspace with a Replication factor of 3 and NetworkTopologyStrategy

CREATE KEYSPACE kenshoo_cass_fans
  WITH replication = {'class': 'NetworkTopologyStrategy', 'us_east_dc': 3};

CQL3 Working with Tables
● CQL3 Example
● Table is a sparse collection of well known ordered columns

CREATE TABLE User (
    user_name text,
    password text,
    real_name text,
    PRIMARY KEY (user_name)
);
---------------------------------------------------------
INSERT INTO User (user_name, password, real_name)
VALUES ('nader', 'sekr8t', 'MR NADER');
---------------------------------------------------------
SELECT * FROM User WHERE user_name = 'nader';

 user_name | password | real_name
-----------+----------+-----------
     nader |   sekr8t |  MR NADER