
Cassandra 101

Date post: 11-May-2015
Upload: nader-ganayem
Description:
Introduction to Cassandra (Operational)
Transcript
Page 1: Cassandra 101

Cassandra 101

Introduction to Apache Cassandra

Page 2: Cassandra 101

What is Cassandra?

● A distributed, columnar database
● Data model inspired by Google BigTable (2006)
● Distribution model inspired by Amazon Dynamo (2007)
● Open sourced by Facebook in 2008
● Monolithic Kernel written in Java
● Used by Digg, Facebook, Twitter, Reddit, Rackspace, CloudKick and others

Page 3: Cassandra 101

Etymology

● In Greek mythology, Cassandra (also known as Alexandra) was the daughter of King Priam and Queen Hecuba of Troy
● Her beauty caused Apollo to grant her the gift of prophecy
● When she did not return his love, Apollo placed a curse on her so that no one would ever believe her predictions

Page 4: Cassandra 101

Why Cassandra?

● Minimal Administration
● No Single Point of Failure
● Scale Horizontally
● Writes are durable
● Optimized for writes
● Consistency is flexible, can be updated online
● Schema is flexible, can be updated online
● Handles failure gracefully
● Replication is easy, Rack and DC aware

Page 5: Cassandra 101

Commercial Support

Page 6: Cassandra 101

Data Model

A Column is the basic unit, consisting of a Key, a Value and a Timestamp

Page 7: Cassandra 101

Data Model

A Column is the basic unit, consisting of a Key, a Value and a Timestamp

Page 8: Cassandra 101

RDBMS vs Cassandra

Map<RowKey, SortedMap<ColumnKey, ColumnValue>>
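The nested-map view can be sketched in Python. This is a toy illustration of the idea, not Cassandra's storage code; the `Column` tuple and the `put` and `slice_columns` helpers are hypothetical names introduced for the example.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, NamedTuple


class Column(NamedTuple):
    """Basic unit from the slides: a key (name), a value and a timestamp."""
    name: str
    value: str
    timestamp: int


# RowKey -> columns kept sorted by column name (a stand-in for SortedMap).
table: Dict[str, List[Column]] = {}


def put(row_key: str, column: Column) -> None:
    """Insert or overwrite a column, keeping the row sorted by column name."""
    row = table.setdefault(row_key, [])
    names = [c.name for c in row]
    i = bisect_left(names, column.name)
    if i < len(row) and row[i].name == column.name:
        row[i] = column          # same column name: replace
    else:
        row.insert(i, column)    # new column: insert at its sorted position


def slice_columns(row_key: str, start: str, end: str) -> List[Column]:
    """Return the columns of one row whose names fall in [start, end]."""
    row = table.get(row_key, [])
    names = [c.name for c in row]
    return row[bisect_left(names, start):bisect_right(names, end)]


put("foo", Column("vegetable", "cucumber", 10))
put("foo", Column("fruit", "apple", 10))
print([c.name for c in table["foo"]])   # column names come back sorted
```

Because each row is kept ordered by column name, a range read over column names is a cheap slice, which is the access pattern the deck says Cassandra is good at.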

Page 9: Cassandra 101

Cassandra is good at

Reading data from a row in the order it is stored, i.e. by Column Name!

Understand the queries your application requires before building the data model

Page 10: Cassandra 101

Consistent Hashing

Load Balancing in a changing world ...

● Evenly map keys to nodes
● Minimize key movement when nodes join or leave
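A minimal sketch of why consistent hashing limits key movement, assuming a ring where each key belongs to the first node token at or after the key's token. The node names, the key names and the use of MD5 as the hash are assumptions made for the illustration.

```python
import hashlib
from bisect import bisect_left


def token_for(key: str) -> int:
    """Hash a key onto the ring (MD5 is used here only as an example hash)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


def owner(ring: dict, key: str) -> str:
    """Return the node owning the key: first node token >= key token, wrapping."""
    tokens = sorted(ring)
    i = bisect_left(tokens, token_for(key))
    return ring[tokens[i % len(tokens)]]


# Hypothetical three-node ring: token -> node name.
ring = {token_for(f"node-{n}"): f"node-{n}" for n in range(3)}
keys = [f"key-{i}" for i in range(1000)]
before = {k: owner(ring, k) for k in keys}

# A fourth node joins; only keys in its new range should change owner.
ring[token_for("node-3")] = "node-3"
after = {k: owner(ring, k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

Every key that changes owner moves to the joining node; keys owned by the other nodes stay where they were, which is the "minimize key movement" property.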

Page 11: Cassandra 101

The Partitioner:

● RandomPartitioner transforms Keys to Tokens using MD5

● Since C* 1.2 the default partitioner is Murmur3Partitioner, which hashes keys with the Murmur3 algorithm

Page 12: Cassandra 101

Keys and Tokens?

[Diagram: token line from 0 to 99, with ‘fop’ at token 10 and ‘foo’ at token 90]

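MD5 hash for ‘fop’ is 89de73aaae8c956fb7c9379be7978e5b

MD5 hash for ‘foo’ is d3b07384d113edec49eaa6238ad5ff00 (this digest is of ‘foo’ followed by a newline, as produced by echo foo | md5sum; the MD5 of the bare string is acbd18db4cc2f85cedef654fccc4a4d8)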
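A small sketch of turning a key into a token, assuming a toy ring of 100 positions like the one drawn in the slides. The `toy_token` helper and the modulo reduction are assumptions for illustration; the real RandomPartitioner uses a far larger token space, and the tokens 10 and 90 shown in the slides are illustrative values, not what this function returns.

```python
import hashlib

RING_SIZE = 100  # toy ring 0..99, as in the slides; an assumption for illustration


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of raw bytes."""
    return hashlib.md5(data).hexdigest()


def toy_token(key: str) -> int:
    """Map a key to a token on the toy ring by reducing its MD5 hash."""
    return int(md5_hex(key.encode()), 16) % RING_SIZE


# The digest quoted for 'foo' matches the key followed by a newline,
# which is what `echo foo | md5sum` hashes.
print(md5_hex(b"foo\n"))  # d3b07384d113edec49eaa6238ad5ff00
print(md5_hex(b"foo"))    # acbd18db4cc2f85cedef654fccc4a4d8
print(toy_token("foo"), toy_token("fop"))
```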

Page 13: Cassandra 101

Token Ring.

[Diagram: token ring running from 0 to 99; ‘fop’ token: 10, ‘foo’ token: 90]

Page 14: Cassandra 101

Token Ranges (Pre 1.2)

Node     Token   Range owned
Node 1   0       76-0
Node 2   25      1-25
Node 3   50      26-50
Node 4   75      51-75

‘foo’ token 90 (falls in range 76-0, owned by Node 1)
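The range lookup on this slide can be sketched with a binary search over the node tokens. This assumes the four tokens shown (0, 25, 50, 75) and a ring of 0 to 99; `owner_of` is a hypothetical helper name.

```python
from bisect import bisect_left

# Node tokens from the slide: each node owns the range ending at its token.
NODE_TOKENS = {0: "Node 1", 25: "Node 2", 50: "Node 3", 75: "Node 4"}


def owner_of(token: int) -> str:
    """Return the node whose range contains the token (wrapping past 99 to 0)."""
    tokens = sorted(NODE_TOKENS)
    i = bisect_left(tokens, token)      # first node token >= key token
    return NODE_TOKENS[tokens[i % len(tokens)]]


print(owner_of(90))  # Node 1, since 90 falls in the range 76-0
print(owner_of(10))  # Node 2, since 10 falls in the range 1-25
```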

Page 15: Cassandra 101

Token Ranges With Virtual Nodes in 1.2

Node 1

Node 2

Node 3

● Easier to enlarge or shrink the cluster

● The cluster can grow in steps of 1 node

● Node recovery is much faster

Page 16: Cassandra 101

Replication Strategy

[Diagram: ring of Node 1 (token 0, range 76-0), Node 2 (token 25, range 1-25), Node 3 (token 50, range 26-50) and Node 4 (token 75, range 51-75); ‘foo’ token 90]

Selects Replication Factor number of nodes for a row.

Page 17: Cassandra 101

Replication Strategy

[Diagram: ring of Node 1 (token 0, range 76-0), Node 2 (token 25, range 1-25), Node 3 (token 50, range 26-50) and Node 4 (token 75, range 51-75); ‘foo’ token 90]

SimpleStrategy with RF 3
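A sketch of SimpleStrategy placement under the four-node ring used in these slides: the first replica is the node that owns the token, and the remaining replicas are the next nodes clockwise. The function name is hypothetical.

```python
from bisect import bisect_left

RING = [(0, "Node 1"), (25, "Node 2"), (50, "Node 3"), (75, "Node 4")]


def simple_strategy_replicas(token: int, rf: int) -> list:
    """Pick the owner of the token, then the next rf - 1 nodes clockwise."""
    tokens = [t for t, _ in RING]
    start = bisect_left(tokens, token) % len(RING)
    count = min(rf, len(RING))          # cannot place more replicas than nodes
    return [RING[(start + i) % len(RING)][1] for i in range(count)]


print(simple_strategy_replicas(90, rf=3))  # ['Node 1', 'Node 2', 'Node 3']
```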

Page 18: Cassandra 101

Replication Strategy

NetworkTopologyStrategy uses a Replication Factor per Data Center

[Diagram: two rings, EAST and WEST, each with Node 1 (token 0, range 76-0), Node 2 (token 25, range 1-25), Node 3 (token 50, range 26-50) and Node 4 (token 75, range 51-75); ‘foo’ token 90 in both]
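A simplified sketch of per-data-center placement, assuming the two rings from the slide with hypothetical node names. It walks each data center's nodes clockwise and takes that data center's own replication factor; the real strategy also spreads replicas across racks, which this sketch ignores.

```python
from bisect import bisect_left

# Hypothetical layout mirroring the slide: the same four tokens in each data center.
DATACENTERS = {
    "EAST": [(0, "East 1"), (25, "East 2"), (50, "East 3"), (75, "East 4")],
    "WEST": [(0, "West 1"), (25, "West 2"), (50, "West 3"), (75, "West 4")],
}


def network_topology_replicas(token: int, rf_per_dc: dict) -> dict:
    """For each data center, walk its nodes clockwise and take its own RF."""
    placement = {}
    for dc, rf in rf_per_dc.items():
        nodes = DATACENTERS[dc]
        tokens = [t for t, _ in nodes]
        start = bisect_left(tokens, token) % len(nodes)
        count = min(rf, len(nodes))
        placement[dc] = [nodes[(start + i) % len(nodes)][1] for i in range(count)]
    return placement


print(network_topology_replicas(90, {"EAST": 3, "WEST": 2}))
```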

Page 19: Cassandra 101

SimpleSnitch

Places all nodes in the same DC & RACK (Default)

Page 20: Cassandra 101

Ec2Snitch/Ec2MultiRegionSnitch

The DC is set to the AWS Region and the Rack to the Availability Zone

Page 21: Cassandra 101

PropertyFileSnitch

Each node's DC and Rack are maintained in a property file

Page 22: Cassandra 101

GossipingPropertyFileSnitch

Uses gossip as the first source of node information; if that is not available, it falls back to the property file

Page 23: Cassandra 101

The Client and the Coordinator

[Diagram: Client and a ring of Node 1, Node 2, Node 3 and Node 4; ‘foo’ token 90]

Page 24: Cassandra 101

Multi DC Client and Coordinator

[Diagram: Client, a ring of Node 1, Node 2, Node 3 and Node 4, plus Node 10 and Node 20; ‘foo’ token 90]

Page 25: Cassandra 101

Gossip

Nodes share information with a small number of neighbours, who share information with another small number of neighbours …

● Used for intra-cluster communication
● Routes client requests
● Detects node failure
● Initial peers are listed as seeds in the config file

Page 26: Cassandra 101

Cassandra Objects

● CommitLog
● MemTable
● SSTable
● Index
● Bloom Filter

Page 27: Cassandra 101

Consistency

● CAP theorem
○ Trade consistency for availability
○ Consistency is a choice

* It doesn't matter if you are good at something, as long as you are consistent.

[Diagram: under a Partition, choose Consistency OR Availability]

Page 28: Cassandra 101

Consistency Level

● Specified for each request
● Number of nodes to wait for

WRITE
Level             Description
ZERO              Cross fingers
ANY               1st to respond (HH)
ONE, TWO, THREE   nth to respond
QUORUM            N/2+1 replicas
ALL               All replicas

READ
Level             Description
ZERO              N/A
ANY               N/A
ONE, TWO, THREE   nth to respond
QUORUM*           N/2+1 replicas
ALL               All replicas

* QUORUM, LOCAL_QUORUM, EACH_QUORUM
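The "N/2+1" quorum size uses integer division. A short sketch, where N is the replication factor and the helper names are hypothetical; it covers only the levels whose replica count is a simple number.

```python
def quorum(replication_factor: int) -> int:
    """Replicas required for QUORUM: floor(N / 2) + 1."""
    return replication_factor // 2 + 1


def required_acks(level: str, replication_factor: int) -> int:
    """Number of replica responses to wait for at a given consistency level."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return fixed[level]
    if level == "QUORUM":
        return quorum(replication_factor)
    if level == "ALL":
        return replication_factor
    raise ValueError(f"level not covered by this sketch: {level}")


for n in (3, 4, 5):
    print(n, quorum(n))   # 3 -> 2, 4 -> 3, 5 -> 3
```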

Page 29: Cassandra 101

Write ‘foo’ at Quorum with Hinted Handoff

[Diagram: Client, Node 1 and Node 2; Node 3 is down; Node 4 holds ‘foo’ for Node 3; ‘foo’ token 90]
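A toy sketch of a quorum write with hinted handoff, following the slide: Node 3 is down, so Node 4 keeps a hint and replays it later. The `Node` class, the `write` and `replay_hints` helpers, the choice of Node 4 as coordinator and the value `"bar"` are all assumptions made for the illustration.

```python
class Node:
    def __init__(self, name: str, up: bool = True):
        self.name, self.up = name, up
        self.data = {}     # key -> value stored on this replica
        self.hints = []    # (target node, key, value) held for down replicas


def write(coordinator, replicas, key, value, required_acks):
    """Write to live replicas, keep hints for down ones, report if CL was met."""
    acks = 0
    for replica in replicas:
        if replica.up:
            replica.data[key] = value
            acks += 1
        else:
            coordinator.hints.append((replica, key, value))
    return acks >= required_acks


def replay_hints(coordinator):
    """Deliver stored hints to targets that are back up; keep the rest."""
    remaining = []
    for target, key, value in coordinator.hints:
        if target.up:
            target.data[key] = value
        else:
            remaining.append((target, key, value))
    coordinator.hints = remaining


n1, n2, n3, n4 = Node("Node 1"), Node("Node 2"), Node("Node 3", up=False), Node("Node 4")
ok = write(n4, [n1, n2, n3], "foo", "bar", required_acks=2)  # QUORUM of RF 3
print(ok, len(n4.hints))   # True 1

n3.up = True
replay_hints(n4)
print(n3.data)             # {'foo': 'bar'}
```

The write succeeds at QUORUM because two of the three replicas acknowledged it; the hint only brings the third replica up to date once it returns.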

Page 30: Cassandra 101

Read ‘foo’ at Quorum

[Diagram: Client, Node 1 and Node 2; Node 3 is down; Node 4 holds ‘foo’ for Node 3; ‘foo’ token 90]

Page 31: Cassandra 101

Column Timestamps

● Used to resolve differences between replicas
● Stored for each column value
● 64-bit integers

Column      Node 1                      Node 2                      Node 3
Vegetable   ‘cucumber’ (timestamp 10)   ‘cucumber’ (timestamp 10)   <missing>
Fruit       ‘Apple’ (timestamp 10)      ‘banana’ (timestamp 15)     ‘Apple’ (timestamp 10)
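A sketch of how the timestamps in this example settle the disagreement between replicas: for each column, the version with the highest timestamp wins. The `resolve` helper is a hypothetical name.

```python
# Replica contents from the slide: column -> (value, timestamp).
# A column that is missing on a replica is simply absent from its dict.
replicas = {
    "Node 1": {"Vegetable": ("cucumber", 10), "Fruit": ("Apple", 10)},
    "Node 2": {"Vegetable": ("cucumber", 10), "Fruit": ("banana", 15)},
    "Node 3": {"Fruit": ("Apple", 10)},
}


def resolve(column: str):
    """Return the (value, timestamp) with the highest timestamp across replicas."""
    versions = [cols[column] for cols in replicas.values() if column in cols]
    return max(versions, key=lambda v: v[1]) if versions else None


print(resolve("Fruit"))      # ('banana', 15)
print(resolve("Vegetable"))  # ('cucumber', 10)
```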

Page 32: Cassandra 101

Strong Consistency

W + R > N

#Write Nodes + #Read Nodes > Replication Factor

● QUORUM Read + QUORUM Write
● ALL Read + ONE Write
● ONE Read + ALL Write
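The rule W + R > N can be checked directly. A minimal sketch assuming a replication factor of 3; the function name is hypothetical.

```python
def is_strongly_consistent(write_nodes: int, read_nodes: int, rf: int) -> bool:
    """W + R > N: every read set overlaps every write set in at least one replica."""
    return write_nodes + read_nodes > rf


RF = 3
QUORUM = RF // 2 + 1   # 2 when RF is 3

print(is_strongly_consistent(QUORUM, QUORUM, RF))  # True:  2 + 2 > 3
print(is_strongly_consistent(1, RF, RF))           # True:  ONE write + ALL read
print(is_strongly_consistent(RF, 1, RF))           # True:  ALL write + ONE read
print(is_strongly_consistent(1, 1, RF))            # False: 1 + 1 is not > 3
```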

Page 33: Cassandra 101

Achieving Consistency

● Consistency Level
● Hinted Handoff
● Read Repair
● Anti Entropy (User triggered Repairs)

Page 34: Cassandra 101

Write Path

● Append to Commit Log File
● Merge Columns into Memtable
● Asynchronously flush Memtable to a new file (Never update existing files)
● Data is stored in immutable files called SSTables (Sorted String Tables)
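The write path steps can be sketched as a toy in-memory store. This is an illustration only: the `ToyStore` class and its `flush_threshold` parameter are hypothetical, and the flush here happens synchronously for simplicity, whereas the slide describes it as asynchronous.

```python
class ToyStore:
    """Toy write path: commit log append, memtable merge, flush to SSTable."""

    def __init__(self, flush_threshold: int = 2):
        self.commit_log = []       # append-only record of every write
        self.memtable = {}         # (row, column) -> (value, timestamp)
        self.sstables = []         # immutable, sorted snapshots of memtables
        self.flush_threshold = flush_threshold

    def write(self, row, column, value, timestamp):
        self.commit_log.append((row, column, value, timestamp))   # 1. durability
        current = self.memtable.get((row, column))
        if current is None or timestamp >= current[1]:            # 2. merge
            self.memtable[(row, column)] = (value, timestamp)
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """3. Write the memtable out as a new sorted, immutable SSTable."""
        if self.memtable:
            self.sstables.append(tuple(sorted(self.memtable.items())))
            self.memtable = {}
            self.commit_log.clear()   # flushed data no longer needs replaying


store = ToyStore()
store.write("foo", "fruit", "apple", 10)
store.write("foo", "vegetable", "cucumber", 15)
print(store.sstables)
```

Each flush creates a new tuple and never touches earlier ones, mirroring the "never update existing files" rule.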

Page 35: Cassandra 101

SSTables Files

*-Data.db
*-Index.db
*-Filter.db

(And others)

Page 36: Cassandra 101

Read Path

[Diagram: read path]

Memory (per SSTable): Bloom Filter (cache), Index/Key Cache

Disk:
SSTable-1 Data.db: foo: fruit (ts:10) apple, vegetable (ts:15) cucumber, …
SSTable-1-Index.db
SSTable-2 Data.db: foo: fruit (ts:10) apple, vegetable (ts:10) Pepper, …
SSTable-2-Index.db
Bloom Filter (one per SSTable)
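A toy sketch of the read path in this diagram: consult each SSTable's Bloom filter first, read only the SSTables that might hold the row, and let the newest timestamp win per column. The `BloomFilter` and `SSTable` classes, their sizes and the MD5-based hashing are assumptions for the illustration, not Cassandra's implementation.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may give false positives, never false negatives."""

    def __init__(self, size: int = 64, hashes: int = 3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, key: str):
        for i in range(self.hashes):
            digest = hashlib.md5(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class SSTable:
    def __init__(self, rows: dict):
        self.rows = rows                      # row key -> {column: (value, ts)}
        self.bloom = BloomFilter()
        for key in rows:
            self.bloom.add(key)


def read(row_key: str, sstables: list) -> dict:
    """Skip SSTables whose Bloom filter rules the key out; newest timestamp wins."""
    result = {}
    for table in sstables:
        if not table.bloom.might_contain(row_key):
            continue                          # definitely not in this SSTable
        for column, (value, ts) in table.rows.get(row_key, {}).items():
            if column not in result or ts > result[column][1]:
                result[column] = (value, ts)
    return result


sstable_1 = SSTable({"foo": {"fruit": ("apple", 10), "vegetable": ("cucumber", 15)}})
sstable_2 = SSTable({"foo": {"fruit": ("apple", 10), "vegetable": ("Pepper", 10)}})
print(read("foo", [sstable_1, sstable_2]))
```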

Page 37: Cassandra 101

Compactions

Compaction merges the truth from multiple SSTables into one SSTable holding the same truth

(Runs as a continuous background process and can also be triggered manually)

Column      SSTable 1                   SSTable 2                   New
Vegetable   ‘cucumber’ (timestamp 10)   ‘cucumber’ (timestamp 10)   ‘cucumber’ (timestamp 10)
Fruit       ‘Apple’ (timestamp 10)      <tombstone> (timestamp 15)  <tombstone> (timestamp 15)
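The merge in this example can be sketched as a per-column "newest timestamp wins" pass over the input SSTables. The `TOMBSTONE` marker and the `compact` function are hypothetical names; a real compaction also drops tombstones once they are old enough, which this sketch leaves out.

```python
TOMBSTONE = object()   # marker for a deleted column


def compact(*sstables: dict) -> dict:
    """Merge SSTables column by column, keeping the newest timestamp."""
    merged = {}
    for table in sstables:
        for column, (value, ts) in table.items():
            if column not in merged or ts > merged[column][1]:
                merged[column] = (value, ts)
    return merged


sstable_1 = {"Vegetable": ("cucumber", 10), "Fruit": ("Apple", 10)}
sstable_2 = {"Vegetable": ("cucumber", 10), "Fruit": (TOMBSTONE, 15)}

new = compact(sstable_1, sstable_2)
print(new["Vegetable"])                 # ('cucumber', 10)
print(new["Fruit"][0] is TOMBSTONE)     # True
```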

Page 38: Cassandra 101

Writes and Reads

Page 39: Cassandra 101

Managing Cassandra

● Single configuration file /etc/cassandra/cassandra.yaml

● Single control command /usr/bin/nodetool

● Monitoring done by DataStax OpsCenter

Page 40: Cassandra 101

Troubleshooting Cassandra

Always inspect these files:

● /var/log/cassandra/cassandra.log (Startup)
● /var/log/cassandra/system.log (Normal work)

Page 41: Cassandra 101

Backup

Use Cassandra snapshots...

And God said to Noah, Noah make me a backup ... 'cause I shall format

Page 42: Cassandra 101

Client (API) Choices

● Thrift, original and still fully supported API:
○ Java: Thrift, Hector, Astyanax, DataStax Driver, Kundera…
○ Python: Pycassa, Telephus, …
○ Ruby: Fauna
○ PHP: PHP Client Library
○ C#
○ Node.JS
○ Go
○ Simba ODBC
○ C++: LibQtCassandra
○ ORM
○ ….

● CQL3: a table-oriented, schema-driven data model with a syntax similar to SQL

Page 43: Cassandra 101

CQL3 Create KeySpace

● Using CQL3 via the cqlsh command-line tool ($CASSANDRA_HOME/bin/cqlsh):
● Create a new Keyspace with NetworkTopologyStrategy and a replication factor of 3:

CREATE KEYSPACE kenshoo_cass_fans
  WITH replication = {'class': 'NetworkTopologyStrategy', 'us_east_dc': 3};

Page 44: Cassandra 101

CQL3 Working with Tables

● CQL3 Example
● A table is a sparse collection of well-known, ordered columns

CREATE TABLE User (
    user_name text,
    password text,
    real_name text,
    PRIMARY KEY (user_name)
);
---------------------------------------------------------
INSERT INTO User (user_name, password, real_name)
VALUES ('nader', 'sekr8t', 'MR NADER');
---------------------------------------------------------
SELECT * FROM User WHERE user_name = 'nader';

 user_name | password | real_name
-----------+----------+-----------
 nader     | sekr8t   | MR NADER

