Introduction to Cassandra

Date post: 15-Jan-2015
Page 1: Introduction to Cassandra

Introduction to Cassandra

Shimi Kiviti, @shimi_k

Page 2: Introduction to Cassandra

Motivation

Scaling

How do you scale your database?
● Reads
● Writes

Page 3: Introduction to Cassandra
Page 4: Introduction to Cassandra

Influential Papers

● Bigtable: A Distributed Storage System for Structured Data, 2006
● Dynamo: Amazon's Highly Available Key-value Store, 2007

Cassandra:
● Partitioning and replication - Dynamo
● Log-structured column families - Bigtable

Page 5: Introduction to Cassandra

Cassandra Highlights

● Symmetric - all nodes are exactly the same
  ○ No single point of failure
  ○ Linearly scalable
  ○ Ease of administration
● High availability with multiple datacenters
● Consistency vs Latency
● Read/Write anywhere
● Flexible Schema
● Column TTL
● Distributed Counters

Page 6: Introduction to Cassandra

DHT - Distributed Hash Table

Page 7: Introduction to Cassandra

DHT

● O(1) node lookup
● Explicit replication
● Linear scalability
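The lookup and replication bullets above can be illustrated with a small consistent-hashing sketch. It assumes every node holds the full token ring locally, so finding the owners of a key needs no extra network hop. The names `Ring`, `token`, and `replicas` are hypothetical, and walking clockwise to pick replicas is a simplification rather than Cassandra's actual replication strategy code.

```scala
import java.security.MessageDigest
import scala.collection.immutable.TreeMap

object Ring {
  // Hash a key to a non-negative token with MD5 (the hash choice is an assumption of this sketch).
  def token(key: String): BigInt = {
    val digest = MessageDigest.getInstance("MD5").digest(key.getBytes("UTF-8"))
    BigInt(1, digest)
  }

  // Walk the ring clockwise from the key's token and collect up to `n` distinct nodes.
  def replicas(ring: TreeMap[BigInt, String], key: String, n: Int): List[String] = {
    val t = token(key)
    // Nodes at or after the token first, then wrap around to the start of the ring.
    val clockwise = ring.iteratorFrom(t).map(_._2) ++ ring.iterator.map(_._2)
    clockwise.toList.distinct.take(n)
  }
}
```

The first node returned is the one whose token is the smallest value at or after the key's token; the remaining nodes are its clockwise neighbours.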

Page 8: Introduction to Cassandra
Page 9: Introduction to Cassandra

Consistency

N = replication factor
R = number of replicas a read waits for (R <= N)
W = number of replicas a write waits for (W <= N)
Quorum = N/2 + 1 (integer division)

When W + R > N there is full consistency, because every read set overlaps every write set in at least one replica. Examples:

● W = 1, R = N
● W = N, R = 1
● W = Quorum, R = Quorum
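The arithmetic behind these examples is small enough to check directly. The sketch below only encodes the two formulas from the slide; the names `Consistency`, `quorum`, and `overlaps` are made up for illustration.

```scala
object Consistency {
  // Quorum as defined on the slide: N/2 + 1, using integer division.
  def quorum(n: Int): Int = n / 2 + 1

  // Read and write replica sets are guaranteed to share a replica when W + R > N.
  def overlaps(w: Int, r: Int, n: Int): Boolean = w + r > n
}
```

With N = 3, `Consistency.quorum(3)` is 2, so quorum reads plus quorum writes give 2 + 2 > 3, while W = 1 and R = 1 give 1 + 1 <= 3 and may return stale data.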

Page 10: Introduction to Cassandra

Consistency Level

● Every request defines a consistency level
  ○ Any
  ○ One
  ○ Two
  ○ Three
  ○ Quorum
  ○ Local Quorum
  ○ Each Quorum
  ○ All

Page 11: Introduction to Cassandra

Data Model

● Keyspace ~ schema
● ColumnFamilies ~ table
● Rows
● Columns

Page 12: Introduction to Cassandra

Column Family

Key1 Column Column Column

Key2 Column Column

Page 13: Introduction to Cassandra

Column Family

ColumnFamily: {
  TOK: { chen: 1, ronen: 7 }
  CityPath: { yuval: 5 }
}
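One way to picture this structure is as a map of row keys to sorted maps of columns. The sketch below is only a mental model in plain Scala collections, not how Cassandra stores data; the type aliases and the `get` helper are hypothetical.

```scala
import scala.collection.immutable.SortedMap

object ColumnFamilyModel {
  type Row = SortedMap[String, Long]   // column name -> value, sorted by column name
  type ColumnFamily = Map[String, Row] // row key -> row

  // The example from the slide: two rows with different columns.
  val cf: ColumnFamily = Map(
    "TOK" -> SortedMap("chen" -> 1L, "ronen" -> 7L),
    "CityPath" -> SortedMap("yuval" -> 5L)
  )

  // Rows need not share columns; a missing row or column is simply absent.
  def get(cf: ColumnFamily, key: String, column: String): Option[Long] =
    cf.get(key).flatMap(_.get(column))
}
```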

Page 14: Introduction to Cassandra

Super Column Family

ColumnFamily: {
  Key: {
    super1: { name: value, name: value }
    super2: { name: value }
  }
}

Key
  Super1: Column Column Column
  Super2: Column Column Column

Page 15: Introduction to Cassandra

Write

● Any node
● Partitioner
● Commit log, memtable
● Wait for W responses

Page 16: Introduction to Cassandra

Write

Page 17: Introduction to Cassandra

Write

● No reads
● No seeks
● Sequential disk access
● Atomic within a column family
● Fast
● Always writeable (hinted hand-off)
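The "no reads, no seeks" property follows from the commit log plus memtable design. The sketch below is a heavily simplified, in-memory model of one replica under that assumption; `Write`, `Replica`, and `flush` are hypothetical names, the "commit log" is just a buffer, and real memtables also reconcile timestamps.

```scala
import scala.collection.immutable.SortedMap
import scala.collection.mutable.ArrayBuffer

// One write: a column value stamped with a timestamp.
final case class Write(key: String, column: String, value: String, timestamp: Long)

class Replica {
  // Append-only buffer standing in for the on-disk commit log.
  val commitLog = ArrayBuffer.empty[Write]
  // In-memory table sorted by (row key, column name).
  var memtable = SortedMap.empty[(String, String), Write]
  // Immutable tables produced by flushes, newest first.
  var sstables = List.empty[SortedMap[(String, String), Write]]

  // A write appends to the log and updates memory; it never looks at data on disk.
  def write(w: Write): Unit = {
    commitLog += w
    memtable = memtable.updated((w.key, w.column), w)
  }

  // Flushing emits the whole sorted memtable at once as a new immutable table.
  def flush(): Unit = {
    if (memtable.nonEmpty) {
      sstables = memtable :: sstables
      memtable = SortedMap.empty[(String, String), Write]
    }
  }
}
```

Because existing tables are never modified in place, all disk activity on the write path can be sequential.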

Page 18: Introduction to Cassandra

Read

● Choose any node
● Partitioner
● Wait for R responses
● Tunable read repair in the background

Page 19: Introduction to Cassandra

Read

A read may have to consult multiple SSTables.
Reads are slower than writes.
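Why a read costs more than a write can be seen in a small merge sketch: the same column may exist in the memtable and in several SSTables, and the read has to reconcile them. The sketch assumes the usual rule that the cell with the newest timestamp wins; `Cell`, `ReadPath`, and `mergeRow` are hypothetical names.

```scala
final case class Cell(value: String, timestamp: Long)

object ReadPath {
  // Each source (memtable or SSTable) maps column name -> cell for a single row.
  // When a column appears in several sources, the newest timestamp wins.
  def mergeRow(sources: Seq[Map[String, Cell]]): Map[String, Cell] =
    sources.flatten.groupBy(_._1).map { case (name, cells) =>
      name -> cells.map(_._2).maxBy(_.timestamp)
    }
}
```

For example, merging `Map("email" -> Cell("old", 1L))` with `Map("email" -> Cell("new", 2L))` keeps only the cell with timestamp 2.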

Page 20: Introduction to Cassandra

Cache

● There is no need to use memcached
● There is an internal configurable cache
  ○ Key cache
  ○ Row cache

Page 21: Introduction to Cassandra

Sorting

When you perform a get, the result is sorted:
● Rows are sorted according to the partitioner
● Columns in a row are sorted according to the type of the column name

Page 22: Introduction to Cassandra

Partitioner

● RandomPartitioner - Uses hash values as tokens. Useful for distributing the load across all nodes. If you use it, set the node tokens manually.
● OrderPreservingPartitioner - You can get sorted rows, but at the cost of an evenly balanced cluster.
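Setting tokens manually usually means spacing them evenly around the ring. The sketch below assumes the RandomPartitioner token range of 0 to 2^127 and computes token i as i * 2^127 / nodes; the names `Tokens`, `RingSize`, and `evenTokens` are hypothetical.

```scala
object Tokens {
  // Upper bound of the RandomPartitioner token range (assumed: 2^127).
  val RingSize: BigInt = BigInt(2).pow(127)

  // Evenly spaced initial tokens for a cluster of `nodes` nodes.
  def evenTokens(nodes: Int): Seq[BigInt] =
    (0 until nodes).map(i => RingSize * i / nodes)
}
```

Calling `Tokens.evenTokens(4)` yields four tokens a quarter of the ring apart, starting at 0, so each node owns an equal share of the hash space.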

Page 23: Introduction to Cassandra

Column Types

Available types:
● Bytes
● UTF8
● Ascii
● Long
● Date
● UUID
● Composite - <Type1>:<Type2>

Page 24: Introduction to Cassandra

Column Types

Examples:

Sort 1:
8, 9, 10   vs   10, 8, 9

Sort 2:
dan:8, dan:10, shimi:1   vs   dan:10, dan:8, shimi:1
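The two orderings in each example come from comparing column names as numbers versus comparing them as text. The sketch below reproduces both with plain Scala sorting; it illustrates the comparison rules only and does not use Cassandra's comparator classes. `ColumnOrder` and `compositeKey` are hypothetical names.

```scala
object ColumnOrder {
  val names = List("10", "8", "9")

  // Long-style comparison: names are compared as numbers.
  val asLong: List[String] = names.sortBy(_.toLong)
  // UTF8-style comparison: names are compared as text, character by character.
  val asText: List[String] = names.sorted

  // Composite name <UTF8>:<Long>: compare the text part first, then the number.
  def compositeKey(name: String): (String, Long) = {
    val Array(text, number) = name.split(":")
    (text, number.toLong)
  }

  val composite: List[String] =
    List("dan:10", "shimi:1", "dan:8").sortBy(compositeKey)
  val compositeAsText: List[String] =
    List("dan:10", "shimi:1", "dan:8").sorted
}
```

Text comparison puts "10" before "8" because the character '1' sorts before '8', which is why the choice of column name type matters.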

Page 25: Introduction to Cassandra

Clients

● Thrift - Cassandra driver-level interface
● CQL - Cassandra Query Language (SQL-like)
● High-level clients:
  ○ Python
  ○ Java
  ○ Scala
  ○ Clojure
  ○ .Net
  ○ Ruby
  ○ PHP
  ○ Perl
  ○ C++
  ○ Haskell

Page 26: Introduction to Cassandra

Cascal - Scala client

Insert column:

session.insert("app" \ "users" \ "shimi" \ "passwd" \ "mypass")

val key = "app" \ "users" \ "shimi"
session.insert(key \ "email" \ "shimi.k@...")

Get column value:

val pass = session.get(key \ "passwd")

Page 27: Introduction to Cassandra

Cascal

Get multiple columns:

// Whole row
val row = session.list(key)
// Columns in a name range
val colRange = session.list(key, RangePredicate("email", "passwd"))
// Columns by explicit name
val colList = session.list(key, ColumnPredicate(List("passwd", "email")))

Page 28: Introduction to Cassandra

Cascal

Get multiple rows:

val family = "app" \ "users"
// Rows in a key range
val rowRange = session.list(family, RangePredicate("dan", "shimi"))
// Rows by explicit key
val rowList = session.list(family, KeyPredicate("dan", "shimi"))

Page 29: Introduction to Cassandra

Cascal

Remove column:

session.remove("app" \ "users" \ "shimi" \ "passwd")

Remove row:

session.remove("app" \ "users" \ "shimi")

Batch operations:

val deleteCols = Delete(key, ColumnPredicate("age" :: "sex" :: Nil))
val insertEmail = Insert(key \ "email" \ "shimi.k@...")
session.batch(insertEmail :: deleteCols :: Nil)

Page 30: Introduction to Cassandra

Guidelines

● Keep together the data you query together
● Think about your use case and how you should fetch your data
● Don't try to normalize your data
● You can't beat the disk
● Be ready to get your hands dirty
● There is no single solution for everything. You might consider using different solutions together

Page 31: Introduction to Cassandra

The End

Useful links:
● Cassandra, http://cassandra.apache.org/
● Wiki, http://wiki.apache.org/cassandra/
● Cassandra mailing list
● IRC
● Bigtable, http://labs.google.com/papers/bigtable.html
● Dynamo, http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
● Cascal, https://github.com/shimi/cascal

