Post on 27-Jun-2015
CASSANDRA COMMUNITY WEBINARS APRIL 2013
INTRODUCTION TO APACHE CASSANDRA 1.2
Aaron Morton, Apache Cassandra Committer, DataStax MVP for Apache Cassandra
@aaronmorton / www.thelastpickle.com
Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
Cassandra Summit 2013, June 11 & 12, San Francisco
Use SFSummit25 for 25% off
Cassandra Summit 2013: DataStax Academy
Free certification during the summit.
Overview
- The Cluster
- The Node
- The Data Model
Cassandra
- Started at Facebook
- Open sourced in 2008
- Top Level Apache project since 2010
Used by... Netflix, Twitter, Reddit, Rackspace...
Inspiration
- Google Big Table (2006)
- Amazon Dynamo (2007)
Why Cassandra?
- Scale
- Operations
- Data Model
Overview
- The Cluster
- The Node
- The Data Model
Store 'foo' key with Replication Factor 3.
(Diagram: Nodes 1, 2 and 3 each store 'foo'; Node 4 does not.)
Consistent Hashing.
- Evenly map keys to nodes
- Minimise key movements when nodes join or leave
Partitioner.
RandomPartitioner transforms Keys to Tokens using MD5.
(Default pre version 1.2.)
Partitioner.
Murmur3Partitioner transforms Keys to Tokens using Murmur3.
(Default in version 1.2.)
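The RandomPartitioner idea can be sketched with the standard library (a simplification: the real partitioner returns the MD5 digest as a BigInteger token, and Murmur3Partitioner instead uses 64-bit Murmur3 hashes, which are not in Python's stdlib):

```python
import hashlib

def md5_token(key: bytes) -> int:
    # Interpret the MD5 digest of the key as a big integer and fold it
    # into a RandomPartitioner-style token range of 0 .. 2**127 - 1.
    return int.from_bytes(hashlib.md5(key).digest(), "big") % 2**127

# Hashing spreads even very similar keys evenly around the ring.
tokens = {k: md5_token(k) for k in (b"foo", b"fop", b"bar")}
```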
Keys and Tokens?

key   | token
------+------
'fop' | 10
'foo' | 90

(Tokens are positions on a ring running from 0 to 99 in this example.)
Token Ring.
(Diagram: a ring of tokens from 0 to 99, with 'fop' at token 10 and 'foo' at token 90.)
Token Ranges pre v1.2.
(Diagram: Node 1 at token 0, Node 2 at token 25, Node 3 at token 50, Node 4 at token 75; each node owns the range ending at its token, e.g. Node 2 owns 1-25 and Node 1 owns the wrapping range 76-0.)
Token Ranges with Virtual Nodes in v1.2.
(Diagram: Nodes 1-4 each own many small, non-contiguous token ranges.)
Locate Token Range.
(Diagram: 'foo' with token 90 maps to the node that owns the token range containing 90.)
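With the node tokens from the pre-v1.2 slides, locating the owning node can be sketched like this (a toy 0..99 ring with hypothetical node names, not real Cassandra code):

```python
import bisect

# Node tokens from the slides: each node owns the range
# ending at (and including) its own token.
NODE_TOKENS = [(0, "node1"), (25, "node2"), (50, "node3"), (75, "node4")]

def locate(token: int) -> str:
    # Find the node whose token is the first one >= the key's token,
    # wrapping around the ring past the highest token back to node1.
    tokens = sorted(NODE_TOKENS)
    i = bisect.bisect_left(tokens, (token, ""))
    return tokens[i % len(tokens)][1]

print(locate(90))  # 'foo' (token 90) wraps around to node1
print(locate(10))  # 'fop' (token 10) lands on node2
```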
Replication Strategy selects Replication Factor number of
nodes for a row.
SimpleStrategy with RF 3.
(Diagram: 'foo' with token 90 is stored on the owning node and the next two nodes around the ring.)
NetworkTopologyStrategy uses a Replication Factor per Data
Centre. (Default.)
Multi DC Replication with RF 3 and RF 2.
(Diagram: 'foo' with token 90 is stored on 3 of Nodes 1-4 in the West DC and on 2 of Nodes 1-4 in the East DC.)
The Snitch knows which Data Centre and Rack the Node is
in.
SimpleSnitch.
Places all nodes in the same DC and Rack.
(Default; there are others.)
EC2Snitch.
DC is set to the AWS Region and Rack to the Availability Zone.
The Client and the Coordinator.
(Diagram: the client sends its request for 'foo' (token 90) to one node, which acts as coordinator and forwards it to the replicas.)
Multi DC Client and the Coordinator.
(Diagram: the client's coordinator among local Nodes 1-4 forwards the request for 'foo' (token 90) to replicas in the remote DC, Nodes 10-40.)
Gossip.
Nodes share information with a small number of neighbours, who in turn share it with a small number of neighbours.
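This spreading behaviour can be simulated in a few lines (a toy model with a hypothetical `gossip_rounds` helper, not Cassandra's actual Gossiper):

```python
import random

def gossip_rounds(n_nodes: int, fanout: int = 3, seed: int = 42) -> int:
    # Toy gossip: one node learns a fact; each round, every informed
    # node tells `fanout` random peers. Count rounds until all know.
    rng = random.Random(seed)
    informed = {0}
    rounds = 0
    while len(informed) < n_nodes:
        for _ in range(len(informed)):
            informed.update(rng.randrange(n_nodes) for _ in range(fanout))
        rounds += 1
    return rounds

# Information reaches every node in roughly O(log n) rounds.
r = gossip_rounds(1000)
```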
Consistency Level (CL).
- Specified for each request
- Number of nodes to wait for

Consistency Level (CL)
- ANY*
- ONE, TWO, THREE
- QUORUM
- LOCAL_QUORUM, EACH_QUORUM*
QUORUM at Replication Factor...

Replication Factor | QUORUM
-------------------+-------
2 or 3             | 2
4 or 5             | 3
6 or 7             | 4
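The table above follows from QUORUM being a strict majority of the replicas:

```python
def quorum(replication_factor: int) -> int:
    # QUORUM = floor(RF / 2) + 1, a strict majority of replicas.
    return replication_factor // 2 + 1

# Reproduces the table: RF 2 or 3 -> 2, RF 4 or 5 -> 3, RF 6 or 7 -> 4.
sizes = {rf: quorum(rf) for rf in range(2, 8)}
```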
Write 'foo' at QUORUM with Hinted Handoff.
(Diagram: the client writes 'foo' (token 90) via a coordinator; Node 3 is down, so Node 4 stores a hint, "'foo' for #3", to replay when Node 3 returns.)
Read 'foo' at QUORUM.
(Diagram: the client reads 'foo' (token 90) via a coordinator, which waits for a quorum of replicas to respond.)
Column Timestamps used to resolve
differences.
Resolving differences.

Column     | Node 1            | Node 2            | Node 3
-----------+-------------------+-------------------+--------------------
purple     | cromulent (ts 10) | cromulent (ts 10) | <missing>
monkey     | embiggens (ts 10) | embiggens (ts 10) | debigulator (ts 5)
dishwasher | tomato (ts 10)    | tomato (ts 10)    | tomacco (ts 15)
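Timestamp resolution over replies like the table above can be sketched as last-write-wins (a toy model, not Cassandra's internals):

```python
def resolve(replies):
    # Last-write-wins: among (value, timestamp) replies from replicas,
    # pick the value with the highest timestamp; skip missing replies.
    present = [r for r in replies if r is not None]
    return max(present, key=lambda vt: vt[1])[0] if present else None

# The three columns from the slide:
purple = resolve([("cromulent", 10), ("cromulent", 10), None])
monkey = resolve([("embiggens", 10), ("embiggens", 10), ("debigulator", 5)])
dishwasher = resolve([("tomato", 10), ("tomato", 10), ("tomacco", 15)])
```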
Consistent read for 'foo' at QUORUM.
(Diagram: Nodes 1 and 2 return 'cromulent' and Node 3 returns an empty result; the coordinator resolves the difference, repairs Node 3, and returns 'cromulent' to the client.)
Strong Consistency
W + R > N
(#Write Nodes + #Read Nodes > Replication Factor)
Achieving Strong Consistency.
- QUORUM Read + QUORUM Write
- ALL Read + ONE Write
- ONE Read + ALL Write
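Each of those combinations satisfies the W + R > N rule, which can be checked directly (hypothetical helper name):

```python
def strongly_consistent(w: int, r: int, rf: int) -> bool:
    # W + R > N guarantees the read set overlaps the latest write set.
    return w + r > rf

rf = 3
quorum = rf // 2 + 1
# The three combinations from the slide, at RF 3:
checks = [
    strongly_consistent(quorum, quorum, rf),  # QUORUM write + QUORUM read
    strongly_consistent(1, rf, rf),           # ONE write + ALL read
    strongly_consistent(rf, 1, rf),           # ALL write + ONE read
]
```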
Achieving Consistency
- Consistency Level
- Hinted Handoff
- Read Repair
- Anti Entropy
Overview
- The Cluster
- The Node
- The Data Model
Optimised for Writes.
Write path
Append to Write Ahead Log.
(fsync every 10s by default, other options available)
Write path...
Merge Columns into Memtable.
(Lock free, always in memory.)

(Later.) Asynchronously flush Memtable to new files.
(May be 10's or 100's of MB in size.)
Data is stored in immutable SSTables.
(Sorted String table.)
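The write path above can be sketched as a toy storage engine (a hypothetical, heavily simplified model; real Cassandra memtables are lock free and flushes are asynchronous):

```python
class ToyEngine:
    """Toy write path: append to a write ahead log, merge the column
    into an in-memory memtable, flush to an immutable 'SSTable'."""

    def __init__(self):
        self.commit_log = []   # append-only write ahead log
        self.memtable = {}     # row key -> {column: (value, timestamp)}
        self.sstables = []     # immutable, sorted row snapshots

    def write(self, key, column, value, timestamp):
        self.commit_log.append((key, column, value, timestamp))
        self.memtable.setdefault(key, {})[column] = (value, timestamp)

    def flush(self):
        # A new SSTable is written sorted and never modified again.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}

engine = ToyEngine()
engine.write("foo", "purple", "cromulent", 10)
engine.write("foo", "monkey", "embiggens", 10)
engine.flush()
```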
SSTable files.
- *-Data.db
- *-Index.db
- *-Filter.db
(And others)
SSTables.

SSTable 1 - foo: dishwasher (ts 10): tomato, purple (ts 10): cromulent
SSTable 2 - foo: frink (ts 20): flayven, monkey (ts 10): embiggins
SSTable 3 - (no 'foo' row)
SSTable 4 - foo: dishwasher (ts 15): tomacco
SSTable 5 - (no 'foo' row)
Read purple, monkey, dishwasher.
(Diagram: each of SSTables 1-5 has an in-memory Bloom Filter and Index Sample in front of its on-disk *-Index.db and *-Data.db files; SSTables 1, 2 and 4 hold fragments of row 'foo'.)
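The Bloom Filter in front of each SSTable can be sketched like this (a toy version; Cassandra sizes its filters to a target false-positive rate and does not use MD5 for them):

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: k hashed bit positions per key in an m-bit set.
    A 'no' answer is certain; a 'maybe' can be a false positive, so the
    worst case is one wasted disk seek, never a missed row."""

    def __init__(self, m: int = 1024, k: int = 3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, key: str):
        for i in range(self.k):
            digest = hashlib.md5(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, key: str):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key: str) -> bool:
        return all(self.bits >> p & 1 for p in self._positions(key))

bf = BloomFilter()
bf.add("foo")
```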
Key Cache caches the row key's position in the *-Data.db file.
(Removes up to 1 disk seek per SSTable.)
Read with Key Cache.
(Diagram: as before, but each SSTable's in-memory Key Cache maps row keys straight to positions in *-Data.db, bypassing the Index Sample and *-Index.db.)
Row Cache caches the entire row.
(Removes all disk IO.)
Read with Row Cache.
(Diagram: as before, but an in-memory Row Cache sits in front of all the SSTables; a hit answers the read without touching the Bloom Filters, Key Caches, indexes, or data files.)
Compaction merges truth from multiple SSTables into one
SSTable with the same truth.(Manual and continuous background process.)
Compaction.

Column     | SSTable 1         | SSTable 2           | SSTable 4       | New
-----------+-------------------+---------------------+-----------------+---------------------
purple     | cromulent (ts 10) | <tombstone> (ts 15) |                 | <tombstone> (ts 15)
monkey     | embiggens (ts 10) |                     |                 | embiggens (ts 10)
dishwasher | tomato (ts 10)    |                     | tomacco (ts 15) | tomacco (ts 15)
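The merge in the table above can be sketched as picking the newest cell (or tombstone) per column across the input SSTables (a toy model):

```python
TOMBSTONE = "<tombstone>"  # deletion marker kept so older values stay dead

def compact(*sstables):
    # Merge row fragments from several SSTables into one, keeping the
    # (value, timestamp) with the highest timestamp for each column.
    merged = {}
    for table in sstables:
        for column, (value, ts) in table.items():
            if column not in merged or ts > merged[column][1]:
                merged[column] = (value, ts)
    return merged

# The 'foo' row fragments from the slide:
sstable1 = {"purple": ("cromulent", 10), "monkey": ("embiggens", 10),
            "dishwasher": ("tomato", 10)}
sstable2 = {"purple": (TOMBSTONE, 15)}
sstable4 = {"dishwasher": ("tomacco", 15)}
new = compact(sstable1, sstable2, sstable4)
```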
Overview
- The Cluster
- The Node
- The Data Model
Cassandra is good at
reading data from a row in the order it is stored.
Typically an efficient data model will
denormalize data and use the storage engine order.
To create a good data model
understand the queries your application requires.
API Choice: Thrift
Original and still fully supported API.

API Choice: CQL3
New and fully supported API.
CQL 3
A table-oriented, schema-driven data model and query language similar to SQL.
Twitter clone
using CQL 3 via the cqlsh tool.
bin/cqlsh
Queries?
- Post Tweet to Followers
- Get Tweet by ID
- List Tweets by User
- List Tweets in User Timeline
- List Followers
Keyspace
A Namespace container.
Our Keyspace
CREATE KEYSPACE cass_community
WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': 1};
Table
A sparse collection of well known, ordered columns.
First Table
CREATE TABLE User (
  user_name text,
  password text,
  real_name text,
  PRIMARY KEY (user_name)
);
Some users

cqlsh:cass_community> INSERT INTO User
                  ... (user_name, password, real_name)
                  ... VALUES
                  ... ('fred', 'sekr8t', 'Mr Foo');
cqlsh:cass_community> select * from User;

 user_name | password | real_name
-----------+----------+-----------
      fred |   sekr8t |    Mr Foo
Some users...

cqlsh:cass_community> INSERT INTO User
                  ... (user_name, password)
                  ... VALUES
                  ... ('bob', 'pwd');
cqlsh:cass_community> select * from User where user_name = 'bob';

 user_name | password | real_name
-----------+----------+-----------
       bob |      pwd |      null
Data Model (so far)

Value     | User
----------+------------
user_name | Primary Key
Tweet Table

CREATE TABLE Tweet (
  tweet_id bigint,
  body text,
  user_name text,
  timestamp timestamp,
  PRIMARY KEY (tweet_id)
);
Tweet Table...

cqlsh:cass_community> INSERT INTO Tweet
                  ... (tweet_id, body, user_name, timestamp)
                  ... VALUES
                  ... (1, 'The Tweet', 'fred', 1352150816917);
cqlsh:cass_community> select * from Tweet where tweet_id = 1;

 tweet_id | body      | timestamp                | user_name
----------+-----------+--------------------------+-----------
        1 | The Tweet | 2012-11-06 10:26:56+1300 | fred
Data Model (so far)

Value     | User        | Tweet
----------+-------------+------------
user_name | Primary Key | Field
tweet_id  |             | Primary Key
UserTweets Table

CREATE TABLE UserTweets (
  tweet_id bigint,
  user_name text,
  body text,
  timestamp timestamp,
  PRIMARY KEY (user_name, tweet_id)
);
UserTweets Table...

cqlsh:cass_community> INSERT INTO UserTweets
                  ... (tweet_id, body, user_name, timestamp)
                  ... VALUES
                  ... (1, 'The Tweet', 'fred', 1352150816917);
cqlsh:cass_community> select * from UserTweets where user_name = 'fred';

 user_name | tweet_id | body      | timestamp
-----------+----------+-----------+--------------------------
      fred |        1 | The Tweet | 2012-11-06 10:26:56+1300
UserTweets Table...

cqlsh:cass_community> select * from UserTweets where user_name = 'fred' and tweet_id = 1;

 user_name | tweet_id | body      | timestamp
-----------+----------+-----------+--------------------------
      fred |        1 | The Tweet | 2012-11-06 10:26:56+1300
UserTweets Table...

cqlsh:cass_community> INSERT INTO UserTweets
                  ... (tweet_id, body, user_name, timestamp)
                  ... VALUES
                  ... (2, 'Second Tweet', 'fred', 1352150816918);
cqlsh:cass_community> select * from UserTweets where user_name = 'fred';

 user_name | tweet_id | body         | timestamp
-----------+----------+--------------+--------------------------
      fred |        1 | The Tweet    | 2012-11-06 10:26:56+1300
      fred |        2 | Second Tweet | 2012-11-06 10:26:56+1300
UserTweets Table...

cqlsh:cass_community> select * from UserTweets where user_name = 'fred' order by tweet_id desc;

 user_name | tweet_id | body         | timestamp
-----------+----------+--------------+--------------------------
      fred |        2 | Second Tweet | 2012-11-06 10:26:56+1300
      fred |        1 | The Tweet    | 2012-11-06 10:26:56+1300
UserTimeline

CREATE TABLE UserTimeline (
  user_name text,
  tweet_id bigint,
  tweet_user text,
  body text,
  timestamp timestamp,
  PRIMARY KEY (user_name, tweet_id)
) WITH CLUSTERING ORDER BY (tweet_id DESC);
UserTimeline

cqlsh:cass_community> INSERT INTO UserTimeline
                  ... (user_name, tweet_id, tweet_user, body, timestamp)
                  ... VALUES
                  ... ('fred', 1, 'fred', 'The Tweet', 1352150816917);
cqlsh:cass_community> INSERT INTO UserTimeline
                  ... (user_name, tweet_id, tweet_user, body, timestamp)
                  ... VALUES
                  ... ('fred', 100, 'bob', 'My Tweet', 1352150846917);
UserTimeline

cqlsh:cass_community> select * from UserTimeline where user_name = 'fred';

 user_name | tweet_id | body      | timestamp                | tweet_user
-----------+----------+-----------+--------------------------+------------
      fred |      100 | My Tweet  | 2012-11-06 10:27:26+1300 | bob
      fred |        1 | The Tweet | 2012-11-06 10:26:56+1300 | fred
Data Model (so far)

Value     | User        | Tweet       | UserTweets            | UserTimeline
----------+-------------+-------------+-----------------------+----------------------
user_name | Primary Key | Field       | Primary Key           | Primary Key
tweet_id  |             | Primary Key | Primary Key Component | Primary Key Component
UserMetrics Table

CREATE TABLE UserMetrics (
  user_name text,
  tweets counter,
  followers counter,
  following counter,
  PRIMARY KEY (user_name)
);
UserMetrics Table...

cqlsh:cass_community> UPDATE UserMetrics
                  ... SET tweets = tweets + 1
                  ... WHERE user_name = 'fred';
cqlsh:cass_community> select * from UserMetrics where user_name = 'fred';

 user_name | followers | following | tweets
-----------+-----------+-----------+--------
      fred |      null |      null |      1
Data Model (so far)

Value     | User        | Tweet       | UserTweets            | UserTimeline          | UserMetrics
----------+-------------+-------------+-----------------------+-----------------------+------------
user_name | Primary Key | Field       | Primary Key           | Primary Key           | Primary Key
tweet_id  |             | Primary Key | Primary Key Component | Primary Key Component |
Relationships

CREATE TABLE Followers (
  user_name text,
  follower text,
  timestamp timestamp,
  PRIMARY KEY (user_name, follower)
);

CREATE TABLE Following (
  user_name text,
  following text,
  timestamp timestamp,
  PRIMARY KEY (user_name, following)
);
Relationships

cqlsh:cass_community> INSERT INTO Following
                  ... (user_name, following, timestamp)
                  ... VALUES
                  ... ('bob', 'fred', 1352247749161);
cqlsh:cass_community> INSERT INTO Followers
                  ... (user_name, follower, timestamp)
                  ... VALUES
                  ... ('fred', 'bob', 1352247749161);
Relationships

cqlsh:cass_community> select * from Following;

 user_name | following | timestamp
-----------+-----------+--------------------------
       bob |      fred | 2012-11-07 13:22:29+1300

cqlsh:cass_community> select * from Followers;

 user_name | follower | timestamp
-----------+----------+--------------------------
      fred |      bob | 2012-11-07 13:22:29+1300
Data Model

Value     | User        | Tweet       | UserTweets            | UserTimeline          | UserMetrics | Follows / Followers
----------+-------------+-------------+-----------------------+-----------------------+-------------+--------------------
user_name | Primary Key | Field       | Primary Key           | Primary Key           | Primary Key | Primary Key
tweet_id  |             | Primary Key | Primary Key Component | Primary Key Component |             |
Thanks.
Aaron Morton
@aaronmorton
www.thelastpickle.com
Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License