Post on 27-Jun-2015
CASSANDRA COMMUNITY WEBINARS APRIL 2013
INTRODUCTION TO APACHE CASSANDRA 1.2
Aaron Morton, Apache Cassandra Committer, DataStax MVP for Apache Cassandra
@aaronmorton / www.thelastpickle.com
Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
Cassandra Summit 2013, June 11 & 12, San Francisco
Use SFSummit25 for 25% off
Cassandra Summit 2013: DataStax Academy
Free certification during the summit.
Overview
- The Cluster
- The Node
- The Data Model
Cassandra
- Started at Facebook
- Open sourced in 2008
- Top Level Apache project since 2010
Used by... Netflix, Twitter, Reddit, Rackspace...
Inspiration
- Google Big Table (2006)
- Amazon Dynamo (2007)
Why Cassandra?
- Scale
- Operations
- Data Model
Overview
- The Cluster
- The Node
- The Data Model
Store 'foo' key with Replication Factor 3.
(Diagram: Nodes 1, 2 and 3 each store 'foo'; Node 4 does not.)
Consistent Hashing.
- Evenly map keys to nodes
- Minimise key movements when nodes join or leave
Partitioner.
RandomPartitioner transforms Keys to Tokens using MD5.
(Default pre version 1.2.)
Partitioner.
Murmur3Partitioner transforms Keys to Tokens using Murmur3.
(Default in version 1.2.)
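The RandomPartitioner idea can be sketched with the standard library (a simplification: the real partitioner returns the MD5 digest as a BigInteger token, and Murmur3Partitioner instead uses 64-bit Murmur3 hashes, which are not in Python's stdlib):

```python
import hashlib

def md5_token(key: bytes) -> int:
    # Interpret the MD5 digest of the key as a big integer and fold it
    # into a RandomPartitioner-style token range of 0 .. 2**127 - 1.
    return int.from_bytes(hashlib.md5(key).digest(), "big") % 2**127

# Hashing spreads even very similar keys evenly around the ring.
tokens = {k: md5_token(k) for k in (b"foo", b"fop", b"bar")}
```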
Keys and Tokens?

key   | token
------+------
'fop' | 10
'foo' | 90

(Tokens are positions on a ring running from 0 to 99 in this example.)
Token Ring.
(Diagram: a ring of tokens from 0 to 99, with 'fop' at token 10 and 'foo' at token 90.)
Token Ranges pre v1.2.
(Diagram: Node 1 at token 0, Node 2 at token 25, Node 3 at token 50, Node 4 at token 75; each node owns the range ending at its token, e.g. Node 2 owns 1-25 and Node 1 owns the wrapping range 76-0.)
Token Ranges with Virtual Nodes in v1.2.
(Diagram: Nodes 1-4 each own many small, non-contiguous token ranges.)
Locate Token Range.
(Diagram: 'foo' with token 90 maps to the node that owns the token range containing 90.)
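With the node tokens from the pre-v1.2 slides, locating the owning node can be sketched like this (a toy 0..99 ring with hypothetical node names, not real Cassandra code):

```python
import bisect

# Node tokens from the slides: each node owns the range
# ending at (and including) its own token.
NODE_TOKENS = [(0, "node1"), (25, "node2"), (50, "node3"), (75, "node4")]

def locate(token: int) -> str:
    # Find the node whose token is the first one >= the key's token,
    # wrapping around the ring past the highest token back to node1.
    tokens = sorted(NODE_TOKENS)
    i = bisect.bisect_left(tokens, (token, ""))
    return tokens[i % len(tokens)][1]

print(locate(90))  # 'foo' (token 90) wraps around to node1
print(locate(10))  # 'fop' (token 10) lands on node2
```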
Replication Strategy selects Replication Factor number of
nodes for a row.
SimpleStrategy with RF 3.
(Diagram: 'foo' with token 90 is stored on the owning node and the next two nodes around the ring.)
NetworkTopologyStrategy uses a Replication Factor per Data
Centre. (Default.)
Multi DC Replication with RF 3 and RF 2.
(Diagram: 'foo' with token 90 is stored on 3 of Nodes 1-4 in the West DC and on 2 of Nodes 1-4 in the East DC.)
The Snitch knows which Data Centre and Rack the Node is
in.
SimpleSnitch.
Places all nodes in the same DC and Rack.
(Default; there are others.)
EC2Snitch.
DC is set to the AWS Region and Rack to the Availability Zone.
The Client and the Coordinator.
(Diagram: the client sends its request for 'foo' (token 90) to one node, which acts as coordinator and forwards it to the replicas.)
Multi DC Client and the Coordinator.
(Diagram: the client's coordinator among local Nodes 1-4 forwards the request for 'foo' (token 90) to replicas in the remote DC, Nodes 10-40.)
Gossip.
Nodes share information with a small number of neighbours, who in turn share it with a small number of neighbours.
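This spreading behaviour can be simulated in a few lines (a toy model with a hypothetical `gossip_rounds` helper, not Cassandra's actual Gossiper):

```python
import random

def gossip_rounds(n_nodes: int, fanout: int = 3, seed: int = 42) -> int:
    # Toy gossip: one node learns a fact; each round, every informed
    # node tells `fanout` random peers. Count rounds until all know.
    rng = random.Random(seed)
    informed = {0}
    rounds = 0
    while len(informed) < n_nodes:
        for _ in range(len(informed)):
            informed.update(rng.randrange(n_nodes) for _ in range(fanout))
        rounds += 1
    return rounds

# Information reaches every node in roughly O(log n) rounds.
r = gossip_rounds(1000)
```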
Consistency Level (CL).
- Specified for each request
- Number of nodes to wait for

Consistency Level (CL)
- ANY*
- ONE, TWO, THREE
- QUORUM
- LOCAL_QUORUM, EACH_QUORUM*
QUORUM at Replication Factor...

Replication Factor | QUORUM
-------------------+-------
2 or 3             | 2
4 or 5             | 3
6 or 7             | 4
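The table above follows from QUORUM being a strict majority of the replicas:

```python
def quorum(replication_factor: int) -> int:
    # QUORUM = floor(RF / 2) + 1, a strict majority of replicas.
    return replication_factor // 2 + 1

# Reproduces the table: RF 2 or 3 -> 2, RF 4 or 5 -> 3, RF 6 or 7 -> 4.
sizes = {rf: quorum(rf) for rf in range(2, 8)}
```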
Write 'foo' at QUORUM with Hinted Handoff.
(Diagram: the client writes 'foo' (token 90) via a coordinator; Node 3 is down, so Node 4 stores a hint, "'foo' for #3", to replay when Node 3 returns.)
Read 'foo' at QUORUM.
(Diagram: the client reads 'foo' (token 90) via a coordinator, which waits for a quorum of replicas to respond.)
Column Timestamps used to resolve
differences.
Resolving differences.

Column     | Node 1            | Node 2            | Node 3
-----------+-------------------+-------------------+--------------------
purple     | cromulent (ts 10) | cromulent (ts 10) | <missing>
monkey     | embiggens (ts 10) | embiggens (ts 10) | debigulator (ts 5)
dishwasher | tomato (ts 10)    | tomato (ts 10)    | tomacco (ts 15)
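Timestamp resolution over replies like the table above can be sketched as last-write-wins (a toy model, not Cassandra's internals):

```python
def resolve(replies):
    # Last-write-wins: among (value, timestamp) replies from replicas,
    # pick the value with the highest timestamp; skip missing replies.
    present = [r for r in replies if r is not None]
    return max(present, key=lambda vt: vt[1])[0] if present else None

# The three columns from the slide:
purple = resolve([("cromulent", 10), ("cromulent", 10), None])
monkey = resolve([("embiggens", 10), ("embiggens", 10), ("debigulator", 5)])
dishwasher = resolve([("tomato", 10), ("tomato", 10), ("tomacco", 15)])
```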
Consistent read for 'foo' at QUORUM.
(Diagram: Nodes 1 and 2 return 'cromulent' and Node 3 returns an empty result; the coordinator resolves the difference, repairs Node 3, and returns 'cromulent' to the client.)
Strong Consistency
W + R > N
(#Write Nodes + #Read Nodes > Replication Factor)
Achieving Strong Consistency.
- QUORUM Read + QUORUM Write
- ALL Read + ONE Write
- ONE Read + ALL Write
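Each of those combinations satisfies the W + R > N rule, which can be checked directly (hypothetical helper name):

```python
def strongly_consistent(w: int, r: int, rf: int) -> bool:
    # W + R > N guarantees the read set overlaps the latest write set.
    return w + r > rf

rf = 3
quorum = rf // 2 + 1
# The three combinations from the slide, at RF 3:
checks = [
    strongly_consistent(quorum, quorum, rf),  # QUORUM write + QUORUM read
    strongly_consistent(1, rf, rf),           # ONE write + ALL read
    strongly_consistent(rf, 1, rf),           # ALL write + ONE read
]
```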
Achieving Consistency
- Consistency Level
- Hinted Handoff
- Read Repair
- Anti Entropy
Overview
- The Cluster
- The Node
- The Data Model
Optimised for Writes.
Write path
Append to Write Ahead Log.
(fsync every 10s by default, other options available)
Write path...
Merge Columns into Memtable.
(Lock free, always in memory.)

(Later.) Asynchronously flush Memtable to new files.
(May be 10's or 100's of MB in size.)
Data is stored in immutable SSTables.
(Sorted String table.)
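The write path above can be sketched as a toy storage engine (a hypothetical, heavily simplified model; real Cassandra memtables are lock free and flushes are asynchronous):

```python
class ToyEngine:
    """Toy write path: append to a write ahead log, merge the column
    into an in-memory memtable, flush to an immutable 'SSTable'."""

    def __init__(self):
        self.commit_log = []   # append-only write ahead log
        self.memtable = {}     # row key -> {column: (value, timestamp)}
        self.sstables = []     # immutable, sorted row snapshots

    def write(self, key, column, value, timestamp):
        self.commit_log.append((key, column, value, timestamp))
        self.memtable.setdefault(key, {})[column] = (value, timestamp)

    def flush(self):
        # A new SSTable is written sorted and never modified again.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}

engine = ToyEngine()
engine.write("foo", "purple", "cromulent", 10)
engine.write("foo", "monkey", "embiggens", 10)
engine.flush()
```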
SSTable files.
- *-Data.db
- *-Index.db
- *-Filter.db
(And others)
SSTables.

SSTable 1 - foo: dishwasher (ts 10): tomato, purple (ts 10): cromulent
SSTable 2 - foo: frink (ts 20): flayven, monkey (ts 10): embiggins
SSTable 3 - (no 'foo' row)
SSTable 4 - foo: dishwasher (ts 15): tomacco
SSTable 5 - (no 'foo' row)
Read purple, monkey, dishwasher.
(Diagram: each of SSTables 1-5 has an in-memory Bloom Filter and Index Sample in front of its on-disk *-Index.db and *-Data.db files; SSTables 1, 2 and 4 hold fragments of row 'foo'.)
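The Bloom Filter in front of each SSTable can be sketched like this (a toy version; Cassandra sizes its filters to a target false-positive rate and does not use MD5 for them):

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: k hashed bit positions per key in an m-bit set.
    A 'no' answer is certain; a 'maybe' can be a false positive, so the
    worst case is one wasted disk seek, never a missed row."""

    def __init__(self, m: int = 1024, k: int = 3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, key: str):
        for i in range(self.k):
            digest = hashlib.md5(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, key: str):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key: str) -> bool:
        return all(self.bits >> p & 1 for p in self._positions(key))

bf = BloomFilter()
bf.add("foo")
```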
Key Cache caches the row key's position in the *-Data.db file.
(Removes up to 1 disk seek per SSTable.)
Read with Key Cache.
(Diagram: as before, but each SSTable's in-memory Key Cache maps row keys straight to positions in *-Data.db, bypassing the Index Sample and *-Index.db.)
Row Cache caches the entire row.
(Removes all disk IO.)
Read with Row Cache.
(Diagram: as before, but an in-memory Row Cache sits in front of all the SSTables; a hit answers the read without touching the Bloom Filters, Key Caches, indexes, or data files.)
Compaction merges truth from multiple SSTables into one
SSTable with the same truth.(Manual and continuous background process.)
Compaction.

Column     | SSTable 1         | SSTable 2           | SSTable 4       | New
-----------+-------------------+---------------------+-----------------+---------------------
purple     | cromulent (ts 10) | <tombstone> (ts 15) |                 | <tombstone> (ts 15)
monkey     | embiggens (ts 10) |                     |                 | embiggens (ts 10)
dishwasher | tomato (ts 10)    |                     | tomacco (ts 15) | tomacco (ts 15)
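The merge in the table above can be sketched as picking the newest cell (or tombstone) per column across the input SSTables (a toy model):

```python
TOMBSTONE = "<tombstone>"  # deletion marker kept so older values stay dead

def compact(*sstables):
    # Merge row fragments from several SSTables into one, keeping the
    # (value, timestamp) with the highest timestamp for each column.
    merged = {}
    for table in sstables:
        for column, (value, ts) in table.items():
            if column not in merged or ts > merged[column][1]:
                merged[column] = (value, ts)
    return merged

# The 'foo' row fragments from the slide:
sstable1 = {"purple": ("cromulent", 10), "monkey": ("embiggens", 10),
            "dishwasher": ("tomato", 10)}
sstable2 = {"purple": (TOMBSTONE, 15)}
sstable4 = {"dishwasher": ("tomacco", 15)}
new = compact(sstable1, sstable2, sstable4)
```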
Overview
- The Cluster
- The Node
- The Data Model
Cassandra is good at
reading data from a row in the order it is stored.
Typically an efficient data model will
denormalize data and use the storage engine order.
To create a good data model
understand the queries your application requires.
API Choice: Thrift
Original and still fully supported API.

API Choice: CQL3
New and fully supported API.
CQL 3
A table-oriented, schema-driven data model and query language similar to SQL.
Twitter clone
using CQL 3 via the cqlsh tool.
bin/cqlsh
Queries?
- Post Tweet to Followers
- Get Tweet by ID
- List Tweets by User
- List Tweets in User Timeline
- List Followers
Keyspace
A Namespace container.
Our Keyspace
CREATE KEYSPACE cass_community
WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': 1};
Table
A sparse collection of well known, ordered columns.
First Table
CREATE TABLE User (
  user_name text,
  password text,
  real_name text,
  PRIMARY KEY (user_name)
);
Some users

cqlsh:cass_community> INSERT INTO User
                  ... (user_name, password, real_name)
                  ... VALUES
                  ... ('fred', 'sekr8t', 'Mr Foo');
cqlsh:cass_community> select * from User;

 user_name | password | real_name
-----------+----------+-----------
      fred |   sekr8t |    Mr Foo
Some users...

cqlsh:cass_community> INSERT INTO User
                  ... (user_name, password)
                  ... VALUES
                  ... ('bob', 'pwd');
cqlsh:cass_community> select * from User where user_name = 'bob';

 user_name | password | real_name
-----------+----------+-----------
       bob |      pwd |      null
Data Model (so far)

Value     | User
----------+------------
user_name | Primary Key
Tweet Table

CREATE TABLE Tweet (
  tweet_id bigint,
  body text,
  user_name text,
  timestamp timestamp,
  PRIMARY KEY (tweet_id)
);
Tweet Table...

cqlsh:cass_community> INSERT INTO Tweet
                  ... (tweet_id, body, user_name, timestamp)
                  ... VALUES
                  ... (1, 'The Tweet', 'fred', 1352150816917);
cqlsh:cass_community> select * from Tweet where tweet_id = 1;

 tweet_id | body      | timestamp                | user_name
----------+-----------+--------------------------+-----------
        1 | The Tweet | 2012-11-06 10:26:56+1300 | fred
Data Model (so far)

Value     | User        | Tweet
----------+-------------+------------
user_name | Primary Key | Field
tweet_id  |             | Primary Key
UserTweets Table

CREATE TABLE UserTweets (
  tweet_id bigint,
  user_name text,
  body text,
  timestamp timestamp,
  PRIMARY KEY (user_name, tweet_id)
);
UserTweets Table...

cqlsh:cass_community> INSERT INTO UserTweets
                  ... (tweet_id, body, user_name, timestamp)
                  ... VALUES
                  ... (1, 'The Tweet', 'fred', 1352150816917);
cqlsh:cass_community> select * from UserTweets where user_name = 'fred';

 user_name | tweet_id | body      | timestamp
-----------+----------+-----------+--------------------------
      fred |        1 | The Tweet | 2012-11-06 10:26:56+1300
UserTweets Table...

cqlsh:cass_community> select * from UserTweets where user_name = 'fred' and tweet_id = 1;

 user_name | tweet_id | body      | timestamp
-----------+----------+-----------+--------------------------
      fred |        1 | The Tweet | 2012-11-06 10:26:56+1300
UserTweets Table...

cqlsh:cass_community> INSERT INTO UserTweets
                  ... (tweet_id, body, user_name, timestamp)
                  ... VALUES
                  ... (2, 'Second Tweet', 'fred', 1352150816918);
cqlsh:cass_community> select * from UserTweets where user_name = 'fred';

 user_name | tweet_id | body         | timestamp
-----------+----------+--------------+--------------------------
      fred |        1 | The Tweet    | 2012-11-06 10:26:56+1300
      fred |        2 | Second Tweet | 2012-11-06 10:26:56+1300
UserTweets Table...

cqlsh:cass_community> select * from UserTweets where user_name = 'fred' order by tweet_id desc;

 user_name | tweet_id | body         | timestamp
-----------+----------+--------------+--------------------------
      fred |        2 | Second Tweet | 2012-11-06 10:26:56+1300
      fred |        1 | The Tweet    | 2012-11-06 10:26:56+1300
UserTimeline

CREATE TABLE UserTimeline (
  user_name text,
  tweet_id bigint,
  tweet_user text,
  body text,
  timestamp timestamp,
  PRIMARY KEY (user_name, tweet_id)
) WITH CLUSTERING ORDER BY (tweet_id DESC);
UserTimeline

cqlsh:cass_community> INSERT INTO UserTimeline
                  ... (user_name, tweet_id, tweet_user, body, timestamp)
                  ... VALUES
                  ... ('fred', 1, 'fred', 'The Tweet', 1352150816917);
cqlsh:cass_community> INSERT INTO UserTimeline
                  ... (user_name, tweet_id, tweet_user, body, timestamp)
                  ... VALUES
                  ... ('fred', 100, 'bob', 'My Tweet', 1352150846917);
UserTimeline

cqlsh:cass_community> select * from UserTimeline where user_name = 'fred';

 user_name | tweet_id | body      | timestamp                | tweet_user
-----------+----------+-----------+--------------------------+------------
      fred |      100 | My Tweet  | 2012-11-06 10:27:26+1300 | bob
      fred |        1 | The Tweet | 2012-11-06 10:26:56+1300 | fred
Data Model (so far)

Value     | User        | Tweet       | UserTweets            | UserTimeline
----------+-------------+-------------+-----------------------+----------------------
user_name | Primary Key | Field       | Primary Key           | Primary Key
tweet_id  |             | Primary Key | Primary Key Component | Primary Key Component
UserMetrics Table

CREATE TABLE UserMetrics (
  user_name text,
  tweets counter,
  followers counter,
  following counter,
  PRIMARY KEY (user_name)
);
UserMetrics Table...

cqlsh:cass_community> UPDATE UserMetrics
                  ... SET tweets = tweets + 1
                  ... WHERE user_name = 'fred';
cqlsh:cass_community> select * from UserMetrics where user_name = 'fred';

 user_name | followers | following | tweets
-----------+-----------+-----------+--------
      fred |      null |      null |      1
Data Model (so far)

Value     | User        | Tweet       | UserTweets            | UserTimeline          | UserMetrics
----------+-------------+-------------+-----------------------+-----------------------+------------
user_name | Primary Key | Field       | Primary Key           | Primary Key           | Primary Key
tweet_id  |             | Primary Key | Primary Key Component | Primary Key Component |
Relationships

CREATE TABLE Followers (
  user_name text,
  follower text,
  timestamp timestamp,
  PRIMARY KEY (user_name, follower)
);

CREATE TABLE Following (
  user_name text,
  following text,
  timestamp timestamp,
  PRIMARY KEY (user_name, following)
);
Relationships

cqlsh:cass_community> INSERT INTO Following
                  ... (user_name, following, timestamp)
                  ... VALUES
                  ... ('bob', 'fred', 1352247749161);
cqlsh:cass_community> INSERT INTO Followers
                  ... (user_name, follower, timestamp)
                  ... VALUES
                  ... ('fred', 'bob', 1352247749161);
Relationships

cqlsh:cass_community> select * from Following;

 user_name | following | timestamp
-----------+-----------+--------------------------
       bob |      fred | 2012-11-07 13:22:29+1300

cqlsh:cass_community> select * from Followers;

 user_name | follower | timestamp
-----------+----------+--------------------------
      fred |      bob | 2012-11-07 13:22:29+1300
Data Model

Value     | User        | Tweet       | UserTweets            | UserTimeline          | UserMetrics | Follows / Followers
----------+-------------+-------------+-----------------------+-----------------------+-------------+--------------------
user_name | Primary Key | Field       | Primary Key           | Primary Key           | Primary Key | Primary Key
tweet_id  |             | Primary Key | Primary Key Component | Primary Key Component |             |
Thanks.
Aaron Morton
@aaronmorton
www.thelastpickle.com
Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License