Cassandra and Clojure

Date post: 15-Jan-2015
Upload: nickmbailey
Description:
An introduction to Cassandra as well as an example of accessing Cassandra from Clojure. Includes an introduction to cluster architecture and data model in Cassandra. The code for the examples is available at: https://github.com/nickmbailey/clojure-cassandra-demo
Transcript
Page 1: Cassandra and Clojure

©2013 DataStax. Do not distribute without consent.

Nick Bailey

OpsCenter Architect

Cassandra and Clojure

Page 2: Cassandra and Clojure

Who am I?

• OpsCenter Architect

• Monitoring/management tool for Cassandra

• Organizer of Austin Cassandra Users

• http://www.meetup.com/Austin-Cassandra-Users/

• Third Thursday each month. Come join!

• Working with Cassandra for 4 years

Page 3: Cassandra and Clojure

Cassandra - An introduction

Page 4: Cassandra and Clojure

Cassandra - Intro

• Based on Amazon Dynamo and Google BigTable papers

• Shared nothing

• Distributed

• Predictable scaling

Dynamo

BigTable

Page 5: Cassandra and Clojure

Users


Page 6: Cassandra and Clojure

Cassandra - Architecture

Page 7: Cassandra and Clojure

Cassandra - Cluster Architecture

• All nodes participate in a cluster

• Shared nothing

• Add or remove as needed

• More capacity? Add a server

Page 8: Cassandra and Clojure

Cassandra - Data Distribution

[Figure: token ring with four nodes at tokens 0, 25, 50 and 75]

• Each node owns 1 or more “tokens”

• Each piece of data has a “partition key”

• Partition key is hashed to determine token

• Hashes:

• Murmur3 (default)

• MD5
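The token lookup described above can be sketched in plain Clojure. This is only an illustration under stated assumptions: the ring mirrors the four tokens in the figure, the node names are hypothetical, and `toy-token` is a stand-in for the real partitioner (Murmur3 over a far larger token space).

```clojure
;; Toy ring: four nodes at tokens 0, 25, 50, 75 (node names are hypothetical).
(def ring
  (sorted-map 0 "node-a" 25 "node-b" 50 "node-c" 75 "node-d"))

(defn toy-token
  "Stand-in for the partitioner: maps a partition key to a token in [0, 100).
  This is NOT Cassandra's Murmur3 partitioner, just a placeholder hash."
  [partition-key]
  (mod (hash partition-key) 100))

(defn owner-for-token
  "Returns the node that owns `token`: the first node whose token is >= `token`,
  wrapping around to the lowest token when none is found."
  [ring token]
  (let [[_ node] (or (first (subseq ring >= token))
                     (first ring))]
    node))

(defn owner-for-key
  "Hashes the partition key to a token, then looks up the owning node."
  [ring partition-key]
  (owner-for-token ring (toy-token partition-key)))
```

For example, `(owner-for-token ring 10)` lands on the node at token 25, and a token of 80 wraps around to the node at token 0.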

Page 9: Cassandra and Clojure

Cassandra - Replication

• Client writes to any node

• Node coordinates with replicas

• Data replicated in parallel

• Replication factor (RF): How many copies of your data?
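As a rough sketch of what the replication factor means for placement: with a simple, single-datacenter strategy, the replicas for a token are the owning node plus the next nodes clockwise on the ring. The ring and node names below are hypothetical, and rack and datacenter awareness are deliberately ignored.

```clojure
;; Toy ring: token -> node (names are hypothetical).
(def ring
  (sorted-map 0 "node-a" 25 "node-b" 50 "node-c" 75 "node-d"))

(defn replicas-for-token
  "Returns up to `rf` distinct nodes for `token`: the owner first,
  then the following nodes clockwise around the ring."
  [ring rf token]
  (let [nodes   (vals ring)                                   ; nodes in token order
        start   (count (take-while #(< % token) (keys ring))) ; index of the owner
        rotated (take (count nodes) (drop start (cycle nodes)))]
    (vec (take rf rotated))))
```

With RF = 3, `(replicas-for-token ring 3 60)` walks from the node at token 75 around to the nodes at 0 and 25.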

Page 10: Cassandra and Clojure

Cassandra - Failure Modes

• Consistency level

• How many nodes?

• ONE/QUORUM/ALL
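The arithmetic behind the three levels is small enough to sketch. The function names are made up for illustration; the only facts assumed are that QUORUM means a majority of the replicas (floor(RF / 2) + 1) and that ALL means every replica.

```clojure
(defn quorum
  "Majority of `rf` replicas."
  [rf]
  (inc (quot rf 2)))

(defn required-acks
  "Number of replicas that must respond for the given consistency level."
  [consistency-level rf]
  (case consistency-level
    :one    1
    :quorum (quorum rf)
    :all    rf))

(defn tolerated-failures
  "How many replicas can be down while requests at this level still succeed."
  [consistency-level rf]
  (- rf (required-acks consistency-level rf)))
```

With RF = 3, QUORUM needs 2 replicas and survives 1 replica being down, while ALL survives none.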

Page 11: Cassandra and Clojure

Cassandra - Geographically Distributed

• Client writes local

• Data syncs across WAN

• Replication Factor per DC

• Consistency Level

• LOCAL_QUORUM

[Figure: two datacenters, Datacenter East and Datacenter West]
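To see why LOCAL_QUORUM matters with a replication factor per datacenter, compare it with a cluster-wide QUORUM. This is a sketch only: the datacenter names and RF values are hypothetical.

```clojure
;; Hypothetical per-datacenter replication factors.
(def rf-by-dc {"east" 3 "west" 3})

(defn local-quorum
  "Majority of the replicas in the coordinator's own datacenter."
  [rf-by-dc local-dc]
  (inc (quot (get rf-by-dc local-dc) 2)))

(defn global-quorum
  "Majority of all replicas across every datacenter."
  [rf-by-dc]
  (inc (quot (reduce + (vals rf-by-dc)) 2)))
```

Here LOCAL_QUORUM in "east" needs 2 local replicas, while QUORUM needs 4 of 6, so at least one acknowledgement has to cross the WAN.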

Page 12: Cassandra and Clojure

Data Modeling - Concepts

Page 13: Cassandra and Clojure

CQL

• Cassandra Query Language

• SQL-like

• Not Relational

Page 14: Cassandra and Clojure

Terminology

• Keyspace

• Table (Column Family)

• Row

• Column

• Partition Key

• Clustering Key

Page 15: Cassandra and Clojure

Data Types

cqlsh:clojure_cassandra_demo> help types

CQL types recognized by this version of cqlsh:

  ascii  bigint  blob  boolean  counter  decimal  double  float  inet  int
  list  map  set  text  timestamp  timeuuid  uuid  varchar  varint

Page 16: Cassandra and Clojure

Advanced Concepts

• Lightweight Transactions

• Atomic Batches

• User Defined Types (coming soon)

Page 17: Cassandra and Clojure

Data Modeling - An Example

Page 18: Cassandra and Clojure

Approaching Data Modeling

• Model your queries, not your data

• Generally, optimize for reads

• Denormalize!

• Iterate!

Page 19: Cassandra and Clojure

Basic Last.fm Clone

• See songs that user X has listened to recently

• See user X’s favorite songs in a specific month

• See who has recently listened to artist Y

• See artist Y’s most popular songs in a specific week

Page 20: Cassandra and Clojure

Basic Last.fm Clone

• See songs that user X has listened to recently

• One of the most common patterns/data models

• Time series

• Immutable (good fit for Clojure!)

Page 21: Cassandra and Clojure

Basic Last.fm Clone

• See songs that user X has listened to recently

SELECT song, artist, played_at
FROM user_history
WHERE username = 'nickmbailey'
ORDER BY played_at DESC;

• Partition key = ‘username’

• Clustering key = ‘played_at’

Page 22: Cassandra and Clojure

Basic Last.fm Clone

• See songs that user X has listened to recently

CREATE TABLE user_history (
    username text,
    played_at timestamp,
    album text,
    artist text,
    song text,
    PRIMARY KEY (username, played_at)
) WITH CLUSTERING ORDER BY (played_at DESC)

Page 23: Cassandra and Clojure

Basic Last.fm Clone

• See songs that user X has listened to recently

• This table has a “bad” partition key: every play by a user lands in a single partition that grows without bound

CREATE TABLE user_history (
    username text,
    played_at timestamp,
    album text,
    artist text,
    song text,
    PRIMARY KEY (username, played_at)
) WITH CLUSTERING ORDER BY (played_at DESC)

Page 24: Cassandra and Clojure

Basic Last.fm Clone

• See songs that user X has listened to recently

• Much better partition key: one partition per user per month

CREATE TABLE user_history (
    username text,
    year_and_month text,
    played_at timestamp,
    album text,
    artist text,
    song text,
    PRIMARY KEY ((username, year_and_month), played_at)
) WITH CLUSTERING ORDER BY (played_at DESC)
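With this schema the client has to compute the year_and_month bucket for every write and read. A minimal sketch follows, assuming the bucket is formatted as "yyyy-MM" (as in the sample rows) and that buckets are computed in UTC; the time zone choice and the function name are assumptions, not something the slides specify.

```clojure
(import '(java.time Instant ZoneOffset)
        '(java.time.format DateTimeFormatter))

;; Formatter for the bucket column; UTC is an assumption made for this sketch.
(def ^DateTimeFormatter bucket-format
  (.withZone (DateTimeFormatter/ofPattern "yyyy-MM") ZoneOffset/UTC))

(defn year-and-month
  "Formats an Instant as a \"yyyy-MM\" string for the year_and_month column."
  [^Instant played-at]
  (.format bucket-format played-at))
```

Usage: `(year-and-month (Instant/now))` gives the bucket for a play happening now. A "recently played" query may need to read the current bucket and, early in a month, the previous one as well.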

Page 25: Cassandra and Clojure

Basic Last.fm Clone

• See songs that user X has listened to recently

cqlsh:clojure_cassandra_demo> select * from user_history limit 5;

 username    | year_and_month | played_at                | album                    | artist                  | song
-------------+----------------+--------------------------+--------------------------+-------------------------+--------------------------
 nickmbailey | 2014-06        | 2014-06-30 17:13:54-0500 | Once More 'Round The Sun | Mastodon                | Halloween
 nickmbailey | 2014-06        | 2014-06-30 17:08:53-0500 | Once More 'Round The Sun | Mastodon                | Ember City
 b_hastings  | 2014-06        | 2014-06-30 12:57:12-0500 | Buena Vista Social Club  | Buena Vista Social Club | Chan Chan
 zack_smith  | 2014-07        | 2014-07-30 12:49:35-0500 | Awake Remix              | Tycho                   | Awake (Com Truise Remix)
 zack_smith  | 2014-03        | 2014-03-30 12:44:50-0500 | Awake Remix              | Tycho                   | Awake

Partition Key - unordered

Clustering Key - ordered

Page 26: Cassandra and Clojure

Basic Last.fm Clone

• See user X’s favorite songs in a specific month

SELECT song, artist, play_count
FROM user_history
WHERE username = 'nickmbailey' AND month = 'July'
ORDER BY play_count DESC;

• Partition key = ‘username’, ‘month’

• Clustering key = ‘play_count’?

• Counters are a special case

Page 27: Cassandra and Clojure

Counters

• A counter cannot be part of the PRIMARY KEY

• No ordering based on counter value

• All non-counter columns must be part of the PRIMARY KEY

• Limitations due to the storage format

Page 28: Cassandra and Clojure

Basic Last.fm Clone

• See user X’s favorite songs in a specific month

CREATE TABLE user_song_counts (
    username text,
    year_and_month text,
    artist text,
    song text,
    play_count counter,
    PRIMARY KEY ((username, year_and_month), artist, song)
)
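Counter columns are modified with an UPDATE that increments or decrements them, not with an INSERT. The sketch below holds such a statement as a Clojure string with `?` placeholders, in the shape a prepared statement would take; the var and function names and the event map keys are hypothetical.

```clojure
;; CQL for bumping a song's play count. Every primary key column appears in WHERE.
(def increment-play-count-cql
  (str "UPDATE user_song_counts "
       "SET play_count = play_count + 1 "
       "WHERE username = ? AND year_and_month = ? "
       "AND artist = ? AND song = ?;"))

(defn play-count-values
  "Orders a play event's fields to match the ? placeholders above.
  The event map keys are an assumption made for this sketch."
  [{:keys [username year-and-month artist song]}]
  [username year-and-month artist song])
```

The string and the values vector would then be handed to whichever client is in use for preparing and executing the statement.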

Page 29: Cassandra and Clojure

Basic Last.fm Clone

• See user X’s favorite songs in a specific month

• Results are not ordered by play count

• Client will have to do the sorting

cqlsh:clojure_cassandra_demo> select * from user_song_counts where username = 'nickmbailey' and year_and_month = '2014-07';

 username    | year_and_month | artist   | song                              | count
-------------+----------------+----------+-----------------------------------+-------
 nickmbailey | 2014-07        | Amos Lee | Tricksters, Hucksters, And Scamps |    10
 nickmbailey | 2014-07        | Beck     | Blackbird Chain                   |     1
 nickmbailey | 2014-07        | Beck     | Blue Moon                         |     4
 nickmbailey | 2014-07        | Cherub   | <3                                |    12
 nickmbailey | 2014-07        | Cherub   | Chocolate Strawberries            |     6
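Sorting on the client is a one-liner in Clojure. The sketch assumes each row arrives as a map with keyword keys and that the counter column is exposed as `:count`, as in the output above; the exact row shape depends on the client library.

```clojure
;; Rows copied from the sample output, represented as maps (assumed shape).
(def rows
  [{:artist "Amos Lee" :song "Tricksters, Hucksters, And Scamps" :count 10}
   {:artist "Beck"     :song "Blackbird Chain"                   :count 1}
   {:artist "Beck"     :song "Blue Moon"                         :count 4}
   {:artist "Cherub"   :song "<3"                                :count 12}
   {:artist "Cherub"   :song "Chocolate Strawberries"            :count 6}])

(defn top-songs
  "Returns the `n` rows with the highest play counts, highest first."
  [n rows]
  (take n (sort-by :count > rows)))
```

For example, `(map :song (top-songs 3 rows))` lists the three most played songs for the month.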

Page 30: Cassandra and Clojure

Basic Last.fm Clone

• See who has recently listened to artist Y

CREATE TABLE artist_history (
    artist text,
    year_and_week text,
    played_at timestamp,
    album text,
    song text,
    username text,
    PRIMARY KEY ((artist, year_and_week), played_at)
) WITH CLUSTERING ORDER BY (played_at DESC)

Page 31: Cassandra and Clojure

Basic Last.fm Clone

• See artist Y’s most popular songs in a specific week

CREATE TABLE artist_song_counts (
    artist text,
    year_and_week text,
    album text,
    song text,
    play_count counter,
    PRIMARY KEY ((artist, year_and_week), album, song)
)

Page 32: Cassandra and Clojure

Cassandra from Clojure

Page 33: Cassandra and Clojure

Building Blocks

• Java Driver

• Hayt

Page 34: Cassandra and Clojure

Java Driver

• Fully featured

• Connection pooling

• Failover policies

• Retry policies

• Sync and Async interfaces

• Exposes client metrics

• https://github.com/datastax/java-driver

Page 35: Cassandra and Clojure

Hayt

• CQL DSL

• Similar to Korma

• Solely for building CQL strings

• https://github.com/mpenet/hayt

(select :foo
        (where {:bar 1
                :baz 2}))

(->raw (select :foo (where {:bar 1 :baz 2})))
;; => "SELECT * FROM foo WHERE bar = 1 AND baz = 2;"

Page 36: Cassandra and Clojure

Clients

• Alia

• https://github.com/mpenet/alia

• Cassaforte

• https://github.com/clojurewerkz/cassaforte

• Both built on Java Driver and Hayt

• Not particularly different

Page 37: Cassandra and Clojure

Alia vs. Cassaforte

Cassaforte

(let [conn (cc/connect ["127.0.0.1"])]
  (cql/create-keyspace conn "cassaforte_keyspace"
    (with {:replication {:class "SimpleStrategy"
                         :replication_factor 1}})))

Alia

(def cluster (alia/cluster {:contact-points ["localhost"]}))
(def session (alia/connect cluster))
(alia/execute session
  (create-keyspace :alia
    (if-exists false)
    (with {:replication {:class "SimpleStrategy"
                         :replication_factor 1}})))

Page 38: Cassandra and Clojure

Learn by Example - Alia

Page 39: Cassandra and Clojure

Cluster Object

• Entry point

• Configures relevant client options

• :contact-points

• :load-balancing-policy

• :reconnection-policy

• :retry-policy

• and more!

(def cluster (alia/cluster {:contact-points ["localhost"]}))

Page 40: Cassandra and Clojure

Session Object

• A Session is associated with a keyspace

• Allows interacting with multiple keyspaces

(def cluster (alia/cluster {:contact-points ["localhost"]}))
;; Session with no default keyspace
(def session (alia/connect cluster))
;; Or: session bound to a specific keyspace
(def session (alia/connect cluster :my_keyspace))

Page 41: Cassandra and Clojure

Querying

• Multiple ways to query

• alia/execute

• Synchronous, block on result

• alia/execute-async

• Returns a Lamina result-channel (basically, a promise)

• Optional success/error callbacks

• alia/execute-chan

• Returns a core.async channel

• We won’t dive into core.async now

Page 42: Cassandra and Clojure

Prepared Statements

• Statements can be prepared server side

• Better performance for common queries

(def prepared-statement (alia/prepare session "select * from users where user_name=?;"))

Page 43: Cassandra and Clojure

What else?

• See GitHub and docs

• https://github.com/mpenet/alia

• http://mpenet.github.io/alia/qbits.alia.html

Page 44: Cassandra and Clojure

Demo

Page 45: Cassandra and Clojure

Demo

• https://github.com/nickmbailey/clojure-cassandra-demo

• Built with

• CCM - https://github.com/pcmanus/ccm

• Alia - https://github.com/mpenet/alia

• ring - https://github.com/ring-clojure/ring

• compojure - https://github.com/weavejester/compojure

• hiccup - https://github.com/weavejester/hiccup

• least - https://github.com/Raynes/least

Page 46: Cassandra and Clojure

More

Cassandra: http://cassandra.apache.org

DataStax Drivers: https://github.com/datastax

Documentation: http://www.datastax.com/docs

Getting Started: http://www.datastax.com/documentation/gettingstarted/index.html

Developer Blog: http://www.datastax.com/dev/blog

Cassandra Community Site: http://planetcassandra.org

Download: http://planetcassandra.org/Download/DataStaxCommunityEdition

Webinars: http://planetcassandra.org/Learn/CassandraCommunityWebinars

Cassandra Summit Talks: http://planetcassandra.org/Learn/CassandraSummit

Page 47: Cassandra and Clojure

©2013 DataStax Confidential. Do not distribute without consent.

