Cassandra and Clojure

Post on 15-Jan-2015

description

An introduction to Cassandra as well as an example of accessing Cassandra from Clojure. Includes an introduction to cluster architecture and data model in Cassandra. The code for the examples is available at: https://github.com/nickmbailey/clojure-cassandra-demo

transcript

©2013 DataStax. Do not distribute without consent.

Nick Bailey

OpsCenter Architect

Cassandra and Clojure

Who am I?

• OpsCenter Architect

• Monitoring/management tool for Cassandra

• Organizer of Austin Cassandra Users

• http://www.meetup.com/Austin-Cassandra-Users/

• Third Thursday each month. Come join!

• Working with Cassandra for 4 years

Cassandra - An introduction

Cassandra - Intro

• Based on Amazon Dynamo and Google BigTable papers

• Shared nothing

• Distributed

• Predictable scaling

[Figure: diagram with the labels Dynamo, BigTable, and Users]

Cassandra - Architecture

Cassandra - Cluster Architecture

• All nodes participate in a cluster

• Shared nothing

• Add or remove as needed

• More capacity? Add a server

Cassandra - Data Distribution

[Figure: token ring with nodes at tokens 0, 25, 50, and 75]

• Each node owns 1 or more “tokens”

• Each piece of data has a “partition key”

• Partition key is hashed to determine token

• Hashes:

• Murmur3 (default)

• MD5
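
The token lookup described above can be sketched in a few lines of Clojure. This is a toy model, not Cassandra's implementation: the node names, the 0-99 token space, and the `toy-token` function (which uses Clojure's `hash` in place of Murmur3) are all assumptions made for illustration. It relies on the convention that a node owns the range ending at its token, so a key belongs to the first node whose token is greater than or equal to the key's token.

```clojure
;; Toy model of the ring on the slide: four nodes, one token each.
;; Node names and the 0-99 token space are assumptions for illustration.
(def ring
  (sorted-map 0 "node-a", 25 "node-b", 50 "node-c", 75 "node-d"))

(defn toy-token
  "Hypothetical stand-in for the partitioner: maps a partition key to 0-99.
  Real clusters use Murmur3 over a much larger token range."
  [partition-key]
  (mod (hash partition-key) 100))

(defn owner
  "Returns the node owning `token`: the first node whose token is >= `token`,
  wrapping around to the lowest token on the ring."
  [ring token]
  (val (or (first (subseq ring >= token))
           (first ring))))

(owner ring 10) ; token 10 falls between 0 and 25, so node-b owns it
(owner ring 80) ; token 80 is past 75, so the lookup wraps to node-a
(owner ring (toy-token "nickmbailey"))
```

Keeping the ring in a sorted map makes the "next token clockwise" lookup a single `subseq` call.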

Cassandra - Replication

• Client writes to any node

• Node coordinates with replicas

• Data replicated in parallel

• Replication factor (RF): How many copies of your data?
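
As a rough illustration of replication factor, the sketch below picks replicas the way a simple, single-datacenter placement does: the node that owns the token, plus the next RF - 1 nodes clockwise around the ring. The ring, node names, and the `replicas` function are hypothetical; rack-aware or multi-datacenter placement is more involved.

```clojure
;; Same toy ring as on the data distribution slide (names are assumptions).
(def ring
  (sorted-map 0 "node-a", 25 "node-b", 50 "node-c", 75 "node-d"))

(defn replicas
  "Nodes holding a copy of the data at `token`: the owner plus the next
  (rf - 1) nodes clockwise. Assumes rf <= number of nodes."
  [ring token rf]
  (let [nodes (vals ring)
        n     (count nodes)
        ;; index of the first node whose token is >= `token`, wrapping to 0
        start (mod (count (take-while #(< % token) (keys ring))) n)]
    (vec (take rf (drop start (cycle nodes))))))

(replicas ring 60 3)
;; => ["node-d" "node-a" "node-b"]
```

With RF = 3 on a four-node ring, each piece of data lives on three of the four nodes.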

Cassandra - Failure Modes

• Consistency level (CL)

• How many replicas must respond before a read or write succeeds?

• ONE / QUORUM / ALL
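
To make the trade-off concrete, here is a small sketch of how many replicas each consistency level needs and how many replica failures it can therefore tolerate. The function names and the keyword spellings `:one`, `:quorum`, and `:all` are assumptions for this example; a quorum is a majority of replicas, floor(RF / 2) + 1.

```clojure
(defn required-acks
  "Number of replicas that must respond for a consistency level,
  given replication factor `rf`."
  [consistency-level rf]
  (case consistency-level
    :one    1
    :quorum (inc (quot rf 2))
    :all    rf))

(defn tolerated-failures
  "Number of replicas that can be down while requests still succeed."
  [consistency-level rf]
  (- rf (required-acks consistency-level rf)))

(required-acks :quorum 3)      ; => 2
(tolerated-failures :quorum 3) ; => 1
(tolerated-failures :all 3)    ; => 0
```

With RF = 3, QUORUM keeps working with one replica down, while ALL fails as soon as any replica is unavailable.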

Cassandra - Geographically Distributed

• Client writes to its local datacenter

• Data syncs across WAN

• Replication Factor per DC

• Consistency Level

• LOCAL_QUORUM

[Figure: two datacenters, Datacenter East and Datacenter West]

Data Modeling - Concepts

CQL

• Cassandra Query Language

• SQL-like

• Not Relational

Terminology

• Keyspace

• Table (Column Family)

• Row

• Column

• Partition Key

• Clustering Key

Data Types

cqlsh:clojure_cassandra_demo> help types

CQL types recognized by this version of cqlsh:

ascii bigint blob boolean counter decimal double float inet int list map set text timestamp timeuuid uuid varchar varint

Advanced Concepts

• Lightweight Transactions

• Atomic Batches

• User Defined Types (coming soon)

Data Modeling - An Example

Approaching Data Modeling

• Model your queries, not your data

• Generally, optimize for reads

• Denormalize!

• Iterate!

Basic Last.fm Clone

• See songs that user X has listened to recently

• See user X’s favorite songs in a specific month

• See who has recently listened to artist Y

• See artist Y’s most popular songs in a specific week

Basic Last.fm Clone

• See songs that user X has listened to recently

• One of the most common patterns/data models

• Time series

• Immutable (good fit for Clojure!)

Basic Last.fm Clone

• See songs that user X has listened to recently

SELECT song, artist, played_at
FROM user_history
WHERE username = 'nickmbailey'
ORDER BY played_at DESC;

• Partition key = ‘username’

• Clustering key = ‘played_at’

Basic Last.fm Clone

• See songs that user X has listened to recently

CREATE TABLE user_history (
    username text,
    played_at timestamp,
    album text,
    artist text,
    song text,
    PRIMARY KEY (username, played_at)
) WITH CLUSTERING ORDER BY (played_at DESC);

Basic Last.fm Clone

• See songs that user X has listened to recently

• This table has a “bad” partition key

• Every play by a user lands in the same partition, which grows without bound

CREATE TABLE user_history (
    username text,
    played_at timestamp,
    album text,
    artist text,
    song text,
    PRIMARY KEY (username, played_at)
) WITH CLUSTERING ORDER BY (played_at DESC);

Basic Last.fm Clone

• See songs that user X has listened to recently

• Much better partition key

CREATE TABLE user_history (
    username text,
    year_and_month text,
    played_at timestamp,
    album text,
    artist text,
    song text,
    PRIMARY KEY ((username, year_and_month), played_at)
) WITH CLUSTERING ORDER BY (played_at DESC);
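
With this schema the client has to compute the `year_and_month` bucket itself, both when writing and when reading. A minimal sketch of that step, assuming buckets are formatted as `yyyy-MM` (as in the sample data) and computed in UTC, which is a choice made for this example only. The function name `year-and-month` is hypothetical.

```clojure
(import '(java.time Instant ZoneOffset)
        '(java.time.format DateTimeFormatter))

;; Formatter for the bucket string; UTC is an assumption of this sketch.
(def ^DateTimeFormatter bucket-format
  (.withZone (DateTimeFormatter/ofPattern "yyyy-MM") ZoneOffset/UTC))

(defn year-and-month
  "Formats an Instant as a partition bucket such as \"2014-06\"."
  [^Instant played-at]
  (.format bucket-format played-at))

(year-and-month (Instant/parse "2014-06-30T22:13:54Z"))
;; => "2014-06"
```

Whatever time zone is chosen, writers and readers need to agree on it, or plays near a month boundary will be looked up in the wrong partition.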

Basic Last.fm Clone

• See songs that user X has listened to recently

cqlsh:clojure_cassandra_demo> select * from user_history limit 5;

 username    | year_and_month | played_at                | album                    | artist                  | song
-------------+----------------+--------------------------+--------------------------+-------------------------+--------------------------
 nickmbailey | 2014-06        | 2014-06-30 17:13:54-0500 | Once More 'Round The Sun | Mastodon                | Halloween
 nickmbailey | 2014-06        | 2014-06-30 17:08:53-0500 | Once More 'Round The Sun | Mastodon                | Ember City
 b_hastings  | 2014-06        | 2014-06-30 12:57:12-0500 | Buena Vista Social Club  | Buena Vista Social Club | Chan Chan
 zack_smith  | 2014-07        | 2014-07-30 12:49:35-0500 | Awake Remix              | Tycho                   | Awake (Com Truise Remix)
 zack_smith  | 2014-03        | 2014-03-30 12:44:50-0500 | Awake Remix              | Tycho                   | Awake

• Partition key: unordered

• Clustering key: ordered
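
One way to picture the difference between the two kinds of key is as a map from partition key to rows that are kept sorted by clustering key. The sketch below is only a mental model in plain Clojure, not a description of how Cassandra stores data; all names in it are hypothetical, and timestamps are plain strings to keep it short.

```clojure
;; Mental model only: a table as a map from partition key to rows kept
;; sorted by clustering key.
(def empty-partition
  ;; Reverse comparator mirrors CLUSTERING ORDER BY (played_at DESC).
  (sorted-map-by #(compare %2 %1)))

(defn record-play
  "Adds `row` to the partition for its (username, year-and-month) pair."
  [table {:keys [username year-and-month played-at] :as row}]
  (update table [username year-and-month]
          (fnil assoc empty-partition) played-at row))

(defn recent-plays
  "Returns up to `n` rows from one partition, newest first."
  [table username year-and-month n]
  (take n (vals (get table [username year-and-month]))))

(def table
  (reduce record-play {}
          [{:username "nickmbailey" :year-and-month "2014-06"
            :played-at "2014-06-30 17:08:53" :song "Ember City"}
           {:username "nickmbailey" :year-and-month "2014-06"
            :played-at "2014-06-30 17:13:54" :song "Halloween"}
           {:username "b_hastings" :year-and-month "2014-06"
            :played-at "2014-06-30 12:57:12" :song "Chan Chan"}]))

(map :song (recent-plays table "nickmbailey" "2014-06" 5))
;; => ("Halloween" "Ember City")
```

Looking up a partition is a direct map access, while reading the most recent rows within it is a walk over an already sorted sequence.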

Basic Last.fm Clone

• See user X’s favorite songs in a specific month

SELECT song, artist, play_count
FROM user_history
WHERE username = 'nickmbailey' AND month = 'July'
ORDER BY play_count DESC;

• Partition key = ‘username’, ‘month’

• Clustering key = ‘play_count’?

• Counters are a special case

Counters

• A counter cannot be part of the PRIMARY KEY

• No ordering based on counter value

• All non-counter columns must be part of the PRIMARY KEY

• Limitations due to the storage format

Basic Last.fm Clone

• See user X’s favorite songs in a specific month

CREATE TABLE user_song_counts (
    username text,
    year_and_month text,
    artist text,
    song text,
    play_count counter,
    PRIMARY KEY ((username, year_and_month), artist, song)
);

Basic Last.fm Clone

• See user X’s favorite songs in a specific month

• Results are not ordered by play count

• Client will have to do the sorting

cqlsh:clojure_cassandra_demo> select * from user_song_counts where username = 'nickmbailey' and year_and_month = '2014-07';

 username    | year_and_month | artist   | song                              | count
-------------+----------------+----------+-----------------------------------+-------
 nickmbailey | 2014-07        | Amos Lee | Tricksters, Hucksters, And Scamps |    10
 nickmbailey | 2014-07        | Beck     | Blackbird Chain                   |     1
 nickmbailey | 2014-07        | Beck     | Blue Moon                         |     4
 nickmbailey | 2014-07        | Cherub   | <3                                |    12
 nickmbailey | 2014-07        | Cherub   | Chocolate Strawberries            |     6
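
Since the counter values come back unsorted, the "favorite songs" ordering has to happen in the client. A minimal sketch, assuming the rows arrive as Clojure maps with keyword keys (an assumption about the client's result format) and using the values from the output above. The name `top-songs` is hypothetical.

```clojure
;; Rows as they might look after being fetched; the map shape is an assumption.
(def rows
  [{:artist "Amos Lee" :song "Tricksters, Hucksters, And Scamps" :count 10}
   {:artist "Beck"     :song "Blackbird Chain"                   :count 1}
   {:artist "Beck"     :song "Blue Moon"                         :count 4}
   {:artist "Cherub"   :song "<3"                                :count 12}
   {:artist "Cherub"   :song "Chocolate Strawberries"            :count 6}])

(defn top-songs
  "Returns the `n` rows with the highest play counts, highest first."
  [n rows]
  (take n (sort-by :count > rows)))

(map :song (top-songs 3 rows))
;; => ("<3" "Tricksters, Hucksters, And Scamps" "Chocolate Strawberries")
```

Because each partition holds only one user's songs for one month, the result set stays small enough to sort in memory.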

Basic Last.fm Clone

• See who has recently listened to artist Y

CREATE TABLE artist_history (
    artist text,
    year_and_week text,
    played_at timestamp,
    album text,
    song text,
    username text,
    PRIMARY KEY ((artist, year_and_week), played_at)
) WITH CLUSTERING ORDER BY (played_at DESC);

Basic Last.fm Clone

• See artist Y’s most popular songs in a specific week

CREATE TABLE artist_song_counts (
    artist text,
    year_and_week text,
    album text,
    song text,
    play_count counter,
    PRIMARY KEY ((artist, year_and_week), album, song)
);

Cassandra from Clojure

Building Blocks

• Java Driver

• Hayt

Java Driver

• Fully featured

• Connection pooling

• Failover policies

• Retry policies

• Sync and Async interfaces

• Exposes client metrics

• https://github.com/datastax/java-driver

Hayt

• CQL DSL

• Similar to Korma

• Solely for building CQL strings

• https://github.com/mpenet/hayt

(select :foo
        (where {:bar 1
                :baz 2}))

(->raw (select :foo (where {:bar 1 :baz 2})))
> "SELECT * FROM foo WHERE bar = 1 AND baz = 2;"

Clients

• Alia

• https://github.com/mpenet/alia

• Cassaforte

• https://github.com/clojurewerkz/cassaforte

• Both built on Java Driver and Hayt

• Not particularly different

Alia vs. Cassaforte

Cassaforte

(let [conn (cc/connect ["127.0.0.1"])]
  (cql/create-keyspace conn "cassaforte_keyspace"
    (with {:replication {:class "SimpleStrategy"
                         :replication_factor 1}})))

Alia

(def cluster (alia/cluster {:contact-points ["localhost"]}))
(def session (alia/connect cluster))
(alia/execute session
  (create-keyspace :alia
    (if-exists false)
    (with {:replication {:class "SimpleStrategy"
                         :replication_factor 1}})))

Learn by Example - Alia

Cluster Object

• Entry point

• Configures relevant client options

• :contact-points

• :load-balancing-policy

• :reconnection-policy

• :retry-policy

• and more!

(def cluster (alia/cluster {:contact-points ["localhost"]}))

Session Object

• A Session is associated with a keyspace

• Allows interacting with multiple keyspaces

(def cluster (alia/cluster {:contact-points ["localhost"]}))
(def session (alia/connect cluster))
(def session (alia/connect cluster :my_keyspace))

Querying

• Multiple ways to query

• alia/execute

• Synchronous, block on result

• alia/execute-async

• Returns a Lamina result-channel (basically, a promise)

• Optional success/error callbacks

• alia/execute-chan

• Returns a core.async channel

• We won’t dive into core.async now

Prepared Statements

• Statements can be prepared server side

• Better performance for common queries

(def prepared-statement (alia/prepare session "select * from users where user_name=?;"))

What else?

• See GitHub and the docs

• https://github.com/mpenet/alia

• http://mpenet.github.io/alia/qbits.alia.html

Demo

Demo

• https://github.com/nickmbailey/clojure-cassandra-demo

• Built with

• CCM - https://github.com/pcmanus/ccm

• Alia - https://github.com/mpenet/alia

• ring - https://github.com/ring-clojure/ring

• compojure - https://github.com/weavejester/compojure

• hiccup - https://github.com/weavejester/hiccup

• least - https://github.com/Raynes/least

More

Cassandra: http://cassandra.apache.org

DataStax Drivers: https://github.com/datastax

Documentation: http://www.datastax.com/docs

Getting Started: http://www.datastax.com/documentation/gettingstarted/index.html

Developer Blog: http://www.datastax.com/dev/blog

Cassandra Community Site: http://planetcassandra.org

Download: http://planetcassandra.org/Download/DataStaxCommunityEdition

Webinars: http://planetcassandra.org/Learn/CassandraCommunityWebinars

Cassandra Summit Talks: http://planetcassandra.org/Learn/CassandraSummit
