JON HADDAD, THE LAST PICKLE
LEARN DATA MODELING BY EXAMPLE
THIS IS AWESOME!!!
WHAT DOES THE LAST PICKLE DO?
WE HELP TURN YOUR TEAM INTO EXPERTS
50+ YEARS OF COMBINED EXPERIENCE
WHO IS THIS GUY?
15 YEARS EXPERIENCE
4 YEARS WITH CASSANDRA
LEARNING HOW TO CASSANDRA
WHAT’S YOUR BACKGROUND?
ORACLE! MYSQL! POSTGRES!
CQL LOOKS LIKE SQL
BAD ASSUMPTIONS
3RD NORMAL FORM?
WHERE ARE MY JOINS?
SECONDARY INDEXES?
DO IT WRONG
TRY TO DATA MODEL
GET ANGRY
WATCH VIDEOS & READ
EVERYTHING I KNOW IS WRONG
LEARN BY EXAMPLE
CASSANDRA DATASET MANAGER
CDM
APT FOR CASSANDRA DATA
INSTALL DATASETS INTO YOUR CASSANDRA CLUSTER
cdm install <dataset>
jhaddad@rustyrazorblade ~$ cdm list
Starting CDM
Datasets:
movielens
killrvideo
killrweather
Finished.
jhaddad@rustyrazorblade ~$ cdm install movielens
Starting CDM
Installing movielens
Checking for repo at /Users/jhaddad/.cdm/movielens
Pulling latest
CDM is using dataset path: /Users/jhaddad/.cdm/movielens
cqlsh -e "DROP KEYSPACE IF EXISTS movielens; CREATE KEYSPACE movielens WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}"
Schema: /Users/jhaddad/.cdm/movielens/schema.cql
Loading data
cqlsh -k movielens -e "COPY movies FROM '/Users/jhaddad/.cdm/movielens/data/movies.csv'"
cqlsh -k movielens -e "COPY users FROM '/Users/jhaddad/.cdm/movielens/data/users.csv'"
cqlsh -k movielens -e "COPY ratings_by_user FROM '/Users/jhaddad/.cdm/movielens/data/ratings_by_user.csv'"
cqlsh -k movielens -e "COPY original_movie_map FROM '/Users/jhaddad/.cdm/movielens/data/original_movie_map.csv'"
cqlsh -k movielens -e "COPY ratings_by_movie FROM '/Users/jhaddad/.cdm/
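The `cdm install movielens` log boils down to a short, repeatable sequence of cqlsh calls. The sketch below rebuilds that sequence in Python, assuming the layout the log shows (`schema.cql` at the dataset root, one `data/<table>.csv` per table). `install_commands` and `render` are hypothetical helpers, not CDM's own code, and applying the schema with `cqlsh -k <keyspace> -f schema.cql` is an assumption: the log only prints the schema path.

```python
from pathlib import PurePosixPath


def install_commands(keyspace, dataset_path, tables, replication_factor=1):
    """Hypothetical helper: rebuild the cqlsh steps from the install log.

    Returns one argv list per step, suitable for subprocess.run.
    """
    root = PurePosixPath(dataset_path)
    # Step 1: drop and recreate the keyspace (SimpleStrategy, RF 1 in the log).
    create = (
        f"DROP KEYSPACE IF EXISTS {keyspace}; "
        f"CREATE KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'SimpleStrategy', 'replication_factor': {replication_factor}}}"
    )
    commands = [["cqlsh", "-e", create]]
    # Step 2 (assumed, not shown in the log): apply schema.cql in the keyspace.
    commands.append(["cqlsh", "-k", keyspace, "-f", str(root / "schema.cql")])
    # Step 3: bulk-load each table from data/<table>.csv with COPY ... FROM.
    for table in tables:
        csv_path = root / "data" / f"{table}.csv"
        commands.append(
            ["cqlsh", "-k", keyspace, "-e", f"COPY {table} FROM '{csv_path}'"]
        )
    return commands


def render(argv):
    # Display only: wrap arguments containing spaces in double quotes,
    # the way the install log prints them.
    return " ".join(f'"{a}"' if " " in a else a for a in argv)


if __name__ == "__main__":
    tables = ["movies", "users", "ratings_by_user",
              "original_movie_map", "ratings_by_movie"]
    for argv in install_commands("movielens", "/Users/jhaddad/.cdm/movielens", tables):
        print(render(argv))
```

Returning argv lists rather than shell strings keeps the CQL away from shell quoting; each list could be handed to `subprocess.run(argv, check=True)` against a local node.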
jhaddad@rustyrazorblade ~/dev/cassandra$ cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.10-SNAPSHOT | CQL spec 3.4.3 | Native protocol v4]
Use HELP for help.
cqlsh> use movielens ;
cqlsh:movielens> desc tables;
movies users ratings_by_user original_movie_map ratings_by_movie
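Two of those tables, `ratings_by_user` and `ratings_by_movie`, point at the query-first style of modeling: rather than one normalized ratings table plus joins or a secondary index, each rating is presumably stored once per query that needs it. The sketch below simulates that in memory, with Python dicts standing in for partitions. The column names (`user_id`, `movie_id`, `rating`) and the `RatingsStore` class are hypothetical, since the slides don't show the schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass(frozen=True)
class Rating:
    user_id: UUID   # hypothetical column names
    movie_id: UUID
    rating: int


class RatingsStore:
    """In-memory stand-in for two query-specific tables.

    Each dict maps a partition key to the rows in that partition:
    ratings_by_user is keyed by user, ratings_by_movie by movie.
    """

    def __init__(self):
        self.ratings_by_user = defaultdict(list)
        self.ratings_by_movie = defaultdict(list)

    def add_rating(self, r):
        # Denormalize on write: one logical rating becomes two rows,
        # one in each table.
        self.ratings_by_user[r.user_id].append(r)
        self.ratings_by_movie[r.movie_id].append(r)

    def for_user(self, user_id):
        # "All ratings by this user" reads a single partition:
        # no join, no secondary index.
        return list(self.ratings_by_user.get(user_id, []))

    def for_movie(self, movie_id):
        # Same idea, answered by the other table.
        return list(self.ratings_by_movie.get(movie_id, []))


if __name__ == "__main__":
    store = RatingsStore()
    alice, bob, movie = uuid4(), uuid4(), uuid4()
    store.add_rating(Rating(alice, movie, 4))
    store.add_rating(Rating(bob, movie, 2))
    print(len(store.for_movie(movie)))  # 2
    print(len(store.for_user(alice)))   # 1
```

The trade-off is the familiar one: every write lands in two places and the application keeps the copies in sync, in exchange for reads that each touch exactly one partition.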
WHAT CAN WE DO WITH IT?
▸ Learn by example
▸ Blog posts / Tutorials
▸ Jupyter notebooks
▸ Reference applications
▸ Data Models for presentations
MANAGING REFERENCE / TEST DATA
DATASETS
MOVIELENS
DETAILS
▸ GroupLens Research Project
▸ University of Minnesota
▸ 100K ratings
▸ 1K users
▸ 1700 movies
cqlsh:movielens> select id, avg_rating, genres, name
             ... from movies limit 1;
@ Row 1
------------+--------------------------------------
 id         | 76a38f64-94d8-4b8f-b830-a40af96f8d20
 avg_rating | 3.16667
 genres     | {'Drama'}
 name       | Little Lord Fauntleroy (1936)
(1 rows)
cqlsh:movielens> select * from users limit 1;
@ Row 1
------------+--------------------------------------
 id         | b52fcdfc-0eaf-4432-9896-aa22db56edb2
 address    | 0322 Mattie Ramp Apt. 177
 age        | 37
 city       | South Fremont
 gender     | M
 name       | Harrold Hills
 occupation | administrator
 zip        | 06513
(1 rows)
BLOG: WORKING RELATIONALLY WITH CASSANDRA
CONNECTING CASSANDRA DATA WITH GRAPHFRAMES
cdm install killrweather
Helena Edelson, Patrick McFadin
cdm install killrvideo
Luke Tillman, Patrick McFadin
UPCOMING DATA SETS
HEALTH CARE
▸ Cancer Genome Atlas Project
▸ Ebola cases
▸ Healthcare financial data
▸ Dani Traphagen
NYC TAXI DATA
▸ pick up / drop off times & locations
▸ trip distances
▸ itemized fares
▸ rate types
▸ payment types
SOCIAL DATA
▸ Higgs Twitter Data
▸ Foursquare
▸ Enron executive emails
HOW TO CONTRIBUTE
ADD FEATURES
SUGGEST DATASETS
CREATE A DATASET
▸ create a git repo
▸ datasets.yaml
▸ schema.cql
▸ insert data
▸ cdm dump
▸ cdm install .
▸ create a PR on cdm-java
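Before opening the PR, a quick local check can catch a dataset whose schema and data have drifted apart. This is a hypothetical pre-flight script, not part of CDM: it assumes the layout seen in the `cdm install` log (`schema.cql` at the repo root, one `data/<table>.csv` per table, presumably what `cdm dump` writes) and uses a regex heuristic to find `CREATE TABLE` statements, so quoted or unusual table names may slip through.

```python
import re
from pathlib import Path

# Heuristic: matches "CREATE TABLE [IF NOT EXISTS] [keyspace.]name".
CREATE_TABLE = re.compile(
    r"CREATE\s+TABLE\s+(?:IF\s+NOT\s+EXISTS\s+)?(?:\w+\.)?(\w+)", re.IGNORECASE
)


def check_dataset(path):
    """Return a list of problems in a dataset directory (empty if none).

    Hypothetical check, assuming schema.cql at the root and one
    data/<table>.csv per table, as in the install log.
    """
    root = Path(path)
    schema = root / "schema.cql"
    if not schema.is_file():
        return ["missing schema.cql"]
    # Unquoted CQL identifiers are case-insensitive, so compare lowercased.
    defined = {name.lower() for name in CREATE_TABLE.findall(schema.read_text())}
    csv_tables = {p.stem.lower() for p in (root / "data").glob("*.csv")}
    problems = []
    for table in sorted(csv_tables - defined):
        problems.append(f"data/{table}.csv has no CREATE TABLE in schema.cql")
    for table in sorted(defined - csv_tables):
        problems.append(f"table {table} has no data/{table}.csv")
    return problems


if __name__ == "__main__":
    import sys
    issues = check_dataset(sys.argv[1] if len(sys.argv) > 1 else ".")
    print("\n".join(issues) or "looks OK")
```

Run it from the dataset repo (for example after `cdm dump`, before `cdm install .`); an empty result only means the file layout lines up, not that the data loads cleanly.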
OMG BEST DATASET EVER
@RUSTYRAZORBLADE
THANK YOU, KIND HUMANS