Data Modeling Basics for the Cloud
Robert Stupp
Solutions Architect @ DataStax – Committer to Apache Cassandra
© 2016 DataStax, All Rights Reserved.
Data Modeling for the Cloud
DSE is the database for the cloud
1. Always On
2. Instantaneously Responsive
3. Numerous Endpoints
4. Geographically Distributed
5. Predictively Scalable
(Image: CC BY 2.0, by Blake Patterson on Flickr)
100000 transactions per second
200000 transactions per second
Eventual Consistency
… is not hopefully consistent
(Diagram: Application and a cluster holding three copies of "Some data"; Replication Factor 3, Consistency Level: ONE.)
Quorum Consistency
(Diagram: Application and a cluster holding three copies of "Some data"; Replication Factor 3, Consistency Level: QUORUM; nodes are labeled UP and DOWN.)
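The arithmetic behind QUORUM can be sketched in a few lines of Python. This is only an illustration, not driver code: the function names `quorum` and `can_satisfy` are hypothetical, and the formula is the usual majority rule, floor(RF / 2) + 1.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond for a QUORUM request."""
    return replication_factor // 2 + 1


def can_satisfy(required: int, replicas_up: int) -> bool:
    """True if enough replicas are alive to meet the consistency level."""
    return replicas_up >= required
```

For the Replication Factor 3 on the slide, `quorum(3)` is 2, so `can_satisfy(quorum(3), 2)` holds with one replica DOWN, while Consistency Level ONE corresponds to `required = 1`.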
Write Path
(Diagram: Application → DSE / Cassandra Node. Inside the node, "Some data" appears in the Memtable, the Commit Log Files, and the SSTables.)
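The components named on this slide can be put together in a toy model. The sketch below is an assumption-laden simplification, not how DSE / Cassandra is implemented: the class `ToyNode`, its `flush_threshold` parameter, and the flush-by-row-count rule are all made up for illustration.

```python
class ToyNode:
    """Toy model of the write path; not Cassandra's actual implementation."""

    def __init__(self, flush_threshold: int = 3):
        self.commit_log = []      # append-only record of every unflushed write
        self.memtable = {}        # in-memory, latest value per key
        self.sstables = []        # immutable snapshots, oldest first
        self.flush_threshold = flush_threshold  # hypothetical flush trigger

    def write(self, key, value):
        # A write is recorded in the commit log and applied to the memtable.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Freeze the memtable as a key-sorted, read-only SSTable.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()   # flushed writes no longer need replay
```

Writing repeatedly to a `ToyNode` leaves a growing list of SSTables behind, which is what makes compaction necessary.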
Compaction
(Diagram: four SSTables are merged into one SSTable.)
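The merge shown in the diagram can be sketched as follows. This is a minimal model under stated assumptions: each SSTable is represented as a plain dict from key to `(write_timestamp, value)`, the function name `compact` is hypothetical, and details such as tombstones are left out.

```python
def compact(sstables):
    """Merge SSTables into one, keeping the newest cell per key.

    Each SSTable is modelled as a dict: key -> (write_timestamp, value).
    """
    merged = {}
    for table in sstables:
        for key, (ts, value) in table.items():
            # Keep only the cell with the highest write timestamp.
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    # Return the merged cells in key order, like a freshly written SSTable.
    return dict(sorted(merged.items()))
```

The result holds one cell per key, so a read after compaction has fewer SSTables to consult.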
Compaction Strategies
• Size Tiered
• Leveled
• Date Tiered
Data Organization in DSE / Cassandra

Partition
Device ID        Timestamp          Temperature  Humidity
01-32483-17383   2016-04-19 14:00   22           70
01-32483-17383   2016-04-19 15:00   21.5         65
01-32483-17383   2016-04-19 16:00   23.0         70

Partition Key: Device ID
Clustering Key Columns: Timestamp
Primary Key: Device ID, Timestamp
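One way to picture this layout is a map of partitions, each holding rows ordered by the clustering key. The sketch below is an in-memory analogy only; the names `partitions`, `insert`, and `select` are hypothetical, and timestamps are kept as strings in `YYYY-MM-DD HH:MM` form so that they sort chronologically.

```python
from collections import defaultdict

# partition key (device_id) -> {clustering key (timestamp) -> other columns}
partitions = defaultdict(dict)


def insert(device_id, timestamp, temperature, humidity):
    # Same primary key (device_id, timestamp) overwrites the earlier row.
    partitions[device_id][timestamp] = (temperature, humidity)


def select(device_id, since=None):
    """Rows of one partition, ordered by the clustering key."""
    rows = sorted(partitions.get(device_id, {}).items())
    return [(ts, *cols) for ts, cols in rows if since is None or ts >= since]
```

A lookup always starts from the partition key; within the partition, rows come back in clustering order and can be narrowed by a range on the timestamp.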
Data Modeling 101
1. Understand your data
   Conceptual data modeling
2. Collect queries
   Understand your application
3. Model according to queries
   Logical data modeling
4. Apply optimizations
   Physical data modeling
Query driven modeling
1. Collect your use cases
2. Extract queries
3. Model your tables
Queries, yes
SELECT timestamp, temperature, humidity
FROM sensor_data
WHERE sensor_id = '01-32483-17383';
Always include the Partition Key
Some standard use-cases
• Customer registration
• Customer login
• Delivery addresses
Customer registration
1. Check if customer exists
   query by username
CREATE TABLE customers (
    username text PRIMARY KEY,
    password_hash text,
    first_name text,
    last_name text,
    email text
);
SELECT username FROM customers WHERE username = ?;
Customer login by username
1. Check if user exists and password matches
   query by username
CREATE TABLE customers (
    username text PRIMARY KEY,
    password_hash text,
    first_name text,
    last_name text,
    email text
);
SELECT password_hash FROM customers WHERE username = ?;
Customer login by email
1. Check if user exists and password matches
   query by email
CREATE TABLE customers (
    username text PRIMARY KEY,
    password_hash text,
    first_name text,
    last_name text,
    email text
);
SELECT password_hash FROM customers WHERE email = ?;
InvalidRequest: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance.
Customer login by email
1. Check if user exists and password matches
   query by email
CREATE TABLE customers_by_email (
    email text PRIMARY KEY,
    password_hash text,
    first_name text,
    last_name text,
    username text
);
SELECT password_hash FROM customers_by_email WHERE email = ?;
This works
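Having one table per query means the application writes the same customer twice, once under each key. The sketch below mimics that with two Python dicts standing in for the `customers` and `customers_by_email` tables; the helper names `register` and `password_hash_by_email` are hypothetical, and a real application would also have to handle a failure between the two writes.

```python
customers = {}            # stands in for the table keyed by username
customers_by_email = {}   # stands in for the table keyed by email


def register(username, email, password_hash, first_name, last_name):
    row = {
        "username": username,
        "email": email,
        "password_hash": password_hash,
        "first_name": first_name,
        "last_name": last_name,
    }
    # Denormalization: the same data is written once per query table.
    customers[username] = row
    customers_by_email[email] = dict(row)


def password_hash_by_email(email):
    # Direct lookup by key, no scan over all customers.
    row = customers_by_email.get(email)
    return row["password_hash"] if row else None
```

Both login variants are then single-key lookups, at the cost of keeping two copies of each customer in sync.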
Modeling delivery addresses
CREATE TABLE customer_addresses (
    username text,
    address_type text,
    street text,
    zip text,
    city text,
    PRIMARY KEY ( username, address_type )
);
SELECT street, zip, city FROM customer_addresses WHERE username = ?;
SELECT street, zip, city FROM customer_addresses WHERE username = ? AND address_type = ?;
Modeling delivery addresses
1. Print delivery address label
   query user by user name
   query delivery address by user and type
SELECT first_name, last_name FROM customers WHERE username = ?;
SELECT street, zip, city FROM customer_addresses WHERE username = ? AND address_type = ?;
This works, but it's not great: it takes two reads.
Modeling delivery addresses
CREATE TYPE delivery_address (
    street text,
    zip text,
    city text
);
CREATE TABLE customers (
    username text PRIMARY KEY,
    password_hash text,
    first_name text,
    last_name text,
    email text,
    delivery_addrs map < text, frozen < delivery_address > >
);
SELECT first_name, last_name, delivery_addrs FROM customers WHERE username = ?;
Just 1 read
Customer registration – the problem
Client 1: SELECT username FROM customers WHERE username = ?   (no results)
Client 2: SELECT username FROM customers WHERE username = ?   (no results)
Client 1: INSERT INTO customers (username, first_name, last_name) VALUES ('snazy', 'Robert', 'Stupp')   (success)
Client 2: INSERT INTO customers (username, first_name, last_name) VALUES ('snazy', 'Not', 'Robert')   (success)
Both inserts succeed. One of them wins; the other one gets overwritten.
Customer registration – the solution
Client 1: SELECT username FROM customers WHERE username = ?   (no results)
Client 2: SELECT username FROM customers WHERE username = ?   (no results)
Client 1: INSERT INTO customers … IF NOT EXISTS   [applied] = true    OK
Client 2: INSERT INTO customers … IF NOT EXISTS   [applied] = false   Sorry, dude
Customer registration – the even better solution
Client 1: INSERT INTO customers … IF NOT EXISTS   [applied] = true    OK
Client 2: INSERT INTO customers … IF NOT EXISTS   [applied] = false   Sorry, dude
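The behaviour of `IF NOT EXISTS` can be imitated in memory to show why the preceding SELECT is unnecessary. The sketch below is an analogy only: the class `ToyCustomers` and its method are hypothetical, and a `threading.Lock` stands in for the Paxos-based coordination that Cassandra uses for lightweight transactions.

```python
import threading


class ToyCustomers:
    """In-memory stand-in for the customers table (illustration only)."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()  # stand-in for LWT coordination

    def insert_if_not_exists(self, username, first_name, last_name):
        """Return True ("[applied] = true") only for the first writer."""
        with self._lock:
            if username in self._rows:
                return False           # row exists, nothing is overwritten
            self._rows[username] = (first_name, last_name)
            return True

    def get(self, username):
        return self._rows.get(username)
```

The check and the write happen as one step, so the second registration for the same username is rejected instead of silently overwriting the first.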
Customer login by email – w/ DSE 5.0
1. Check if user exists and password matches
   query by email
CREATE TABLE customers (
    username text PRIMARY KEY,
    password_hash text,
    first_name text,
    last_name text,
    email text
);
CREATE MATERIALIZED VIEW customers_by_email AS
    SELECT email, username, first_name, last_name, password_hash
    FROM customers
    WHERE email IS NOT NULL AND username IS NOT NULL
    PRIMARY KEY ( email, username );
SELECT password_hash FROM customers_by_email WHERE email = ?;
May the node be with you!
Robert Stupp
Solutions Architect @ DataStax
Committer to Apache Cassandra
@snazy