Date posted: 15-Jan-2015
Category: Technology
Uploaded by: planet-cassandra
CASSANDRA @ ULTRAVISUAL
Cassandra Day New York 2014
Skye Book, Lead Systems Architect
ULTRAVISUAL
A visual network for inspiration, expression,
and collaboration
The Feed
• A user’s first taste of UV
• More than just posts
• Constantly being tweaked and re-thought
The Old Way
Started Simple!
“Show me recent posts in collections I follow”
    SELECT DISTINCT _post.*
    FROM _post
    JOIN _collection_post cp ON _post.uuid = cp.post_uuid
    JOIN _collection_follow cf ON cp.c_uuid = cf.collection_uuid
    WHERE cf.user_id = ?
    ORDER BY _post.created_at DESC
    LIMIT 20 OFFSET 0
The Old Way
Added Complexity!
“Show me people recently followed by my connections”
    SELECT a.*
    FROM _user_follow a, _user_follow b
    WHERE b.follower = 12345
    AND a.follower = b.followed
    ORDER BY a.followed_at DESC
    LIMIT 20 OFFSET 0
The Old Way
Every new feature needs another query!
Feed requests generate a disproportionate amount of load compared with normal CRUD operations
Reframing the Problem
From This:
A place for posts, new collections, social activity, and anything else interesting
nitro404.com/computers/knex.php
Reframing the Problem
To This:
A list of items interesting to the user
The New Way
Model First
• With an SQL background, this can be misleading.
• Essential Question: “How do I need to access this data?”
“Try to model data as a log of user intent”
– Rick Branson, Instagram, Cassandra Summit 2013
The New Way
user  status  created_at  story                         json
2     0       61b97280    user_follow:3:5               {"foo":"bar"}
2     1       5daa04c0    post:bfbd0a39                 {"foo":"bar"}
2     1       565752e0    collection_follow:5:d70961c1  {"foo":"bar"}
2     1       4a8189e0    user_follow:3:5               {"foo":"bar"}
Leading columns: Primary Key. json column: Cached story JSON.
Model for user feeds
• Fast to fetch user stories
• Cached JSON means almost zero SQL requests
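The read path this model enables can be sketched in a few lines of Python. This is only an in-memory illustration built from the sample rows on the slide, not Ultravisual's code: the `FeedRow` type, the `fetch_live_stories` helper, and the field name `cached_json` are hypothetical, and the `created_at` strings are the slide's sample values standing in for real time-based identifiers.

```python
import json
from typing import List, NamedTuple, Tuple


class FeedRow(NamedTuple):
    user: int
    status: int        # 1 = live entry, 0 = negated entry
    created_at: str    # sample value from the slide, stand-in for a time-based id
    story: str
    cached_json: str   # story payload stored alongside the key


# Sample rows copied from the slide, kept in the order shown there.
ROWS = [
    FeedRow(2, 0, "61b97280", "user_follow:3:5", '{"foo":"bar"}'),
    FeedRow(2, 1, "5daa04c0", "post:bfbd0a39", '{"foo":"bar"}'),
    FeedRow(2, 1, "565752e0", "collection_follow:5:d70961c1", '{"foo":"bar"}'),
    FeedRow(2, 1, "4a8189e0", "user_follow:3:5", '{"foo":"bar"}'),
]


def fetch_live_stories(rows: List[FeedRow], user: int, limit: int = 20) -> List[Tuple[str, dict]]:
    """Return (story, decoded payload) for a user's live rows, in stored order."""
    live = [r for r in rows if r.user == user and r.status == 1]
    # The payload is decoded straight from the row; no second lookup is needed.
    return [(r.story, json.loads(r.cached_json)) for r in live[:limit]]
```

Because the payload travels with the row, rendering a page of the feed needs nothing beyond the rows themselves, which is the point of the "almost zero SQL requests" bullet.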
Fast.
Response times cut from hundreds of milliseconds to the 30 ms range
Launch Week
Featured by Apple!
Cluster Disk Usage
26% / 74% (pie chart)
Don’t be too cute
cqlsh:ultravisual> ALTER TABLE latest_feed DROP json;
Handling Deletions
• Data is only appended, never deleted from user feeds
• Adapted Instagram’s ‘Anti-Column’ solution
• Avoids missed deletions for nodes down longer than GCGraceSeconds
• Avoids race condition where deletion arrives before write
Sam follows Sandy
user  created_at  status  story
2     4a8189e0    1       user_follow:3:5

Sam unfollows Sandy
user  created_at  status  story
2     61b97280    0       user_follow:3:5
2     4a8189e0    1       user_follow:3:5
Negated Entries

user  created_at  status  story
2     61b97280    0       user_follow:3:5
2     4a8189e0    1       user_follow:3:5
• Keeps all entries in a single time series
• First page can usually be populated by a single read

user  status  created_at  story
2     0       61b97280    user_follow:3:5
2     1       4a8189e0    user_follow:3:5
• Splits user’s row into two lists, live and undo
• Will always require at least two reads
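A minimal sketch of how a reader might resolve negated entries in the single-time-series layout, assuming a negation (status 0) cancels older live entries for the same story. The `resolve_feed` helper is hypothetical, and integer timestamps stand in for the `created_at` values; the slides do not show the actual resolution code.

```python
from typing import Iterable, List, Tuple

# (created_at, status, story); status 1 = live, 0 = negation ("undo").
# created_at is an integer here purely for illustration.
Entry = Tuple[int, int, str]


def resolve_feed(entries: Iterable[Entry], limit: int = 20) -> List[str]:
    """Walk entries newest-first and skip stories cancelled by a newer negation."""
    negated = set()
    visible: List[str] = []
    for _created_at, status, story in sorted(entries, reverse=True):
        if status == 0:
            # Remember the negation so older live entries for this story are hidden.
            negated.add(story)
        elif story not in negated:
            visible.append(story)
            if len(visible) == limit:
                break
    return visible
```

With Sam's follow at time 1, a post at time 2, and the unfollow at time 3, only the post remains visible. If Sam follows Sandy again at time 4, the new entry is newer than the negation and shows up again.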
Further Uses
• User Notifications
• User Onboarding
• Reshare Statistics
• User & Content Reports
• API Statistics
User Onboarding
user  created_at  sequence      step  content
2     61b97280    onboaring_v2  1     rec_collections_1
3     5daa04c0    onboaring_v2  2     rec_collections_2
5     565752e0    onboaring_v3  1     find_friends
6     4a8189e0    onboaring_v3  1     find_friends
Sequenced feed entries for users on signup
Production Experiences
Drivers
• Java: Started with Astyanax, moved to DataStax v2
• Node.js: node-cassandra-cql
Cryptic message with large batch updates in pre-release versions of the 2.0 driver (DS Driver Issue 229):
com.datastax.driver.core.exceptions.DriverInternalError: An unexpected protocol error occured. This is a bug in this library, please report: Unknown code 256 for a consistency level
As of 2.0, batches with more than 64k statements throw a better exception:
java.lang.IllegalStateException: Batch statement cannot contain more than 65536 statements.
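One way to stay under the limit is to split a large update into several batches before handing it to the driver. The sketch below is generic Python and does not call any driver API; `chunk_statements` and `MAX_BATCH_STATEMENTS` are hypothetical names, and the default size is simply the figure quoted in the exception message above.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Figure quoted in the exception message; treated here as an upper bound.
MAX_BATCH_STATEMENTS = 65536


def chunk_statements(statements: Sequence[T], size: int = MAX_BATCH_STATEMENTS) -> Iterator[List[T]]:
    """Yield consecutive groups of at most `size` statements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(statements), size):
        yield list(statements[start:start + size])
```

Each yielded group can then be sent as its own batch. The size is a parameter so that a much smaller value can be chosen if that suits the workload better.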
Compression
Just use LZ4
Cassandra-4851
Unfortunate truth in Cassandra 2.0.5
cqlsh:test> SELECT * FROM user_feed WHERE user = 2 AND created_at > :some_uuid AND status=0;
Bad Request: PRIMARY KEY part status cannot be restricted (preceding part created_at is either not restricted or by a non-EQ relation)
Cassandra-4851
Adds CQL3 support for vector comparison syntax
cqlsh:test> SELECT * FROM timeline WHERE day = '21 Jun 2014' AND (hour,min) >= (3,50) AND (hour,min,sec) <= (4,37,30);
Available in 2.0.6
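The vector comparison in the query above compares the listed columns as one ordered tuple rather than column by column. Python tuples compare the same way (lexicographically), so the behaviour can be sketched without a cluster. The `in_window` helper and the sample rows are hypothetical; only the bounds come from the slide.

```python
from typing import Dict


def in_window(row: Dict[str, int]) -> bool:
    """Mirror (hour,min) >= (3,50) AND (hour,min,sec) <= (4,37,30) with tuple comparison."""
    lower_ok = (row["hour"], row["min"]) >= (3, 50)
    upper_ok = (row["hour"], row["min"], row["sec"]) <= (4, 37, 30)
    return lower_ok and upper_ok


# Hypothetical rows for one day of the timeline table.
rows = [
    {"hour": 3, "min": 49, "sec": 59},   # just before the window
    {"hour": 3, "min": 50, "sec": 0},    # first instant inside
    {"hour": 4, "min": 10, "sec": 0},    # inside, although min < 50
    {"hour": 4, "min": 37, "sec": 30},   # last instant inside
    {"hour": 4, "min": 37, "sec": 31},   # just after the window
]

selected = [r for r in rows if in_window(r)]
```

The third row is the interesting one: a per-column filter such as `min >= 50` would wrongly drop it, while the tuple comparison keeps it because the hour already settles the ordering.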
Production Experiences
Upgrades
• Manual package installs (dsc20 from DataStax)
• One node at a time
• Upgrade, wait for healthy status & operations, move on
• OpsCenter provides good overview
Production Experiences
Speaking of OpsCenter…
• Don’t be alarmed if nodes appear but agent data does not
• opscenterd often needs a restart after cluster upgrade to see agents again
Production Experiences
Service Discovery
• Running on AWS using Ec2MultiRegionSnitch
• Using OpsWorks (Amazon’s Chef service) for seed config
Chef Cookbook
github.com/skyebook/cassandra-opsworks-chef-cookbook
• Forked from Michael Klishin’s awesome C* cookbook
• Added integration with OpsWorks’ stack.json

    # Add this node as the first seed
    # If using the multi-region snitch, we must use the public IP address
    if node["cassandra"]["snitch"] == "Ec2MultiRegionSnitch"
      seed_array << node["opsworks"]["instance"]["ip"]
    else
      seed_array << node["opsworks"]["instance"]["private_ip"]
    end

    node["opsworks"]["layers"]["cassandra"]["instances"].each do |instance_name, values|
      if node["cassandra"]["snitch"] == "Ec2MultiRegionSnitch"
        seed_array << values["ip"]
      else
        seed_array << values["private_ip"]
      end
    end

    set[:cassandra][:seeds] = seed_array
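The seed selection in the cookbook excerpt can be checked outside Chef with a small stand-alone sketch. This is a Python restatement of the same logic, not part of the cookbook: `build_seeds` is a hypothetical name, the dictionary layout is assumed from the attribute paths in the excerpt, and the addresses in the usage example are placeholders.

```python
from typing import Any, Dict, List


def build_seeds(node: Dict[str, Any]) -> List[str]:
    """Return this node's address first, then every instance in the cassandra layer."""
    # The multi-region snitch needs public addresses; otherwise use private ones.
    multi_region = node["cassandra"]["snitch"] == "Ec2MultiRegionSnitch"
    key = "ip" if multi_region else "private_ip"
    seeds = [node["opsworks"]["instance"][key]]
    for values in node["opsworks"]["layers"]["cassandra"]["instances"].values():
        seeds.append(values[key])
    return seeds


# Placeholder attributes shaped like the paths used in the excerpt.
example_node = {
    "cassandra": {"snitch": "Ec2MultiRegionSnitch"},
    "opsworks": {
        "instance": {"ip": "203.0.113.10", "private_ip": "10.0.0.10"},
        "layers": {
            "cassandra": {
                "instances": {
                    "cassandra2": {"ip": "203.0.113.11", "private_ip": "10.0.0.11"},
                }
            }
        },
    },
}
```

Calling `build_seeds(example_node)` returns the public addresses; switching the snitch value to anything else returns the private ones.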
Questions