Date post: | 19-Aug-2014 |
Category: | Engineering |
Upload: | daniel-jin-hao-chia |
Cassandra @ Coursera: Deploying in AWS, MySQL Transition
Daniel Chia @DanielJHChia
Software Engineer, Infrastructure
Overview
• Why Cassandra
• What goes into a good deployment
• MySQL → Cassandra transition experience
110 partners
698 courses
8.5 million learners
A Coursera Course
Your Final Project
This is your chance to apply the course concepts to real-world situations
Identity Verified Certificates
Technical
• 100% hosted on AWS
• Service-oriented architecture
• Mix of MySQL and Cassandra for persistence
What do we care about?
We care about…
• Availability
• Scalability
• Operational Ease
• Latency
• (Bonus) Multi-region writes
Availability matters
EBS Outage (2012)
Master us-east-1a
Slave us-east-1c
Scalability
Sharded by class
Machine 1: class1, class2, class3, class4, class5
Machine 2: class6, class7, class8, class9, class10
Machine 3: class11, class12, class13, class14, class15
New use-case
Uh-oh… doesn’t fit in existing sharding
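The class-based sharding above can be sketched as a lookup from class to machine. This is a minimal illustration, assuming five consecutive classes per machine as drawn on the slide; `machine_for_class`, `machines_for_learner`, and `CLASSES_PER_MACHINE` are hypothetical names, and the per-learner query is an assumed example of a use-case that is not keyed by class.

```python
CLASSES_PER_MACHINE = 5  # assumption taken from the diagram: 5 classes per machine


def machine_for_class(class_number: int) -> int:
    """Return the 1-based machine number that holds a 1-based class number."""
    if class_number < 1:
        raise ValueError("class numbers start at 1")
    return (class_number - 1) // CLASSES_PER_MACHINE + 1


def machines_for_learner(enrolled_classes: list[int]) -> set[int]:
    """A query keyed by learner has no single shard; it fans out across machines."""
    return {machine_for_class(c) for c in enrolled_classes}


# Queries scoped to one class touch exactly one machine.
print(machine_for_class(1))    # class1  -> Machine 1
print(machine_for_class(10))   # class10 -> Machine 2
print(machine_for_class(11))   # class11 -> Machine 3

# A hypothetical learner enrolled in class1, class7 and class12 touches all three.
print(sorted(machines_for_learner([1, 7, 12])))
```

A query that starts from something other than a class has to visit every shard that might hold relevant rows, which is the mismatch the slide points at.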
We care about…
• Availability
• Scalability
• Operational Ease
• Performance
• (Bonus) Multi-region
So we decided to…
Try Cassandra!
Cassandra ≠ [database XYZ]
“But if you judge a fish by its ability to climb a tree, it will live its whole life believing that it is stupid.”
–Albert Einstein
Time to deploy Cassandra!
sudo apt-get install dse-full
A good deployment
• Machine-level
• Cluster-level
Picking a machine
• Disk
• IOPS… IOPS… IOPS
• Latency
Author: D-Kuru/Wikimedia Commons Licence: CC-BY-SA-3.0-AT
Picking a machine
• CPU
Author: Mark Sze Licence: CC BY-NC-ND 2.0
Picking a machine
• Memory
• Save some for page cache!
Author: brutalSoCal Licence: CC BY-NC-ND 2.0
On AWS
• Ephemeral disks.
• Please don’t use EBS. Really.
• IOPS are usually the problem
• Instance sizes:
  • spinning disk: m1.large, m1.xlarge, m2.4xlarge
  • SSD: m3.xlarge, c3.2xlarge, i2.*
Set up the machine
• Lots of documentation / talks about this
• Recommended reading: Datastax guide [1]
[1] http://www.datastax.com/documentation/cassandra/2.0/cassandra/install/installRecommendSettings.html
Cluster configuration
[Diagram: ring of three nodes A, B, C]
Priam
care and feeding of Cassandra on AWS
https://github.com/Netflix/Priam
Cluster Topology
• We use RF=3
• Ring balanced within datacenter
• Nodes alternate racks (or AZs)
Cluster Topology (Priam)
• Token assignments stored in a database
• Can take over a token in the event of node failure
Cluster Topology (Priam)
• Priam assigns tokens evenly per region
• Alternates AZs within region
[Diagram: six-node ring, two nodes each in az1, az2, az3]
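The even token spacing and AZ alternation described above can be sketched as follows. This is not Priam's actual algorithm; it is a minimal illustration that assumes the Murmur3Partitioner token range (-2^63 to 2^63 - 1) and a simple round-robin over zones. The function name `assign_ring` is hypothetical.

```python
from itertools import cycle

MIN_TOKEN = -(2 ** 63)   # lower bound of the Murmur3Partitioner token range (assumed partitioner)
RING_SIZE = 2 ** 64      # number of tokens in that range


def assign_ring(node_count: int, zones: list[str]) -> list[tuple[int, str]]:
    """Return (token, zone) pairs: evenly spaced tokens, zones assigned round-robin."""
    if node_count < 1 or not zones:
        raise ValueError("need at least one node and one zone")
    step = RING_SIZE // node_count
    zone_cycle = cycle(zones)
    return [(MIN_TOKEN + i * step, next(zone_cycle)) for i in range(node_count)]


for token, zone in assign_ring(6, ["az1", "az2", "az3"]):
    print(f"{zone}  {token}")
```

With six nodes and three zones, any three consecutive ring positions fall in three different AZs, which is the property that lets RF=3 spread replicas across zones.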
Autoscaling groups
• Recover from lost instance
• We don't use them for scaling with traffic
Important: Need one ASG per AZ
ASG size: 9
east-1a east-1a east-1a
east-1b east-1b east-1b
east-1c east-1c east-1c
Important: Need one ASG per AZ
ASG size: 9
east-1a east-1a east-1a
east-1b east-1b east-1b
east-1c east-1c
east-1b
Important: Need one ASG per AZ
ASG-1a size: 3
east-1a east-1a east-1a
ASG-1b size: 3
east-1b east-1b east-1b
ASG-1c size: 3
east-1c east-1c east-1c
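The difference between the single-ASG and per-AZ layouts on these slides can be checked with a small sketch. It uses plain Python data only and makes no AWS API calls; `plan_asgs` and `is_balanced` are hypothetical helpers, and the dictionary fields are illustrative rather than actual Auto Scaling parameters.

```python
from collections import Counter


def plan_asgs(zones: list[str], nodes_per_zone: int) -> dict[str, dict]:
    """Describe one ASG per AZ, each pinned to a single zone with a fixed size."""
    return {
        f"ASG-{zone}": {"zones": [zone], "min": nodes_per_zone, "max": nodes_per_zone}
        for zone in zones
    }


def is_balanced(instance_zones: list[str], zones: list[str], nodes_per_zone: int) -> bool:
    """True when every zone holds exactly nodes_per_zone instances."""
    counts = Counter(instance_zones)
    return all(counts.get(zone, 0) == nodes_per_zone for zone in zones)


zones = ["east-1a", "east-1b", "east-1c"]
print(plan_asgs(zones, 3))

# Layout from the single-ASG slide after a replacement landed in east-1b:
skewed = ["east-1a"] * 3 + ["east-1b"] * 4 + ["east-1c"] * 2
print(is_balanced(skewed, zones, 3))  # False
```

Pinning each group to one zone means a replacement instance can only come back in the zone that lost a node, so the rack alternation on the ring is preserved.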
Backups
• Data on ephemeral disks
• Guard against application errors
• SSTables immutable -> ship to S3
• Priam does this
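Because SSTables are immutable, a backup job only has to ship files it has not shipped before. The sketch below illustrates that idea; it is not Priam's implementation and performs no S3 calls. The helper name `sstables_to_upload` is hypothetical, and for brevity it only looks at `-Data.db` components, whereas a real backup would include every SSTable component file.

```python
from pathlib import Path


def sstables_to_upload(data_dir: Path, already_uploaded: set[str]) -> list[Path]:
    """List SSTable data files under data_dir that are not yet backed up.

    Files are identified by their path relative to data_dir. Since an SSTable
    is never modified after it is written, a path that was uploaded once does
    not need to be uploaded again.
    """
    candidates = sorted(data_dir.rglob("*-Data.db"))
    return [
        path
        for path in candidates
        if path.relative_to(data_dir).as_posix() not in already_uploaded
    ]
```

A caller would keep the set of uploaded relative paths (for example, derived from a listing of the backup location) and pass it in on each run.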
Restore
• Have to be able to use your backup
• Also useful for QA / test
• Priam handles this rather nicely
Deployed!
Time to chill?
https://www.flickr.com/photos/spunkinator/2394514059 Creative Commons
Monitoring
working / not working doesn’t count.
We have our own custom reporter agent for Datadog.
There’s pluggable reporter support in Cassandra 2.0.2 now.
JVM GC woes
JVM GC woes
All happy now
SSTables Read Histogram
Questions?
before we carry on
Transition takes
• time
• mindset shift
• expertise
• (some) risk
Our experience
• Pick one feature first
• Mindset shift
• Data modeling consulting
• Libraries / Patterns / Data-as-a-service
Pick one feature
• Don’t go all in on Cassandra with something important right away
• Work closely with that team
You probably will make mistakes
Oops!
Mindset shift
• Everyone knows SQL
• Not everyone knows Cassandra / NoSQL
• Need to know queries beforehand
Enrollment Example
• Learners enroll into a course
• learner (many-to-many) course
• Need to keep track of this membership
MySQL Model
CREATE TABLE `courses_learners` (
  `id` INT(11) NOT NULL auto_increment,
  `course_id` INT(11) NOT NULL,
  `learner_id` INT(11) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `c_l` (`learner_id`, `course_id`),
  -- REFERENCES clauses are elided on the slide
  CONSTRAINT `ref1` FOREIGN KEY (`course_id`),
  CONSTRAINT `ref2` FOREIGN KEY (`learner_id`)
)
Cassandra Style
CREATE TABLE courses_by_learner (
  learner_id uuid,
  course_id uuid,
  PRIMARY KEY (learner_id, course_id)
);
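The "need to know queries beforehand" point can be illustrated with an in-memory stand-in for the table above. This is a sketch using plain dictionaries, not a Cassandra driver. `EnrollmentStore` and the second table `learners_by_course` are hypothetical additions that show what the reverse query would require; they do not appear in the slides.

```python
from collections import defaultdict
from uuid import UUID, uuid4


class EnrollmentStore:
    """In-memory stand-in for two query-specific tables."""

    def __init__(self) -> None:
        # Each dict maps a partition key to its set of clustering keys.
        self.courses_by_learner: dict[UUID, set[UUID]] = defaultdict(set)
        # Hypothetical second table, needed only for the reverse query.
        self.learners_by_course: dict[UUID, set[UUID]] = defaultdict(set)

    def enroll(self, learner_id: UUID, course_id: UUID) -> None:
        # One logical write becomes one write per table (denormalization).
        # Re-enrolling is idempotent, like an upsert on the same primary key.
        self.courses_by_learner[learner_id].add(course_id)
        self.learners_by_course[course_id].add(learner_id)

    def courses_for(self, learner_id: UUID) -> set[UUID]:
        # Single-partition read, answered by courses_by_learner alone.
        return set(self.courses_by_learner.get(learner_id, set()))

    def learners_in(self, course_id: UUID) -> set[UUID]:
        # Reverse lookup, served by the second table keyed on course_id.
        return set(self.learners_by_course.get(course_id, set()))


store = EnrollmentStore()
alice, bob, algebra = uuid4(), uuid4(), uuid4()
store.enroll(alice, algebra)
store.enroll(bob, algebra)
store.enroll(alice, algebra)  # duplicate enrollment leaves a single row
print(len(store.courses_for(alice)))    # 1
print(len(store.learners_in(algebra)))  # 2
```

Each table is shaped around one query, so adding a new access pattern usually means adding a table and writing to it, rather than adding an index after the fact.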
Data modeling consulting
• Build core team proficient at C* data modeling
• Available to consult for trickier use cases
Libraries / Patterns
• Abstract away simple (but common) use-cases
  • Key-value storage
  • Simple time series
• Maybe not every developer will need deep C* knowledge?
• More radical: data as a service (e.g. STAASH)
STAASH: https://github.com/Netflix/staash
It’s a long road
but we’ll get there…
Author: Carissa Rogers License: CC BY 2.0
Conclusion
• Know Cassandra
• Know what makes a good deployment
• Know that new skills have to be acquired