Date post: | 22-Feb-2017 |
Category: | Technology |
Upload: | amazon-web-services |
Andrey Zaychikov, Solutions Architect, EMEA, 21.02.2017
Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS
A typical algorithm for choosing the right options for NoSQL DB deployments
What will we cover today?
How do these databases differ?
[Diagram labels: DynamoDB; Cloud-based; Self-managed (EC2); Key-value; Document-oriented; Graph]
Cassandra
What is it?
• Dynamo model database + CQL
• Horizontally scalable
• No single point of failure
• Data is immutable and stored in collections
• JVM based
• A lot of management work is done in the background
• Relies on the gossip protocol
Main concerns of the customers
Schema & usage pattern
Geo distribution
Background routines & specific optimizations
How does it work?
Choosing instance & storage capacity: 80% Writes
• For most workloads (especially with a 50/50 read/write ratio), M4s with EBS are the best option
• For write-heavy workloads with high RPS requirements, C4s with EBS should be considered
• When the performance requirements are high and the dataset is relatively small, you can use I2s with ephemeral storage
Choosing instance & storage capacity: 80% Reads
• For most workloads, M4s with EBS are a good choice
• When the performance requirements are high and the dataset is relatively small, you can use I2s with ephemeral storage
• When the performance requirements are high and the dataset is large, the best option is R4s with different EBS flavors
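The rules of thumb from the two "Choosing instance & storage capacity" slides can be written down as a small decision function. This is only a sketch: the `Workload` fields and the function name are hypothetical, and the slides give no numeric thresholds, so "high performance" and "small dataset" are passed in as judgments rather than computed. Checking the I2 rule first is also an assumption, since the slides do not rank the rules.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a Cassandra workload (not from the slides)."""
    write_heavy: bool       # True for the "80% writes" case, False for "80% reads"
    high_performance: bool  # performance / RPS requirements are high
    small_dataset: bool     # dataset is relatively small


def suggest_cassandra_instance(w: Workload) -> str:
    """Map the slides' rules of thumb to an instance family and storage type."""
    if w.high_performance and w.small_dataset:
        # Both slides: high performance + small dataset -> I2 with ephemeral storage.
        return "I2 with ephemeral storage"
    if w.high_performance and w.write_heavy:
        # Write-heavy with high RPS requirements -> C4 with EBS.
        return "C4 with EBS"
    if w.high_performance:
        # Read-heavy, high performance, large dataset -> R4 with EBS.
        return "R4 with EBS"
    # Default for most workloads on both slides.
    return "M4 with EBS"
```

For example, `suggest_cassandra_instance(Workload(write_heavy=False, high_performance=False, small_dataset=False))` falls through to the default, M4 with EBS.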
FAQ: 2AZ cluster architecture
Hint: RetryPolicy for Cassandra Driver
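The slide only names the hint, so the exact policy the speaker had in mind is not recoverable from the text. As a rough illustration of what a retry policy does, here is a driver-independent sketch: it does not use the Cassandra driver's API, and `TransientError`, `call_with_retries`, and the backoff parameters are all hypothetical names chosen for this example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for a timeout or 'node unavailable' error."""


def call_with_retries(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying on TransientError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                # Out of attempts: surface the error to the caller.
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

Blind retries are only safe for idempotent operations, which is one reason the retry behavior is worth choosing deliberately rather than leaving at a default.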
FAQ
Cassandra backup / restore
- Restore procedure for the whole cluster can be complicated
- Restore for a single node can be done with EBS Snapshots
Auto Scaling of Cassandra clusters
- Auto-scaling puts unpredictable pressure on the cluster
- Scaling up is simple, but scaling down is extremely complicated
Cassandra in Containers
- Makes sense only for test / dev environments
FAQ: Troubleshooting
JVM
Caching
Compaction
Disk I/O
CPU
Memory
MongoDB
What is it?
• Document-oriented database
• Horizontally scalable
• HA is based on master / slave replication
• Geo-distributed
• A lot of management work is done in the background
Main concerns of the customers
Schema & usage pattern
Geo distribution and performance
Data consistency & partition tolerance
How does it work?
Choosing instance & storage
• MongoDB needs a lot of memory and really fast disks, so unless your dataset is quite big, the best option is either R3 or I2 (depending on the size of the dataset)
• If the dataset is big, consider R4 with different EBS flavors
• For hidden nodes, use M4 with EBS, as EBS snapshots make it easy to back up data
FAQ: 2AZ cluster architecture
Best option: Replica Set in one AZ and Hidden member in another one.
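The recommended layout can be expressed as a replica set configuration document. The sketch below builds one in Python; the helper name, the replica set name, and the host names are hypothetical, while the `_id`, `members`, `host`, `priority`, and `hidden` fields are the standard replica set configuration fields. A hidden member must have priority 0, so it can never become primary.

```python
from typing import Any, Dict, List


def build_replica_set_config(
    name: str, main_az_hosts: List[str], hidden_host: str
) -> Dict[str, Any]:
    """Build a replica set config: regular members in one AZ, one hidden member in another."""
    members: List[Dict[str, Any]] = [
        {"_id": i, "host": host} for i, host in enumerate(main_az_hosts)
    ]
    members.append(
        {
            "_id": len(main_az_hosts),
            "host": hidden_host,
            "priority": 0,   # required for hidden members: never elected primary
            "hidden": True,  # invisible to client reads; suitable for backups
        }
    )
    return {"_id": name, "members": members}


# Hypothetical hosts: three members in the first AZ, a hidden member in the second.
config = build_replica_set_config(
    "rs0",
    ["mongo-a1:27017", "mongo-a2:27017", "mongo-a3:27017"],
    "mongo-b1:27017",
)
```

A document of this shape is what `rs.initiate()` accepts in the mongo shell.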
FAQ
MongoDB backup / restore
- Hidden nodes with EBS and EBS snapshot backups
Querying large amounts of data
- Design the schema properly
- Avoid using MapReduce on the Master
MongoDB consistency
- Lots of improvements were done but there are some edge cases
FAQ: Troubleshooting
Mongos performance
Long running queries
Fragmentation
Disk I/O
CPU
Memory
CouchDB
What is it?
• Document-oriented database built on the Dynamo model
• Supports a RESTful API
• Eventual consistency
• Lock-free and optimistic, with conflict resolution
• Horizontally scalable (with constraints)
• Offline-first database
• Map reduce to prepare views
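The lock-free, optimistic model can be illustrated with a toy in-memory store. This is not CouchDB code: `TinyDocStore` and `ConflictError` are hypothetical names, and revisions are plain integers here, whereas CouchDB uses string `_rev` values and answers a stale update with a 409 Conflict. The idea is the same: a writer must present the revision it read, and the write is rejected if someone else has updated the document since.

```python
from typing import Any, Dict, Optional, Tuple


class ConflictError(Exception):
    """Raised when an update is based on a stale revision."""


class TinyDocStore:
    """Toy document store with optimistic, revision-checked writes."""

    def __init__(self) -> None:
        self._docs: Dict[str, Tuple[int, Dict[str, Any]]] = {}

    def get(self, doc_id: str) -> Tuple[int, Dict[str, Any]]:
        """Return (revision, body) for a document."""
        return self._docs[doc_id]

    def put(self, doc_id: str, body: Dict[str, Any], base_rev: Optional[int] = None) -> int:
        """Write a document; `base_rev` must match the stored revision (None for new docs)."""
        current = self._docs.get(doc_id)
        current_rev = current[0] if current else None
        if base_rev != current_rev:
            # No locks were taken; the loser of the race is told to re-read and retry.
            raise ConflictError(f"{doc_id}: expected rev {current_rev}, got {base_rev}")
        new_rev = (current_rev or 0) + 1
        self._docs[doc_id] = (new_rev, dict(body))
        return new_rev
```

A client that receives `ConflictError` re-reads the document, merges its change, and writes again with the fresh revision.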
How does it work?
Choosing instance & storage
FAQ: 2AZ cluster architecture
• You have to plan the replication schema on your own, so it is your responsibility to check how it will behave in case of a DR event
FAQ
Proper replication schema
Indexed views & their performance
Proxy for requests
Aerospike
What is it?
• In-memory key-value database
• High and constant performance
• Shared-nothing architecture
• Geo-distributed (hash partitions)
• Master-slave replication
How does it work?
Choosing instance & storage
• Aerospike is used when the performance requirements are extreme. It needs a lot of memory and very fast disks. That is why EC2 with ephemeral storage is the first choice for Aerospike deployments.
FAQ: 2AZ cluster architecture
• If one AZ goes down, depending on your replication factor you will still have a copy of the data
• Aerospike will be able to add more nodes and replicate data to them without putting much pressure on the existing nodes
• It takes time to replicate data
FAQ
Aerospike backup / restore
- Restore procedure for the whole cluster can be complicated
- Restore for a single node can be done with EBS Snapshots
Auto Scaling of Aerospike clusters
- Auto-scaling puts unpredictable pressure on the cluster
- Scaling up is simple, but scaling down is complicated
Aerospike in Containers
- Does not make any sense
FAQ: Troubleshooting
Disk I/O
CPU
Memory
Neo4j
What is it?
• Graph database
• JVM based
• Provides a REST API
• Two clustering modes: HA cluster & Causal cluster
• Two types of nodes: Core nodes & Read replicas (Raft protocol)
• Uses the Cypher language for querying
Neo4j Causal Clustering
How does it work?
Choosing instance & storage
FAQ: 2AZ cluster architecture
• If an AZ fails and the master node was in it, a new master election procedure is initiated
• Core nodes in Causal cluster mode vote by simple majority
• If a majority is unavailable, the cluster becomes read-only
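The majority rule above can be checked with a few lines of arithmetic. The sketch below is not Neo4j code; the function names and the AZ labels are hypothetical, and it only models the "simple majority of Core nodes" rule stated on the slide.

```python
from typing import Dict


def has_majority(total_core_nodes: int, available_core_nodes: int) -> bool:
    """True if the available Core nodes form a strict majority of all Core nodes."""
    return 2 * available_core_nodes > total_core_nodes


def cluster_mode_after_az_failure(cores_per_az: Dict[str, int], failed_az: str) -> str:
    """Return 'read-write' or 'read-only' after losing every Core node in one AZ."""
    total = sum(cores_per_az.values())
    available = total - cores_per_az[failed_az]
    return "read-write" if has_majority(total, available) else "read-only"


# Three Core nodes split 2 + 1 across two hypothetical AZs.
layout = {"az-a": 2, "az-b": 1}
mode_if_a_fails = cluster_mode_after_az_failure(layout, "az-a")  # 1 of 3 left
mode_if_b_fails = cluster_mode_after_az_failure(layout, "az-b")  # 2 of 3 left
```

With only two AZs there is always at least one AZ whose loss removes the majority: either one AZ holds more than half of the Core nodes, or the nodes are split evenly and losing either half leaves no strict majority.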
FAQ: Troubleshooting
JVM
Page caching
Disk I/O
CPU
Memory
NoSQL on EC2: Cost considerations
General cost considerations
Usage pattern (R/W)
RPS
Size of the dataset
Traffic costs
Object size
Number of nodes
Cost: Performance / Size
• If you want to always be cost-effective and efficient, then deployment is a journey for you
• Consider EBS as the main option for most workloads
• If your performance requirements are really high and the dataset is relatively small, consider EC2 with ephemeral storage; otherwise, go for EC2 with EBS
Sum up
• There is no general solution for all cases
• Context matters, and the solution should follow the changing context
• Apps and code should be adapted to the way NoSQL DBs work
• The initial choice of deployment options can be changed
• The best way to make the initial deployment choice is a PoC
Thank you!