Date posted: 13-Apr-2017
Category: Technology
Uploaded by: jonathan-shook
Balancing Hardware Cost and Performance
Meta Risks and Remedies in C* Adoption
Note: If you are reading this on slideshare, read the speaker notes for details on each slide.
Sizing & Selection
Risk
“One Size Fits All” purchasing habits limit hardware selection.
Remedy
Be willing to adapt your hardware to the workload.
Risk
“Perfect Sizing” a cluster side-steps operational needs.
Remedy
Maintain operational headroom to bolster performance through spikes, failures, and sizing.
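One way to make headroom explicit is to size for peak load while assuming that some nodes are down and that the remaining nodes should stay below a target utilization. A rough sketch only: the function name, the per-node throughput, and the 50% utilization target are illustrative assumptions, not figures from this deck.

```python
import math

def nodes_needed(peak_ops_per_sec, ops_per_node, target_utilization=0.5, nodes_down=1):
    """Estimate a cluster size that keeps headroom even with some nodes offline."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    # Nodes required so each one runs at or below the target utilization at peak.
    serving = math.ceil(peak_ops_per_sec / (ops_per_node * target_utilization))
    # Add the nodes we expect to lose to failure or maintenance on top of that.
    return serving + nodes_down

# Hypothetical workload: 90k ops/s at peak, 10k ops/s measured per node.
print(nodes_needed(peak_ops_per_sec=90_000, ops_per_node=10_000))  # prints 19
```

Sizing to exactly 9 nodes here would be the "perfect sizing" trap: it leaves nothing for spikes or for a node outage.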
Risk
Focusing too much on RF=1 performance ignores realistic CAP scenarios.
Remedy
Test with a replication factor and consistency level that are realistic for your application requirements.
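To see why RF=1 results say little about production behavior, it helps to work out how many replicas a QUORUM operation needs and how many can be down. A small sketch of that arithmetic; the helper names are made up for illustration.

```python
def quorum(replication_factor):
    """Replicas that must respond for a QUORUM read or write."""
    return replication_factor // 2 + 1

def replicas_that_can_fail(replication_factor, required_acks):
    """How many replicas may be down while the operation can still succeed."""
    return replication_factor - required_acks

for rf in (1, 3, 5):
    q = quorum(rf)
    print(f"RF={rf}: QUORUM needs {q}, tolerates {replicas_that_can_fail(rf, q)} replica(s) down")
```

With RF=1 every operation depends on a single replica, so the test exercises neither coordination cost nor failure tolerance.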
Risk
No realistic performance testing before production leaves you hoping for success.
Remedy
Build a test harness that can be used to iterate your design with improvements.
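A harness does not have to be elaborate to be useful. The sketch below times any callable and reports latency percentiles, so the same measurement can be repeated after each design change. The function name and the stand-in workload are assumptions; in practice the callable would issue a real read or write against the cluster.

```python
import statistics
import time

def measure_latency(operation, iterations=1000, warmup=100):
    """Run `operation` repeatedly and summarize its latency in milliseconds."""
    for _ in range(warmup):
        operation()  # warm caches and connections before measuring
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=100) returns the 99 cut points for percentiles 1 through 99.
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return {
        "count": len(samples),
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
        "max": max(samples),
    }

if __name__ == "__main__":
    # Stand-in workload; replace with a real query against the cluster.
    print(measure_latency(lambda: sum(range(1000)), iterations=500, warmup=50))
```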
Risk
Performance testing on “dev” hardware makes people focus in the wrong place.
Remedy
Focus design and testing effort on realistic issues by testing performance on the same type of hardware that will be used in production.
Risk
Buffer-cache cycling pushes the system towards cold-read performance.
Remedy
Understand how cache-dependent your workload is, and allocate resources around performance demands.
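A quick way to reason about cache dependence is to weight cached and uncached read latency by the hit ratio. This is a simplified model, and the latency figures below are placeholders to be replaced with measured values.

```python
def expected_read_latency_ms(hit_ratio, cache_latency_ms, disk_latency_ms):
    """Average read latency for a given buffer-cache hit ratio."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_latency_ms + (1.0 - hit_ratio) * disk_latency_ms

# Assumed latencies: 0.1 ms from cache, 5 ms for a cold read from disk.
for hit_ratio in (0.99, 0.90, 0.50):
    latency = expected_read_latency_ms(hit_ratio, cache_latency_ms=0.1, disk_latency_ms=5.0)
    print(f"hit ratio {hit_ratio:.2f}: about {latency:.2f} ms per read")
```

As the hit ratio falls, the average moves quickly towards the cold-read figure, which is the effect the slide warns about.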
Risk
Underestimating the role that storage plays can lead to risky assumptions about performance.
Remedy
Configure the storage subsystem to support the IO demands of your workload.
Risk
RAID shenanigans make the storage subsystem slower or less available.
Remedy
Be aware of RAID trade-offs, especially with SSDs. Keep it as simple and fast as possible. Measure RAID performance trade-offs, including in degraded/rebuilding mode.
Risk
Clinging to historic maxims about first-gen SSDs can make you spend much more for the same capacity on spindles.
Remedy
Make empirical decisions about using SSDs vs. spindles. 10x performance at ~2x the cost is usually an easy choice.
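The "10x performance at ~2x the cost" rule of thumb can be restated as cost per unit of performance. A tiny sketch using only the ratios from the slide, with the spindle as a baseline of 1.0; real decisions should use measured numbers for the actual devices.

```python
def cost_per_unit_performance(cost, performance):
    """Relative cost paid for each unit of delivered performance."""
    return cost / performance

spindle = cost_per_unit_performance(cost=1.0, performance=1.0)  # baseline
ssd = cost_per_unit_performance(cost=2.0, performance=10.0)     # ~2x cost, 10x performance

print(f"SSD cost per unit of performance is {ssd / spindle:.0%} of the spindle baseline")
```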
Risk
Default num_tokens is often too high (256).
Remedy
Set num_tokens to a lower value, like 10 or 20.
Risk
Shared Something creates an Achilles heel in your architecture.
Remedy
Share Nothing.
Balancing Costs and Performance, In General
Let’s take a wider view of cost and performance.
Data Modeling
Risk
Trying to emulate RDBMS patterns with inefficient operations fails to scale.
Remedy
Use tracing tools to identify costly operations. Refine data models and queries around best practices.
Risk
Some queries are sensitive to response size, tombstones, or overall data density.
Remedy
Watch performance and data density trends. Be ready to scale before the capacity is needed.
Risk
Failure to embrace denormalization can hinder development progress.
Remedy
Understand exactly why denorm is the norm for highly-available distributed systems.
Risk
Inappropriate compaction tuning can rob a system of needed IO to support queries.
Remedy
Make thoughtful choices around compaction strategies. Test and compare with write sampling if unsure.
Operations
Risk
Operational awareness may need to be improved.
Remedy
Feed your C* metrics to a highly-available monitoring system. Develop a rationale for consuming and refining your operational views.
Risk
Operational readiness may need to be improved.
Remedy
Discuss and plan for what to do when you have a node outage. Test the plan on actual hardware under realistic load.
Risk
Deployments are not reproducible.
Remedy
Use orchestration, or at least some form of automation, to script your deployments. Put your configs in source control.
Risk
Multipurpose use of systems forces resource contention with C* workloads.
Remedy
Use your Cassandra nodes only for Cassandra and integrated components, like Spark or SOLR.
Risk
Not enough forward planning around growth or usage trends can lead to resource overload.
Remedy
Use your metrics to understand workload peaks, valleys, cycles, and ongoing trends. Ensure operational headroom for your highest loads.
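Trend data becomes actionable once it is turned into a lead time. The sketch below fits a straight line to daily usage samples and estimates how many days remain before a threshold is reached. Linear growth is an assumption, and the sample values and threshold are hypothetical.

```python
def days_until_threshold(daily_usage, threshold):
    """Fit a line to daily usage samples and estimate days until `threshold`.

    Returns 0.0 if the fitted line is already at or over the threshold,
    and None when usage is flat or shrinking.
    """
    n = len(daily_usage)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_usage) / n
    # Ordinary least-squares slope and intercept.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_usage))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    latest = intercept + slope * (n - 1)  # fitted value for the most recent day
    if latest >= threshold:
        return 0.0
    if slope <= 0:
        return None
    return (threshold - latest) / slope

# Hypothetical disk usage in GB over five days, with a 700 GB planning threshold.
print(days_until_threshold([500, 510, 520, 530, 540], threshold=700))  # prints 16.0
```

Setting the threshold well below the hard limit leaves time to add nodes before the capacity is actually needed.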
Design Process
Risk
Strictly Waterfall thinking tries to solve too many problems up front without useful data, leading to risky assumptions.
Remedy
Be willing to measure and validate key assumptions from early in the process, and to refine when needed.
Risk
Reinventing the wheel consumes time and effort, and is error-prone.
Remedy
Spend some time upfront looking at community resources and examples.
Risk
Insular division of design ownership leads to partially-informed design decisions.
Remedy
Design the system as a whole, with cross-team collaboration as needed.
Risk
Misusing the client API can lead to suboptimal performance or confusing symptoms.
Remedy
Read the docs and use examples when learning how to use the client APIs. Pay attention to object scopes.
Risk
Lack of tooling awareness makes diagnosis and measurement a last-minute learning process.
Remedy
Make sure that all the learning, diagnostic, and development resources are publicized among your teams.
Risk
Operational tools which are delivered or deployed later in adoption have less refinement and testing time with users.
Remedy
Plan to assemble a prod-like workflow even in early stages. Run the full stack, including tooling, throughout all phases of development.
Risk
Not using prepared statements will negatively impact throughput.
Remedy
Make using prepared statements a design standard.
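The usual pattern is to prepare a statement once and reuse it for every execution. A minimal sketch, assuming `session` behaves like a DataStax Python driver `Session` (it only needs `prepare()` and `execute()`); the keyspace, table, and column names are hypothetical.

```python
def insert_readings(session, readings):
    """Insert many rows using a single prepared statement.

    `session` is assumed to offer prepare() and execute() in the style of the
    DataStax Python driver. The metrics.readings table is a made-up example.
    """
    # Prepare once, outside the loop: the query is parsed a single time.
    insert = session.prepare(
        "INSERT INTO metrics.readings (sensor_id, ts, value) VALUES (?, ?, ?)"
    )
    # Execute many times, binding only the values for each row.
    for sensor_id, ts, value in readings:
        session.execute(insert, (sensor_id, ts, value))
    return insert
```

Calling `prepare()` inside the loop would repeat the parsing work on every row, which is the kind of client API misuse that quietly costs throughput. In a long-lived application the prepared statement would typically be created at startup and kept alongside the session.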
Q & A