Balancing cost and performance with Apache Cassandra

Balancing Hardware Cost and Performance

Meta Risks and Remedies in C* Adoption

Note: If you are reading this on slideshare, read the speaker notes for details on each slide.

Sizing & Selection


Risk: “One Size Fits All” purchasing habits limit hardware selection.

Remedy: Be willing to adapt your hardware to the workload.

Risk: “Perfect Sizing” a cluster side-steps operational needs.

Remedy: Maintain operational headroom to bolster performance through spikes, failures, and sizing.
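To make the headroom idea concrete, here is a minimal sizing sketch. The function name `nodes_needed`, the 60% utilization target, and all throughput figures are hypothetical assumptions for illustration; per-node throughput should come from your own tests on production-like hardware.

```python
import math

def nodes_needed(peak_ops, per_node_ops, target_utilization=0.6, tolerated_failures=1):
    """Nodes required so that peak load stays under the utilization target
    even with `tolerated_failures` nodes down (hypothetical sizing rule)."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    # Nodes that must be serving traffic at peak without exceeding the target.
    serving = math.ceil(peak_ops / (per_node_ops * target_utilization))
    # Extra nodes so a failure does not eat into the headroom.
    return serving + tolerated_failures

# Hypothetical workload: 100k ops/s at peak, 10k ops/s measured per node.
perfectly_sized = nodes_needed(100_000, 10_000, target_utilization=1.0, tolerated_failures=0)
with_headroom = nodes_needed(100_000, 10_000)
print(perfectly_sized, with_headroom)
```

With these made-up numbers, the "perfectly sized" cluster has no room for a spike or a node loss, while the headroom-aware figure is noticeably larger.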

Risk: Focusing too much on RF=1 performance ignores realistic CAP scenarios.

Remedy: Test with a replication factor and consistency level that are realistic for your application requirements.
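The arithmetic behind replication factor and consistency level can be sketched as follows. The helper names are hypothetical; the quorum formula (a majority of replicas) and the overlap rule (reads plus writes must exceed RF) are the standard ones.

```python
def quorum(rf):
    """Replicas required for QUORUM: a strict majority of the replication factor."""
    return rf // 2 + 1

def reads_see_latest_write(rf, write_replicas, read_replicas):
    """True when every read set overlaps every write set (R + W > RF)."""
    return write_replicas + read_replicas > rf

def tolerated_replica_failures(rf, required_replicas):
    """How many replicas can be down while the operation still succeeds."""
    return rf - required_replicas

for rf in (1, 3, 5):
    q = quorum(rf)
    print(rf, q, tolerated_replica_failures(rf, q), reads_see_latest_write(rf, q, q))
```

At RF=1 a quorum is a single replica and no replica failure can be tolerated, which is why RF=1 benchmarks say little about a realistic deployment.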

Risk: No realistic performance testing before production leaves you hoping for success.

Remedy: Build a test harness that can be used to iterate on your design with improvements.
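A test harness can start very small. The sketch below is an illustrative assumption, not a tool from the talk: it times any callable and reports latency percentiles, so the same measurement can be repeated after each design change. In a real harness, `operation` would issue a representative query against a production-like cluster.

```python
import time

def percentile(sorted_values, pct):
    """Nearest-rank percentile of a sorted, non-empty list; pct is an integer 1..100."""
    rank = max(1, (pct * len(sorted_values) + 99) // 100)  # integer ceiling
    return sorted_values[rank - 1]

def run_benchmark(operation, iterations=1000):
    """Call `operation` repeatedly and summarize per-call latency in seconds."""
    latencies = []
    for _ in range(iterations):
        start = time.perf_counter()
        operation()
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    return {
        "p50": percentile(latencies, 50),
        "p95": percentile(latencies, 95),
        "p99": percentile(latencies, 99),
        "max": latencies[-1],
    }

# Placeholder workload; replace with a real read or write against the cluster.
stats = run_benchmark(lambda: sum(range(1000)), iterations=200)
print(stats)
```

Tracking tail percentiles rather than averages tends to expose the spikes that matter operationally.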

Risk: Performance testing on “dev” hardware makes people focus in the wrong place.

Remedy: Focus design and testing effort on realistic issues by testing performance on the same type of hardware that will be used in production.

Risk: Buffer-cache cycling pushes the system towards cold-read performance.

Remedy: Understand how cache-dependent your workload is, and allocate resources around performance demands.
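One rough way to reason about cache dependence is a weighted-average latency model. This is a simplifying assumption (it ignores queueing and tail effects), and the latency figures below are hypothetical.

```python
def expected_read_latency_ms(hit_ratio, cache_ms, disk_ms):
    """Average read latency when a fraction `hit_ratio` of reads is served from cache."""
    if not 0 <= hit_ratio <= 1:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1 - hit_ratio) * disk_ms

# Hypothetical latencies: 0.1 ms from buffer cache, 8 ms for a cold read from disk.
warm = expected_read_latency_ms(0.95, cache_ms=0.1, disk_ms=8.0)
cycling = expected_read_latency_ms(0.50, cache_ms=0.1, disk_ms=8.0)
print(round(warm, 3), round(cycling, 3))
```

In this model, as the hit ratio falls, average latency moves toward the cold-read figure, which is the effect the slide warns about.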

Risk: Underestimating the role that storage plays can lead to risky assumptions about performance.

Remedy: Configure the storage subsystem to support the IO demands of your workload.

Risk: RAID shenanigans make the storage subsystem slower or less available.

Remedy: Be aware of RAID trade-offs, especially with SSDs. Keep it as simple and fast as possible. Measure RAID performance trade-offs, including in degraded/rebuilding mode.

Risk: Clinging to historic maxims about first-gen SSDs can make you spend much more for the same capacity on spindles.

Remedy: Make empirical decisions about using SSDs vs. spindles. 10x performance at ~2x the cost is usually an easy choice.
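The "10x performance at ~2x the cost" comparison can be worked through with a small calculation. The drive figures below are hypothetical, chosen only to match those ratios, and the sketch considers IO throughput alone, not storage capacity in GB.

```python
import math

def drives_and_cost(target_iops, iops_per_drive, price_per_drive):
    """Return (drive_count, total_price) needed to reach target_iops."""
    drives = math.ceil(target_iops / iops_per_drive)
    return drives, drives * price_per_drive

# Hypothetical drives: the SSD has 10x the IOPS at 2x the unit price.
SPINDLE = {"iops_per_drive": 200, "price_per_drive": 100}
SSD = {"iops_per_drive": 2_000, "price_per_drive": 200}

target = 20_000  # hypothetical IOPS requirement
spindle_plan = drives_and_cost(target, **SPINDLE)
ssd_plan = drives_and_cost(target, **SSD)
print(spindle_plan, ssd_plan)
```

Under these assumptions, meeting the same IO target on spindles takes ten times as many drives and five times the spend. Plugging in measured numbers for your own candidates is the empirical step the remedy asks for.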

Risk: Default num_tokens is often too high (256).

Remedy: Set num_tokens to a lower value, like 10 or 20.

Risk: Shared Something creates an Achilles heel in your architecture.

Remedy: Share Nothing.

Balancing Costs and Performance, In General

Let’s take a wider view of cost and performance.

Data Modeling


Risk: Trying to emulate RDBMS patterns with inefficient operations fails to scale.

Remedy: Use tracing tools to identify costly operations. Refine data models and queries around best practices.

Risk: Some queries are sensitive to response size, tombstones, or overall data density.

Remedy: Watch performance and data density trends. Be ready to scale before the capacity is needed.
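Watching a density trend can be as simple as fitting a line to recent samples and projecting forward. The sketch below is an assumption-laden illustration: the function name and sample values are hypothetical, and real growth is rarely perfectly linear.

```python
def days_until_full(daily_used_gb, capacity_gb):
    """Project days until per-node capacity is reached, using a least-squares
    line through one sample per day. Returns None if usage is not growing."""
    n = len(daily_used_gb)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(daily_used_gb) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_used_gb))
    variance = sum((x - mean_x) ** 2 for x in range(n))
    slope = covariance / variance  # GB of growth per day
    if slope <= 0:
        return None
    return (capacity_gb - daily_used_gb[-1]) / slope

# Hypothetical samples: data per node growing about 10 GB per day.
print(days_until_full([100, 110, 120, 130], capacity_gb=200))
```

Comparing the projected date against the lead time for adding nodes indicates when scaling work needs to start.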

Risk: Failure to embrace denormalization can hinder development progress.

Remedy: Understand exactly why denorm is the norm for highly-available distributed systems.

Risk: Inappropriate compaction tuning can rob a system of the IO needed to support queries.

Remedy: Make thoughtful choices around compaction strategies. Test and compare with write sampling if unsure.

Operations


Risk: Operational awareness may need to be improved.

Remedy: Feed your C* metrics to a highly-available monitoring system. Develop a rationale for consuming and refining your operational views.

Risk: Operational readiness may need to be improved.

Remedy: Discuss and plan for what to do when you have a node outage. Test the plan on actual hardware under realistic load.

Risk: Deployments are not reproducible.

Remedy: Use orchestration or at least some form of automation to script your deployments. Put your configs in source control.

Risk: Multipurpose use of systems forces resource contention with C* workloads.

Remedy: Use your Cassandra nodes only for Cassandra and integrated components, like Spark or Solr.

Risk: Not enough forward planning around growth or usage trends can lead to resource overload.

Remedy: Use your metrics to understand workload peaks, valleys, cycles, and ongoing trends. Ensure operational headroom for your highest loads.

Design Process


Risk: Strictly Waterfall thinking tries to solve too many problems up front without useful data, leading to risky assumptions.

Remedy: Be willing to measure and validate key assumptions from early in the process, and to refine when needed.

Risk: Reinventing the wheel consumes time and effort, and is error-prone.

Remedy: Spend some time up front looking at community resources and examples.

Risk: Insular division of design ownership leads to partially-informed design decisions.

Remedy: Design the system as a whole, with cross-team collaboration as needed.

Risk: Misusing the client API can lead to suboptimal performance or confusing symptoms.

Remedy: Read the docs and use examples when learning how to use the client APIs. Pay attention to object scopes.

Risk: Lack of tooling awareness makes diagnosis and measurement a last-minute learning process.

Remedy: Make sure that all the learning, diagnostic, and development resources are publicized among your teams.

Risk: Operational tools which are delivered or deployed later in adoption have less refinement and testing time with users.

Remedy: Plan to assemble a prod-like workflow even in early stages. Run the full stack, including tooling, throughout all phases of development.

Risk: Not using prepared statements will negatively impact throughput.

Remedy: Make using prepared statements a design standard.
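As a minimal sketch of the prepared-statement pattern, the class below prepares a statement once and reuses it for every write. It assumes `session` behaves like a `Session` from the DataStax Python driver (`prepare` and `execute`); the `users` table, its columns, and the `UserWriter` class are hypothetical.

```python
class UserWriter:
    """Prepares the INSERT once at construction and reuses it for each write."""

    # Hypothetical table and columns, for illustration only.
    INSERT_CQL = "INSERT INTO users (id, name) VALUES (?, ?)"

    def __init__(self, session):
        # `session` is assumed to be a long-lived, shared driver session.
        self._session = session
        # Preparing once avoids re-parsing the query text on every request.
        self._insert = session.prepare(self.INSERT_CQL)

    def save(self, user_id, name):
        # Only the bound values vary between calls.
        return self._session.execute(self._insert, (user_id, name))
```

Keeping the prepared statement at the same scope as the session, rather than preparing inside each request, also illustrates the earlier point about object scopes in the client API.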

Q & A

Date post:	13-Apr-2017
Category:	Technology
Upload:	jonathan-shook
