
Balancing cost and performance with Apache Cassandra

Date post: 13-Apr-2017
Upload: jonathan-shook

Balancing Hardware Cost and Performance

Meta Risks and Remedies in C* Adoption

Note: If you are reading this on slideshare, read the speaker notes for details on each slide.

Sizing & Selection

Risk: “One Size Fits All” purchasing habits limit hardware selection.

Remedy: Be willing to adapt your hardware to the workload.

Risk: “Perfect Sizing” a cluster side-steps operational needs.

Remedy: Maintain operational headroom to bolster performance through spikes, failures, and sizing.
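As a rough illustration of sizing with headroom, the sketch below estimates a node count from a peak load and a measured per-node throughput. The function name, the parameters (`peak_ops`, `per_node_ops`, `target_utilization`, `spare_nodes`), and the example numbers are all hypothetical and do not come from the slides. Measure your own values on production-like hardware.

```python
import math


def nodes_needed(peak_ops, per_node_ops, target_utilization=0.5, spare_nodes=1):
    """Estimate a node count that leaves headroom at peak load.

    target_utilization is the fraction of measured per-node capacity
    you are willing to consume at peak (0.5 is an assumed default).
    spare_nodes covers nodes that may be down for failure or maintenance.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    if per_node_ops <= 0:
        raise ValueError("per_node_ops must be positive")
    usable_per_node = per_node_ops * target_utilization
    return math.ceil(peak_ops / usable_per_node) + spare_nodes


if __name__ == "__main__":
    # Hypothetical inputs: 120k ops/s at peak, 20k ops/s measured per node.
    print(nodes_needed(peak_ops=120_000, per_node_ops=20_000))
```

Sizing to 100% of measured capacity (`target_utilization=1.0`, `spare_nodes=0`) is the "perfect sizing" the slide warns about.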

Risk: Focusing too much on RF=1 performance ignores realistic CAP scenarios.

Remedy: Test with a replication factor and consistency level which are realistic for your application requirements.
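To see why RF=1 results say little about a realistic deployment, it can help to work out how many replicas must answer at QUORUM and how many can be down. The sketch below uses the standard quorum formula, floor(RF / 2) + 1, and the usual overlap rule that reads see the latest write when read replicas plus write replicas exceed RF. The function names are made up for this example.

```python
def quorum(rf):
    """Replicas that must respond at QUORUM: floor(rf / 2) + 1."""
    if rf < 1:
        raise ValueError("replication factor must be at least 1")
    return rf // 2 + 1


def reads_see_latest_write(rf, read_replicas, write_replicas):
    """True when every read set overlaps every write set (R + W > RF)."""
    return read_replicas + write_replicas > rf


if __name__ == "__main__":
    for rf in (1, 3, 5):
        q = quorum(rf)
        print(f"RF={rf}: QUORUM needs {q}, tolerates {rf - q} replica(s) down")
```

At RF=1 a single replica outage makes the data unavailable, which is the scenario an RF=1 benchmark never exercises.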

Risk: No realistic performance testing before production leaves you hoping for success.

Remedy: Build a test harness that can be used to iterate your design with improvements.
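A test harness can start very small. The sketch below times any callable and reports latency percentiles, so the same measurement can be repeated after each design change. It is a minimal, single-threaded illustration and not a substitute for a real load generator. `run_benchmark` and its parameters are hypothetical names, and `operation` stands in for whatever query your application issues.

```python
import statistics
import time


def run_benchmark(operation, iterations=1000):
    """Call operation() repeatedly and return latency stats in milliseconds."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    latencies = []
    for _ in range(iterations):
        start = time.perf_counter()
        operation()
        latencies.append((time.perf_counter() - start) * 1000.0)
    latencies.sort()

    def percentile(p):
        # Nearest-rank style lookup, clamped to the last sample.
        index = min(len(latencies) - 1, int(p / 100.0 * len(latencies)))
        return latencies[index]

    return {
        "count": len(latencies),
        "mean_ms": statistics.mean(latencies),
        "p50_ms": percentile(50),
        "p99_ms": percentile(99),
        "max_ms": latencies[-1],
    }


if __name__ == "__main__":
    # Placeholder workload; replace with a real read or write call.
    print(run_benchmark(lambda: sum(range(1000)), iterations=200))
```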

Risk: Performance testing on “dev” hardware makes people focus in the wrong place.

Remedy: Focus design and testing effort on realistic issues by testing performance on the same type of hardware that will be used in production.

Risk: Buffer-cache cycling pushes the system towards cold-read performance.

Remedy: Understand how cache-dependent your workload is, and allocate resources around performance demands.
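One way to reason about cache dependence is a simple weighted average of cached and cold read latencies. The sketch below assumes that model, which ignores queueing and contention effects. The function name and the example latencies are hypothetical, so substitute figures measured on your own hardware.

```python
def expected_read_latency_ms(hit_ratio, cached_ms, cold_ms):
    """Average read latency for a given cache hit ratio (simple model)."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cached_ms + (1.0 - hit_ratio) * cold_ms


if __name__ == "__main__":
    # Hypothetical latencies: 0.2 ms from cache, 8 ms from disk.
    for hit_ratio in (0.95, 0.70):
        latency = expected_read_latency_ms(hit_ratio, cached_ms=0.2, cold_ms=8.0)
        print(f"hit ratio {hit_ratio:.2f}: about {latency:.2f} ms per read")
```

Under this model, a modest drop in hit ratio moves the average sharply toward the cold-read figure, which is why cache cycling matters for sizing memory and storage.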

Risk: Underestimating the role that storage plays can lead to risky assumptions about performance.

Remedy: Configure the storage subsystem to support the IO demands of your workload.

Risk: RAID shenanigans make the storage subsystem slower or less available.

Remedy: Be aware of RAID trade-offs, especially with SSDs. Keep it as simple and fast as possible. Measure RAID performance trade-offs, including in degraded/rebuilding mode.

Risk: Clinging to historic maxims about first-gen SSDs can make you spend much more for the same capacity on spindles.

Remedy: Make empirical decisions about using SSDs vs. spindles. 10x performance at ~2x the cost is usually an easy choice.
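The "10x performance at ~2x the cost" trade-off can be made concrete by asking what it costs to reach a fixed IO target with each device type. The sketch below does that arithmetic. The device prices and IOPS figures are invented to match the slide's rough ratio and are not measurements; `cost_to_reach` is a hypothetical helper.

```python
import math


def cost_to_reach(target_iops, iops_per_device, cost_per_device):
    """Return (device_count, total_cost) needed to meet an IOPS target."""
    if iops_per_device <= 0:
        raise ValueError("iops_per_device must be positive")
    devices = math.ceil(target_iops / iops_per_device)
    return devices, devices * cost_per_device


if __name__ == "__main__":
    target = 20_000  # hypothetical workload requirement
    # Made-up figures: the SSD is 10x faster at 2x the unit price.
    spindles = cost_to_reach(target, iops_per_device=200, cost_per_device=100)
    ssds = cost_to_reach(target, iops_per_device=2_000, cost_per_device=200)
    print("spindles (count, cost):", spindles)
    print("SSDs (count, cost):", ssds)
```

With real numbers from your own benchmarks and quotes, the same calculation supports the empirical decision the slide recommends.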

Risk: Default num_tokens is often too high (256).

Remedy: Set num_tokens to a lower value, like 10 or 20.
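Since `num_tokens` is set in `cassandra.yaml`, a deployment script can check it before a node is first started. The sketch below reads the setting from the file's text with a regular expression rather than a YAML parser, which is an assumption that holds only for a plain top-level `num_tokens: N` line. The helper names and the threshold of 20, taken from the slide's suggestion, are illustrative.

```python
import re

# Matches an uncommented line such as "num_tokens: 256".
_NUM_TOKENS = re.compile(r"^[ \t]*num_tokens[ \t]*:[ \t]*(\d+)[ \t]*(?:#.*)?$",
                         re.MULTILINE)


def read_num_tokens(yaml_text):
    """Return num_tokens from cassandra.yaml text, or None if not set."""
    match = _NUM_TOKENS.search(yaml_text)
    return int(match.group(1)) if match else None


def num_tokens_too_high(yaml_text, limit=20):
    """True when num_tokens is set above the chosen limit."""
    value = read_num_tokens(yaml_text)
    return value is not None and value > limit


if __name__ == "__main__":
    sample = "cluster_name: 'Test Cluster'\nnum_tokens: 256\n"
    print(read_num_tokens(sample), num_tokens_too_high(sample))
```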

Risk: Shared Something creates an Achilles' heel in your architecture.

Remedy: Share Nothing.


Balancing Costs and Performance, In General

Let’s take a wider view of cost and performance.

Data Modeling

Risk: Trying to emulate RDBMS patterns with inefficient operations fails to scale.

Remedy: Use tracing tools to identify costly operations. Refine data models and queries around best practices.

Risk: Some queries are sensitive to response size, tombstones, or overall data density.

Remedy: Watch performance and data density trends. Be ready to scale before the capacity is needed.
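Watching a trend usually means projecting it forward. The sketch below fits a least-squares line to disk-usage samples and estimates how many days remain before a capacity limit is reached. It assumes roughly linear growth, which may not hold for your data. The function name, the sample format `(day, used_gb)`, and the numbers are hypothetical.

```python
def days_until_full(samples, capacity_gb):
    """Project linear growth from (day, used_gb) samples.

    Returns the days remaining after the last sample, or None when
    usage is flat or shrinking.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(day for day, _ in samples) / n
    mean_y = sum(used for _, used in samples) / n
    sxx = sum((day - mean_x) ** 2 for day, _ in samples)
    if sxx == 0:
        raise ValueError("samples must span at least two distinct days")
    sxy = sum((day - mean_x) * (used - mean_y) for day, used in samples)
    slope = sxy / sxx  # growth in GB per day
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    full_day = (capacity_gb - intercept) / slope
    return max(0.0, full_day - samples[-1][0])


if __name__ == "__main__":
    usage = [(0, 100.0), (1, 110.0), (2, 120.0)]  # made-up samples
    print(days_until_full(usage, capacity_gb=200.0))
```

Setting `capacity_gb` below the physical disk size leaves room to add nodes before the limit is actually hit.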

Risk: Failure to embrace denormalization can hinder development progress.

Remedy: Understand exactly why denorm is the norm for highly-available distributed systems.

Risk: Inappropriate compaction tuning can rob a system of needed IO to support queries.

Remedy: Make thoughtful choices around compaction strategies. Test and compare with write sampling if unsure.

Operations

Risk: Operational awareness may need to be improved.

Remedy: Feed your C* metrics to a highly-available monitoring system. Develop a rationale for consuming and refining your operational views.

Risk: Operational readiness may need to be improved.

Remedy: Discuss and plan for what to do when you have a node outage. Test the plan on actual hardware under realistic load.

Risk: Deployments are not reproducible.

Remedy: Use orchestration or at least some form of automation to script your deployments. Put your configs in source control.

Risk: Multipurpose use of systems forces resource contention with C* workloads.

Remedy: Use your Cassandra nodes only for Cassandra and integrated components, like Spark or Solr.

Risk: Not enough forward planning around growth or usage trends can lead to resource overload.

Remedy: Use your metrics to understand workload peaks, valleys, cycles, and ongoing trends. Ensure operational headroom for your highest loads.

Design Process

Risk: Strictly Waterfall thinking tries to solve too many problems up front without useful data, leading to risky assumptions.

Remedy: Be willing to measure and validate key assumptions from early in the process, and to refine when needed.

Risk: Reinventing the wheel consumes time and effort, and is error-prone.

Remedy: Spend some time upfront looking at community resources and examples.

Risk: Insular division of design ownership leads to partially-informed design decisions.

Remedy: Design the system as a whole, with cross-team collaboration as needed.

Risk: Misusing the client API can lead to suboptimal performance or confusing symptoms.

Remedy: Read the docs and use examples when learning how to use the client APIs. Pay attention to object scopes.

Risk: Lack of tooling awareness makes diagnosis and measurement a last-minute learning process.

Remedy: Make sure that all the learning, diagnostic, and development resources are publicized among your teams.

Risk: Operational tools which are delivered or deployed later in adoption have less refinement and testing time with users.

Remedy: Plan to assemble a prod-like workflow even in early stages. Run the full stack, including tooling, throughout all phases of development.

Risk: Not using prepared statements will negatively impact throughput.

Remedy: Make using prepared statements a design standard.
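The usual pattern is to prepare each query once and reuse the prepared object with different bound values, not to prepare it again on every request. The sketch below shows that pattern as a small cache around a session-like object exposing `prepare(query)` and `execute(statement, parameters)`, which is the shape of the DataStax Python driver's `Session`. The `PreparedStatementCache` wrapper, the `CountingSession` stand-in, and the `users` table are hypothetical and exist only to make the example runnable without a cluster. The cache is not thread-safe as written.

```python
class PreparedStatementCache:
    """Prepare each distinct query string once and reuse the result."""

    def __init__(self, session):
        self._session = session
        self._prepared = {}

    def execute(self, query, parameters=()):
        statement = self._prepared.get(query)
        if statement is None:
            # First use of this query text: prepare it and remember it.
            statement = self._session.prepare(query)
            self._prepared[query] = statement
        return self._session.execute(statement, parameters)


class CountingSession:
    """Stand-in for a driver session; it only counts calls."""

    def __init__(self):
        self.prepare_calls = 0
        self.execute_calls = 0

    def prepare(self, query):
        self.prepare_calls += 1
        return ("prepared", query)

    def execute(self, statement, parameters=None):
        self.execute_calls += 1
        return (statement, tuple(parameters or ()))


if __name__ == "__main__":
    session = CountingSession()
    cache = PreparedStatementCache(session)
    query = "SELECT name FROM users WHERE id = ?"  # hypothetical table
    for user_id in (1, 2, 3):
        cache.execute(query, (user_id,))
    print(session.prepare_calls, session.execute_calls)
```

Keeping the cache, like the session itself, at application scope instead of request scope also ties in with the earlier advice about object scopes.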

Q & A

