Cassandra Anti-Patterns (in 5m)


Matthew F. Dennis // @mdennis



Non-Sun (err, Non-Oracle) JVM

No OpenJDK

No Blackdown (anyone still use this?)

Etc, etc, etc; just use the Sun (Oracle) JVM

At least u22, but in general the latest release (unless you have specific reasons otherwise)



CommitLog+Data On The Same Disk

Don't put the commit log and data directories on the same set of spindles

The commit log gets a single spindle entirely to itself (standard consumer SATA disks easily sustain > 80 MB/s in sequential writes)

DOES NOT APPLY TO SSDs or EC2

SSDs have no seek time

EC2 ephemeral drives are still virtualized (but not the same as EBS)

On EC2 or SSDs: use one RAID set for both the commit log and data directories
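
A quick way to sanity-check the layout above is to compare the device IDs of the two directories. This is only a minimal sketch, assuming a POSIX system; the function name and the example paths are illustrative and not part of Cassandra. It is also only a hint: two partitions on one physical disk report different device IDs, and a shared ID is exactly what you want on SSDs or an EC2 RAID set.

```python
import os

def on_same_device(path_a, path_b):
    """Return True if both paths live on the same filesystem device."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev

if __name__ == "__main__":
    # Hypothetical locations; substitute the directories from your own config.
    commitlog_dir = "/var/lib/cassandra/commitlog"
    data_dir = "/var/lib/cassandra/data"
    if os.path.isdir(commitlog_dir) and os.path.isdir(data_dir):
        if on_same_device(commitlog_dir, data_dir):
            print("commit log and data share a device")
        else:
            print("commit log and data are on separate devices")
```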



EBS volumes on EC2

Sounds great, nice feature set, but

Not predictable

freezes are common

Throughput limited in many cases

Use ephemeral drives instead

Stripe them

Both commit log and data directory on the same RAID set



Oversized JVM heaps

6-8 GB is good (assuming sufficient RAM on your boxen)

10-12 GB is possible and in some circumstances correct

16 GB == max JVM heap size

> 16 GB => badness

JVM heap ~= boxen RAM => badness (always)
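
The rules of thumb above can be written down as a small check. A rough sketch only: the function name is made up, and the 0.75 cut-off for "heap ~= RAM" is an assumption, since the slide gives no number for it.

```python
def heap_warnings(heap_gb, ram_gb, near_ram_fraction=0.75):
    """Return warnings for a proposed JVM heap size, per the slide's rules of thumb.

    near_ram_fraction is an assumed threshold for "heap ~= RAM";
    it does not come from the slide.
    """
    warnings = []
    if heap_gb > 16:
        warnings.append("heap is above the 16 GB maximum")
    elif heap_gb > 8:
        warnings.append("heap is above the 6-8 GB range; make sure you have a reason")
    if heap_gb >= near_ram_fraction * ram_gb:
        warnings.append("heap is close to total RAM")
    return warnings
```

For example, `heap_warnings(8, 32)` returns an empty list, while `heap_warnings(20, 24)` returns two warnings.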



JVM heap size -v- GC suckage

[Figure: GC suckage (y-axis) against JVM heap size (x-axis), with ~6GB, ~10GB and ~16GB marked on the x-axis]



Large batch mutations (large in number of distinct rows)

Timeout / failure => entire mutation must be retried => wasted work

Larger mutations => higher likelihood of timeout

1000 mutations to perform? Do 100 batches of 10 in parallel instead of one batch of 1000

Exact number of rows/batch varies depending on HW, network, load, etc.; experiment! (10-100 is a good starting point)
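
The "many small batches in parallel" idea can be sketched as below. This is an illustration, not a client API: `send_batch` is a hypothetical callable standing in for whatever batch call your client library provides, and the defaults for `workers` and `retries` are arbitrary. It also assumes the mutations are safe to retry.

```python
from concurrent.futures import ThreadPoolExecutor

def chunk(items, size):
    """Split items into consecutive lists of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def apply_in_batches(mutations, send_batch, batch_size=10, workers=8, retries=3):
    """Send mutations in small parallel batches, retrying only the batches that fail.

    send_batch is a hypothetical stand-in for your client's batch call;
    it is expected to raise an exception on timeout or failure.
    """
    def send_with_retry(batch):
        for _ in range(retries):
            try:
                send_batch(batch)
                return True
            except Exception:
                continue  # only this small batch is retried
        return False

    batches = chunk(mutations, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(send_with_retry, batches))
    # Batches that still failed after all retries, for the caller to handle.
    return [batch for batch, ok in zip(batches, results) if not ok]
```

With 1000 mutations and `batch_size=10`, a timeout costs a retry of 10 rows rather than all 1000.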



OPP / BOP partitioner

You probably shouldn't use it

No really, you almost certainly shouldn't use it

Creates hot spots

Requires babysitting from ops

Not as well tested, nor as widely deployed
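
To see why ordered placement creates hot spots, here is a toy model (not Cassandra code). The node boundaries, the key format and both function names are invented for illustration; the hashed variant uses MD5 only to show how hashing spreads keys that sort next to each other.

```python
import hashlib
from collections import Counter

def ordered_node(key, boundaries):
    """Order-preserving placement: first node whose upper boundary is >= key."""
    for node, upper in enumerate(boundaries):
        if key <= upper:
            return node
    return 0  # keys past the last boundary wrap around to the first node

def hashed_node(key, node_count):
    """Hash-based placement: MD5 of the key mapped onto node_count equal ranges."""
    digest = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)
    return digest * node_count // 2 ** 128

if __name__ == "__main__":
    # Time-ordered keys, a common source of hot spots with ordered partitioners.
    keys = ["2011-06-%02d" % day for day in range(1, 31)]
    boundaries = ["g", "n", "t", "z"]  # hypothetical 4-node ring
    print(Counter(ordered_node(k, boundaries) for k in keys))
    print(Counter(hashed_node(k, 4) for k in keys))
```

In the ordered case every key sorts below the first boundary, so a single node takes all the writes.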



C* auto selection of tokens

Always specify your initial token.

Auto select doesn't do what you think it does, nor does it do what you want

loadbalance is even worse; it doesn't currently do what you think, what you want or what it claims; "F#@* my cluster" would be a much more apt name than loadbalance

Future (next?) release of OPSC will remove your balancing woes
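
Specifying tokens by hand usually means spacing them evenly around the ring. A minimal sketch, assuming RandomPartitioner (token range 0 to 2**127), a single data center and equally sized nodes; none of those assumptions come from the slide, and the function name is made up.

```python
def initial_tokens(node_count):
    """Evenly spaced tokens over the RandomPartitioner range [0, 2**127)."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return [i * (2 ** 127) // node_count for i in range(node_count)]

if __name__ == "__main__":
    # One token per node; each value goes into that node's initial_token setting.
    for node, token in enumerate(initial_tokens(4)):
        print(node, token)
```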



Super Columns

10-15 percent performance penalty on reads and writes

Easier / better to use composite columns

0.8.x makes this a lot easier

Done manually in 0.7.x and is still better

Devs working in C* code despise (loathe?) them

API probably won't be deprecated, but implementation will be replaced behind the scenes with composites (may be OK at that point to use them, but should probably just use the composite API directly)

Cassandra and DataStax are committed to maintaining the API going forward, even if the implementation changes
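
The idea behind composites can be modeled in memory: fold what would have been the super column name into the first component of an ordinary column name, then slice by prefix. This is a toy model only, using a plain dict as a row; it is not the Cassandra composite API, and both function names are invented.

```python
def composite(*parts):
    """Build a composite column name from its components."""
    return tuple(parts)

def slice_prefix(row, *prefix):
    """Return (name, value) pairs whose composite name starts with prefix, sorted by name."""
    n = len(prefix)
    return [(name, row[name]) for name in sorted(row) if name[:n] == prefix]

if __name__ == "__main__":
    # What would have been super columns "address" and "phone" become prefixes.
    row = {
        composite("address", "city"): "Austin",
        composite("address", "zip"): "78701",
        composite("phone", "home"): "555-0100",
    }
    for name, value in slice_prefix(row, "address"):
        print(name, value)
```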



Read Before Write

Race conditions

Abuses/Thrashes cache (row, key and page)

Increases latency

Increases IO requirements (by a lot)

Increases size in the client



Winblows

Try to avoid it, you'll be happier

Not always possible? Then, I'm sorry for your pain

Run 'nix (in particular, probably Linux)

Easier to get help (IRC, email, meetups, etc)

C* performs better

Better tested

Cheaper

More widely deployed (by a lot)



Cassandra Anti-Patterns

Matthew F. Dennis // @mdennis

Q?
