8/6/2019 Cassandra Anti-Patterns (in 5m)
Cassandra Anti-Patterns (in 5m)
Matthew F. Dennis // @mdennis
Non-Sun (err, Non-Oracle) JVM
No OpenJDK
No Blackdown (anyone still use this?)
Etc, etc, etc; just use the Sun (Oracle) JVM
At least u22, but in general the latest release (unless you have specific reasons otherwise)
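A minimal sketch of checking a box against this slide. It assumes the Java 6-era `java -version` format (`java version "1.6.0_22"`, where the number after the underscore is the update), and that the Sun/Oracle JVM identifies itself with `Java(TM)` while OpenJDK prints `OpenJDK`. The function `check_jvm` is hypothetical, and the sample outputs are illustrative rather than captured from a real box (note that `java -version` writes to stderr).

```python
import re

# Illustrative `java -version` outputs; the real command prints these to stderr.
SUN_SAMPLE = '''java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)'''

OPENJDK_SAMPLE = '''java version "1.6.0_20"
OpenJDK Runtime Environment (IcedTea6 1.9.7)
OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)'''


def check_jvm(version_output: str, min_update: int = 22) -> list[str]:
    """Return the problems found in `java -version` output (empty list means OK)."""
    problems = []
    # Sun/Oracle builds say "Java(TM) SE Runtime Environment"; OpenJDK says "OpenJDK".
    if "OpenJDK" in version_output or "Java(TM)" not in version_output:
        problems.append("not the Sun/Oracle JVM")
    # ASSUMPTION: Java 6-style version string, update number after the underscore.
    match = re.search(r'version "1\.\d+\.\d+_(\d+)"', version_output)
    if match is None:
        problems.append("could not parse the update number")
    elif int(match.group(1)) < min_update:
        problems.append(f"update {match.group(1)} is older than u{min_update}")
    return problems


print(check_jvm(SUN_SAMPLE))      # []
print(check_jvm(OPENJDK_SAMPLE))  # ['not the Sun/Oracle JVM', 'update 20 is older than u22']
```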
CommitLog+Data On The Same Disk
Don't put the commit log and data directories on the same set of spindles
commit log gets a single spindle entirely to itself (standard consumer SATA disks easily sustain > 80 MB/s in sequential writes)
DOES NOT APPLY TO SSDS or EC2
SSDs have no seek time
EC2 ephemeral drives are still virtualized (but not the same as EBS)
On EC2 or SSDs: use one RAID set for both the commit log and data directories
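A quick way to check whether the two directories share spindles, sketched under the assumption that each disk or RAID set carries its own filesystem: `os.stat().st_dev` identifies the device a path lives on, so equal values mean the commit log's sequential writes compete with data-file seeks. The paths correspond to the `commitlog_directory` and `data_file_directories` settings in cassandra.yaml; the helper `share_a_device` is hypothetical.

```python
import os


def share_a_device(commitlog_dir: str, data_dirs: list[str]) -> bool:
    """True if the commit log directory is on the same device as any data directory.

    ASSUMPTION: one filesystem per disk / RAID set, so st_dev maps to spindles.
    """
    commit_dev = os.stat(commitlog_dir).st_dev
    return any(os.stat(d).st_dev == commit_dev for d in data_dirs)
```

For example, `share_a_device("/var/lib/cassandra/commitlog", ["/var/lib/cassandra/data"])`: on spinning disks you want `False`; on SSDs or EC2, per this slide, `True` is fine.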
EBS volumes on EC2
Sounds great, nice feature set, but
Not predictable
freezes are common
Throughput limited in many cases
Use ephemeral drives instead
Stripe them
Both commit log and data directory on the same RAID set
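Striping the ephemeral drives usually means a software RAID 0 set. A minimal sketch that only builds the `mdadm` command (it does not run it); the device names `/dev/xvdb` and `/dev/xvdc` are hypothetical and vary by instance type, and creating a filesystem on `/dev/md0` and mounting it for both directories are separate steps not shown here.

```python
import shlex


def raid0_command(devices: list[str], md_device: str = "/dev/md0") -> list[str]:
    """Build (but do not execute) an mdadm command that stripes `devices` as RAID 0."""
    if len(devices) < 2:
        raise ValueError("striping needs at least two ephemeral devices")
    return ["mdadm", "--create", md_device, "--level=0",
            f"--raid-devices={len(devices)}", *devices]


cmd = raid0_command(["/dev/xvdb", "/dev/xvdc"])  # hypothetical device names
print(shlex.join(cmd))  # mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/xvdb /dev/xvdc
```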
Oversized JVM heaps
6-8 GB is good (assuming sufficient RAM on your boxen)
10-12 GB is possible and in some circumstances correct
16GB == max JVM heap size
> 16GB => badness
JVM heap ~= boxen RAM => badness (always)
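The rules of thumb above can be encoded as a rough checker. This is a sketch, not an official sizing formula: the slide gives no exact cut-off for "heap ~= RAM", so the 75% threshold below is an assumption, and lumping everything between 8 GB and 16 GB into "possible" is one reading of the 10-12 GB line.

```python
def assess_heap(heap_gb: float, ram_gb: float) -> str:
    """Rough verdict on a JVM heap size, following this slide's rules of thumb."""
    if heap_gb > 16:
        return "bad: above the 16 GB maximum"
    # ASSUMPTION: "heap ~= RAM" approximated as heap >= 75% of physical RAM.
    if heap_gb >= 0.75 * ram_gb:
        return "bad: heap is close to total RAM"
    if 6 <= heap_gb <= 8:
        return "good"
    if heap_gb > 8:
        return "possible: only with a specific reason, watch GC"
    return "below the 6-8 GB range"


print(assess_heap(8, 32))   # good
print(assess_heap(12, 48))  # possible: only with a specific reason, watch GC
print(assess_heap(24, 64))  # bad: above the 16 GB maximum
print(assess_heap(14, 16))  # bad: heap is close to total RAM
```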
JVM heap size -v- GC suckage
[Figure: GC suckage (y-axis) vs. JVM heap size (x-axis), marked at ~6GB, ~10GB and ~16GB]
Large batch mutations (large in number of distinct rows)
Timeout / failure => entire mutation must be retried => wasted work
Larger mutations => higher likelihood of timeout
1000 mutations to perform? Do 100 batches of 10 in parallel instead of one batch of 1000
Exact number of rows/batch is variable depending on HW, network, load, etc; experiment! (10-100 is a good starting point)
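One way to apply the 100-batches-of-10 advice from a Python client, as a sketch: `send_batch` is a hypothetical stand-in for whatever batch call your client exposes (for example Thrift's `batch_mutate`), and the worker count of 8 is an arbitrary assumption to tune along with the batch size.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(mutations: list, size: int) -> list[list]:
    """Split mutations into consecutive batches of at most `size` rows."""
    return [mutations[i:i + size] for i in range(0, len(mutations), size)]


def send_batch(batch: list) -> int:
    """HYPOTHETICAL stand-in for your client's batch call; returns rows written."""
    return len(batch)


def write_all(mutations: list, batch_size: int = 10, workers: int = 8) -> int:
    batches = chunk(mutations, batch_size)
    # Batches are independent: if one times out, only that batch needs a retry,
    # not all 1000 rows.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(send_batch, batches))


print(write_all(list(range(1000))))  # 1000
```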
OPP / BOP partitioner
You probably shouldn't use it
No really, you almost certainly shouldn't use it
Creates hot spots
Requires babysitting from ops
Not as well tested nor is it widely deployed
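Why an order-preserving partitioner creates hot spots, in a simplified model: with order-preserving placement, keys that sort together (here `user0001`..`user1000`) land on the same node, while hashing the key spreads them around the ring. The hashed placement below only approximates RandomPartitioner (MD5 of the key, 128-bit space split into equal ranges); it is not Cassandra's exact token function, and the node tokens are hypothetical.

```python
import bisect
import hashlib

NODES = 4


def ordered_owner(key: str, node_tokens: list[str]) -> int:
    """Order-preserving placement: first node whose token >= key owns it (wraps around)."""
    return bisect.bisect_left(node_tokens, key) % len(node_tokens)


def hashed_owner(key: str, nodes: int = NODES) -> int:
    """Simplified hash placement: MD5 the key, split the 128-bit space into equal ranges."""
    token = int.from_bytes(hashlib.md5(key.encode()).digest(), "big")
    return token * nodes >> 128


keys = [f"user{i:04d}" for i in range(1, 1001)]  # sequential, similar keys
tokens = ["f", "m", "s", "z"]                    # hypothetical OPP tokens, one per node

ordered = [0] * NODES
hashed = [0] * NODES
for k in keys:
    ordered[ordered_owner(k, tokens)] += 1
    hashed[hashed_owner(k)] += 1

print("ordered:", ordered)  # [0, 0, 0, 1000]
print("hashed: ", hashed)
```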
C* auto selection of tokens
Always specify your initial token.
Auto select doesn't do what you think it does, nor does it do what you want
loadbalance is even worse: it doesn't currently do what you think, what you want or what it claims; "F#@* my cluster" would be a much more apt name than loadbalance
Future (next?) release of OpsCenter (OPSC) will remove your balancing woes
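For RandomPartitioner, the usual way to pick each node's `initial_token` (the cassandra.yaml setting) is to space the tokens evenly over its 0 to 2**127 range: token_i = i * 2**127 / N. A sketch assuming RandomPartitioner and a single ring of N nodes; the helper name is hypothetical.

```python
def initial_tokens(node_count: int) -> list[int]:
    """Evenly spaced initial_token values for a RandomPartitioner ring (0 .. 2**127)."""
    # Integer arithmetic keeps the 127-bit values exact.
    return [i * 2**127 // node_count for i in range(node_count)]


for i, t in enumerate(initial_tokens(4)):
    print(f"node {i}: initial_token: {t}")
```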
Super Columns
10-15 percent performance penalty on reads and writes
Easier / better to use composite columns
0.8.x makes this a lot easier
Done manually in 0.7.x and is still better
Devs working in C* code despise (loathe?) them
API probably won't be deprecated, but the implementation will be replaced behind the scenes with composites (may be OK at that point to use them, but you should probably just use the composite API directly)
Cassandra and DataStax are committed to maintaining the API going forward, even if the implementation changes
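A sketch of replacing a super column with composite column names. The byte layout below (per component: 2-byte big-endian length, the bytes, then an end-of-component byte of 0) is my understanding of Cassandra's CompositeType encoding; verify it against your version before relying on it. `composite_name` and the row contents are hypothetical illustrations.

```python
import struct


def composite_name(*components: bytes) -> bytes:
    """Pack components into one column name, CompositeType-style (ASSUMED layout):
    2-byte big-endian length, the component bytes, then an end-of-component byte 0."""
    out = b""
    for c in components:
        out += struct.pack(">H", len(c)) + c + b"\x00"
    return out


# A super column row such as  user42 -> {"address": {"city": ..., "zip": ...}}
# becomes a plain row whose column names are (supercolumn, subcolumn) composites.
row = {
    composite_name(b"address", b"city"): b"some-city",
    composite_name(b"address", b"zip"): b"some-zip",
}
print(composite_name(b"a", b"bc"))  # b'\x00\x01a\x00\x00\x02bc\x00'
```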
Read Before Write
Race conditions
Abuses/Thrashes cache (row, key and page)
Increases latency
Increases IO requirements (by a lot)
Increases size in the client
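The race condition can be shown deterministically with a dict standing in for a row (a simplification, not a Cassandra client): two clients read the same value before either writes, so one update is lost. The write-only alternative sketched after it gives each client its own uniquely named column and sums them when the row is read; the column names are hypothetical.

```python
store = {"views": 0}

# Read-before-write: both clients read before either writes. Nothing in
# Cassandra prevents this interleaving, so one increment is lost.
seen_a = store["views"]      # client A reads 0
seen_b = store["views"]      # client B reads 0
store["views"] = seen_a + 1  # A writes 1
store["views"] = seen_b + 1  # B writes 1, overwriting A's update
print(store["views"])        # 1, not 2

# Write-only alternative: each client inserts its own column, no read needed;
# the total is computed later when the row is read.
events = {}
events["view:client-a"] = 1
events["view:client-b"] = 1
print(sum(events.values()))  # 2
```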
Winblows
Try to avoid it, you'll be happier
Not always possible? Then, I'm sorry for your pain
Run 'nix (in particular, probably Linux)
Easier to get help (IRC, email, meetups, etc)
C* performs better
Better tested
Cheaper
Wider deployed (by a lot)
Cassandra Anti-Patterns
Matthew F. Dennis // @mdennis
Q?