
Cassandra for mission critical data

Date post: 12-Apr-2017
Upload: oleksandr-semenov
Apache Cassandra for mission critical data OLEKSANDR SEMENOV

Agenda
1) CAP Theorem
2) NoSQL vs RDBMS: advantages and disadvantages
3) What is Cassandra? History.
4) Cassandra features
5) Cassandra data model
6) Ways to access data: Thrift, CQL, Kundera ORM

What is NoSQL
NoSQL does not mean "Not SQL".
It means "Not Only SQL" or "Not Relational Database".

CAP Theorem
You can choose only two of the three: Consistency, Availability, Partition tolerance

Choosing AP data stores

Cassandra is an AP data store

RDBMS
+ Strong mathematical basis
+ Referential integrity
+ ACID transactions
+ Standard SQL
+ Well-known approaches to data modeling
- Poor performance with large data volumes
- Scaling issues

NoSQL
+ Great performance
+ Flexible data schema
+ Easy scaling
- Data redundancy
- Integrity must be ensured by the developer in most cases
- Different access interfaces for different stores
- Paradigm shift required
- BASE consistency model instead of ACID transactions

ACID consistency model
Atomicity
• Transactions are all or nothing
Consistency
• Data written is valid according to all defined rules
Isolation
• Transactions do not affect each other
Durability
• Data written will not be lost

BASE consistency model
BASE: Basically Available, Soft state, Eventual consistency

BASE system example

What is Cassandra?
Cassandra is a key-multivalue store that is:
• non-relational
• highly scalable
• decentralized
• eventually consistent

History

Who uses Cassandra?


Cassandra Features
Decentralized
• Each node has the same role and can process any request
Replication
• Cassandra supports multi-datacenter replication
Scalable
• Read and write throughput both increase linearly as new machines are added
Durable
• Data, once written, will survive a hardware failure

Cassandra Features (continued)
Fault-tolerant
• Data is automatically replicated to multiple nodes for fault tolerance
Tunable consistency
• You can choose the desired consistency level
CQL
• SQL-like query language
Very fast I/O
• Both reads and writes are very fast

Availability: partitioning with a single point of failure (SPOF)

Availability: Cassandra & no SPOF
• Each node can act as a router
• Data is replicated to several nodes according to the replication factor

Replication Factor

Replication Factor = 3
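
As a rough sketch of what a replication factor of 3 means, the snippet below places a key on a ring of nodes and copies it to the next nodes clockwise. The node names, the MD5-based `token` function and the `replicas_for` helper are assumptions made for illustration; actual placement in Cassandra depends on the configured partitioner and replication strategy.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (illustrative hash, not Cassandra's partitioner)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def replicas_for(key: str, nodes: list[str], replication_factor: int) -> list[str]:
    """Return the nodes that store `key`: the owner plus the next nodes clockwise."""
    if replication_factor > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    ring = sorted(nodes, key=token)            # nodes ordered by ring position
    tokens = [token(n) for n in ring]
    # First node at or after the key's position, wrapping around the ring.
    start = bisect.bisect_left(tokens, token(key)) % len(ring)
    return [ring[(start + i) % len(ring)] for i in range(replication_factor)]


nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]  # hypothetical cluster
owners = replicas_for("user:42", nodes, replication_factor=3)
print(owners)  # three distinct nodes, each holding a copy of the row
```

With five nodes and a replication factor of 3, any two nodes can be lost and at least one copy of every row is still reachable.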

Availability

Tunable consistency

Consistency can be set on a per-operation basis
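
To make the trade-off concrete, here is a small sketch of the arithmetic behind the levels ONE, QUORUM and ALL. The helper names are hypothetical and only these three levels are modelled. The rule it checks is the usual one: a read is guaranteed to see the latest acknowledged write when the read and write replica counts add up to more than the replication factor.

```python
def required_replicas(level: str, replication_factor: int) -> int:
    """Number of replicas that must acknowledge an operation at the given level."""
    levels = {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,   # strict majority of replicas
        "ALL": replication_factor,
    }
    return levels[level]


def read_sees_latest_write(write_level: str, read_level: str, replication_factor: int) -> bool:
    """True when the read and write replica sets must overlap (R + W > RF)."""
    w = required_replicas(write_level, replication_factor)
    r = required_replicas(read_level, replication_factor)
    return r + w > replication_factor


rf = 3
for write_level, read_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ALL", "ONE")]:
    print(write_level, read_level, read_sees_latest_write(write_level, read_level, rf))
```

Writing and reading at ONE favours latency and availability, while QUORUM on both sides trades some of that for overlapping replica sets.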

Write path in Cassandra
• A write can be sent to any node; that node acts as the coordinator
• Data is written to the commit log (for durability) and then to the memtable
• The memtable is periodically flushed to disk as an SSTable, and a fresh memtable is created in memory
• Deletes are a special case of writes: tombstones
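
The write path can be sketched with a toy in-memory model. The `Node` class, its size-based flush threshold and the plain dictionaries standing in for SSTables are assumptions for illustration only; a real node writes these structures to disk and flushes on its own schedule.

```python
class Node:
    """Toy storage node: commit log, in-memory memtable, immutable SSTables."""

    TOMBSTONE = object()  # marker stored in place of a deleted value

    def __init__(self, flush_threshold: int = 3):
        self.commit_log = []   # append-only record of every mutation (durability)
        self.memtable = {}     # recent writes, kept in memory
        self.sstables = []     # flushed memtables, oldest first, never modified
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. append to the commit log
        self.memtable[key] = value            # 2. then update the memtable
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def delete(self, key):
        self.write(key, Node.TOMBSTONE)       # a delete is just another write

    def flush(self):
        self.sstables.append(dict(self.memtable))  # persist as an immutable SSTable
        self.memtable = {}                         # start a fresh memtable

    def read(self, key):
        # Newest data wins: memtable first, then SSTables from newest to oldest.
        for table in [self.memtable, *reversed(self.sstables)]:
            if key in table:
                value = table[key]
                return None if value is Node.TOMBSTONE else value
        return None


node = Node(flush_threshold=2)
node.write("k1", "v1")
node.write("k2", "v2")  # reaches the threshold and triggers a flush
node.delete("k1")       # the tombstone hides the value already flushed to an SSTable
print(node.read("k1"), node.read("k2"), len(node.sstables))
```

Note that the flushed SSTable is never edited: the old value of `k1` is still in it, and only the newer tombstone makes the row appear deleted.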

Read path in Cassandra
• Any server can be queried; it acts as the coordinator
• The coordinator contacts a node that holds the requested key
• If consistency < ALL, read repair is performed in the background

Read at consistency level = ONE

Read repair
• Read repair means that when a query is made against a given key, we perform a digest query against all the replicas of the key and push the most recent version to any out-of-date replicas.
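
A minimal sketch of the idea, assuming each replica is a plain dictionary and each stored value carries a timestamp. The `Version` type and `read_with_repair` function are hypothetical, and full values are compared here where a real digest query would compare hashes.

```python
from dataclasses import dataclass


@dataclass
class Version:
    value: str
    timestamp: int


def read_with_repair(replicas: list[dict], key: str):
    """Read `key` from every replica, return the newest value, repair stale replicas."""
    versions = [r[key] for r in replicas if key in r]
    if not versions:
        return None
    newest = max(versions, key=lambda v: v.timestamp)
    for replica in replicas:
        current = replica.get(key)
        if current is None or current.timestamp < newest.timestamp:
            replica[key] = newest  # push the most recent version to the stale replica
    return newest.value


replicas = [
    {"user:1": Version("alice@old.example", 100)},
    {"user:1": Version("alice@new.example", 200)},
    {},  # this replica missed the write entirely
]
print(read_with_repair(replicas, "user:1"))
```

After the call, all three replicas hold the version written at timestamp 200.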

Cassandra data model

RDBMS       Cassandra
Database    Keyspace
Table       ColumnFamily
Columns     Columns, SuperColumns

ColumnFamilies usage patterns

Static

Dynamic

Columns
A column is a tuple with three fields: name, value and timestamp
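
The timestamp is what lets replicas agree on a winner when two versions of the same column exist. Below is a small sketch of that last-write-wins rule; the `Column` type and `reconcile` helper are illustrative names, and the tie-break (keep the first argument) is a simplification.

```python
from typing import NamedTuple


class Column(NamedTuple):
    name: str
    value: str
    timestamp: int  # supplied by the writer, e.g. microseconds since the epoch


def reconcile(a: Column, b: Column) -> Column:
    """Pick the winning version of the same column: the higher timestamp wins."""
    if a.name != b.name:
        raise ValueError("can only reconcile versions of the same column")
    return a if a.timestamp >= b.timestamp else b  # simplified tie-break


old = Column("email", "alice@old.example", timestamp=1_000)
new = Column("email", "alice@new.example", timestamp=2_000)
print(reconcile(old, new).value)
```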

Special column types
• Expiring columns – columns that are removed automatically after a set time to live
• Counter columns – columns that hold a count which can be incremented
• SuperColumns – columns that contain other columns. Deprecated.

SuperColumns

Indexes
• Primary index – an index built on the key of each row
• Secondary index – an index on column values; it must be created manually. Good only for low-cardinality columns. Example: a Gender column can have only two values, M and F. And it is a problem.
• Indexing is performed in the background

Data modelling
• A query-driven approach is required
• How to get data if I can query only by key?
• Denormalize it!
• Create multiple tables for the data
• Use fast writes to do as few reads as possible
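
The denormalization advice can be sketched with two dictionaries standing in for two tables, one per query. The table and function names are hypothetical; the point is only that every write goes to each table, so that every read is a single lookup by key.

```python
# One "table" per query; both hold the same user data under different keys.
users_by_id = {}
users_by_city = {}


def save_user(user_id: str, name: str, city: str) -> None:
    """Write the user to every table that serves a query (two writes, no joins)."""
    record = {"id": user_id, "name": name, "city": city}
    users_by_id[user_id] = dict(record)
    users_by_city.setdefault(city, {})[user_id] = dict(record)


def get_user(user_id: str) -> dict:
    return users_by_id[user_id]                          # single lookup by key


def list_users_in_city(city: str) -> list[dict]:
    return list(users_by_city.get(city, {}).values())   # single lookup by key


save_user("u1", "Alice", "Kyiv")
save_user("u2", "Bob", "Lviv")
save_user("u3", "Carol", "Kyiv")
print([u["name"] for u in list_users_in_city("Kyiv")])
```

The cost is redundancy: the same record is stored twice, and keeping the copies in step is the application's job.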

What is Cassandra good for?
• Time series data (logs, sensor data)
• Write-intensive applications
• Applications with a predefined query model

Never use Cassandra
• If you want to replace a traditional RDBMS with it
• If you can’t tell how your data will be queried
• If you have a lot of reads
• If strong consistency is required (financial, medical areas)

Cassandra is not a silver bullet


Ways to access data
Thrift
• First & native client. Deprecated.
Hector, Pelops
• Libraries based on Thrift
CQL
• SQL-like language, very limited
Kundera
• ORM/ONM framework

Thrift
• Apache Thrift – a framework for cross-language services development
• Supported languages: C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Smalltalk, OCaml and others
• Was developed by Facebook and released in 2007
• Deprecated as a Cassandra interface

Hector
• Hector is a high-level Java client for Apache Cassandra, currently in use on a number of production systems
• Includes a large number of features

Hector main features
• Security – connection using Kerberos
• Integration with the Speed4j monitoring library
• Hector Object Mapper – a simple ORM (not compliant with JPA)
• Connection pooling
• Failover behavior on the client side

CQL
CQL – an SQL-like language introduced in Cassandra 0.8
Offers the following functionality:
• No JOINs (not supported)
• Creating/dropping keyspaces, column families, columns and rows
• Inserting/retrieving columns
• Indexing

Kundera ORM
Kundera is a “Polyglot Object Mapper”
Supports:
◦ Cassandra
◦ HBase
◦ MongoDB
◦ RDBMS
◦ and others

Kundera ORM (continued)
• JPA 2.1 compliant
• Supports cross-datastore persistence
• Supports many-to-many relationships
• Support for any NoSQL store can be added by implementing a Client Extension

Performance Comparison
Benchmarked on an Amazon EC2 large instance running Ubuntu:
◦ 7.5 GB memory
◦ 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
◦ 64-bit platform

Performance Comparison
Concurrent load – 1 record per thread

Number of threads (1 record each)   Pelops time (s)   Hector time (s)   Kundera time (s)
10                                  0.148             0.100             0.117
100                                 0.350             0.363             0.361
1000                                1.793             1.885             2.180
10000                               11.478            11.480            14.262
40000                               38.887            37.241            41.977
50000                               48.646            47.749            49.285
100000                              91.280            92.874            97.707
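
The timings for the 1-record-per-thread run are easier to compare as throughput. The sketch below divides the number of records (one per thread) by the elapsed time; the figures are copied from the table, while the `timings` layout and the `throughput` helper are just illustrative.

```python
# Timings in seconds from the 1-record-per-thread run:
# threads -> (Pelops, Hector, Kundera).
timings = {
    10: (0.148, 0.100, 0.117),
    100: (0.350, 0.363, 0.361),
    1000: (1.793, 1.885, 2.180),
    10000: (11.478, 11.480, 14.262),
    40000: (38.887, 37.241, 41.977),
    50000: (48.646, 47.749, 49.285),
    100000: (91.280, 92.874, 97.707),
}
clients = ("Pelops", "Hector", "Kundera")


def throughput(threads: int) -> dict[str, float]:
    """Records written per second for each client (one record per thread)."""
    return {c: threads / t for c, t in zip(clients, timings[threads])}


for threads in timings:
    rates = throughput(threads)
    print(threads, {c: round(r) for c, r in rates.items()})
```

Read this way, all three clients settle at roughly a thousand records per second at the highest thread counts, with Kundera slightly behind the two Thrift libraries.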

Performance Comparison
[Chart: concurrent load, 1 record per thread. Series: Pelops, Hector, Kundera. X axis: number of threads. Y axis: time, s.]

Performance Comparison
Concurrent + bulk load – 1000 records per thread

Number of threads (1000 records each)   Pelops time (s)   Hector time (s)   Kundera time (s)
10                                      5.929             5.286             7.722
100                                     34.750            32.228            39.124
1000                                    368.022           352.711           393.931

Performance Comparison
[Chart: concurrent + bulk load, 1000 records per thread. Series: Kundera, Hector, Pelops. X axis: number of threads. Y axis: time, s.]

Cassandra limitations
• Keys (and column names) must be smaller than 64 KB
• The maximum number of columns per row is 2 billion
• A single column value may not be larger than 2 GB
• All data read must fit in memory, because Thrift lacks streaming support

Summary
• Great I/O performance
• Several data access interfaces
• AP data store (CAP)
• Production-ready & production-proven
• Good for time series data
• Highly available

Thank you!

