
Cassandra implementation for collecting data and presenting data

Date post: 10-May-2015
Upload: robert-chen
Page 1: Cassandra implementation for collecting data and presenting data

Cassandra implementation for collecting data and presenting data

Robert Chen [email protected]

Page 2: Cassandra implementation for collecting data and presenting data

Agenda
• SQL vs NOSQL
• Why Cassandra
• Cassandra introduction
• Our architecture and design
• Configuration best practice
• How we write data
• How we read data
• Demo

Page 3: Cassandra implementation for collecting data and presenting data

A highly scalable, eventually consistent, distributed, structured key-value store.

Cassandra™ is a highly scalable, high-performance distributed data infrastructure. It distributes data across multiple data centers and scales incrementally with no single point of failure, which makes it a logical choice when you need reliability without compromising performance. Cassandra is relied upon by leading companies like Netflix, Twitter, Cisco, Rackspace, Ooyala, Openwave, and many more.

Page 4: Cassandra implementation for collecting data and presenting data

SQL vs NOSQL
• NOSQL
  • Not only SQL, schema free
  • Big data
  • Can service heavy read/write workloads
  • Reads may not be consistent in real time
• SQL
  • Can support complex join relationships
  • Oracle RAC solution for big data? Too expensive
  • Typical RDBMS implementations are tuned for small but frequent read/write transactions or for large batch transactions with rare write access
  • RDBMSs (they say) have shown poor performance on data-intensive applications, including:
    • Indexing a large number of documents
    • Serving pages on high-traffic websites
    • Handling the volumes of social networking data
    • Delivering streaming media
  • Consistent on every read

Page 5: Cassandra implementation for collecting data and presenting data

Why Cassandra
• To solve the storage bottleneck on our central NetApp filer
• We chose Cassandra instead of HBase
  • No single point of failure
  • Fast development
• Big data and a dynamically changing environment
• Good fit for a horizontally scaled production environment
• Low total cost of ownership
  • No special hardware needed, just some x86 boxes

Page 6: Cassandra implementation for collecting data and presenting data

Cassandra Design
• High availability (a wily hare has three burrows)
• Eventual consistency
  • Trades off strong consistency in favor of high availability
  • Lets you choose strong consistency or varying degrees of more relaxed consistency
• Incremental scalability (linearly scalable), horizontal!
  • Nodes added to a Cassandra cluster (all done online) increase the throughput of your database in a predictable, linear fashion for both read and write operations
• Optimistic replication
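
A small sketch of the "choose your consistency" trade-off above. It uses the usual quorum rule: with N replicas, a read that waits for R replicas is guaranteed to see a write acknowledged by W replicas when R + W > N. The function names are made up for illustration and are not part of any Cassandra client.

```python
def quorum(n_replicas: int) -> int:
    """Smallest majority of n_replicas."""
    return n_replicas // 2 + 1


def is_strongly_consistent(n_replicas: int, write_acks: int, read_acks: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return read_acks + write_acks > n_replicas


N = 3
# QUORUM writes plus QUORUM reads overlap in at least one replica.
print(is_strongly_consistent(N, quorum(N), quorum(N)))
# ONE write plus ONE read may hit different replicas: eventual consistency only.
print(is_strongly_consistent(N, 1, 1))
```

Lower R and W give faster, more available operations; higher values give stronger consistency at the cost of waiting for more nodes.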

Page 7: Cassandra implementation for collecting data and presenting data

Cassandra Design II

• All nodes are identical: decentralized/symmetric
  • No master or SPOF
  • Adding nodes is simple
• Distributed, read/write anywhere design
  • Massively scalable peer-to-peer architecture
  • Based on the best of Amazon Dynamo and Google BigTable
• Minimal administration
• Multi-datacenter replication
• No caching layer required

Page 8: Cassandra implementation for collecting data and presenting data

Cassandra Design III
• Very fast writes
• Fault tolerant, guaranteed data safety
• Automatic provisioning of new nodes
• Big data
• Transparent fault detection and recovery
  • Cassandra uses gossip protocols to detect machine failure and to recover when a machine is brought back into the cluster, all without your application noticing.

Page 9: Cassandra implementation for collecting data and presenting data

write op

Page 10: Cassandra implementation for collecting data and presenting data

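Write op (continued)

• Writes go to the commit log and to the memory table (memtable)
• Periodically the memtable is flushed to disk as an SSTable file

[Diagram: inside a Cassandra node, an update is written to the log on disk and to the memtable in RAM; later, the memtable is written to an SSTable file on disk.]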
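
The write path on this slide can be sketched as a toy model. This is only an illustration under simplifying assumptions (plain Python lists and dicts, a made-up `memtable_limit` flush trigger); it is not Cassandra's real storage engine.

```python
class ToyNode:
    """Toy model of one node's write path: log, memtable, SSTables."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []   # stands in for the append-only log on disk
        self.memtable = {}     # in-memory table (RAM)
        self.sstables = []     # immutable sorted tables written to disk
        self.memtable_limit = memtable_limit  # hypothetical flush threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. append to the log
        self.memtable[key] = value            # 2. update the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # 3. (later) write to disk

    def flush(self):
        # An SSTable is written once, sorted by key, and never modified.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log = []  # flushed entries no longer need replay

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest SSTable first
            for k, v in table:
                if k == key:
                    return v
        return None


node = ToyNode(memtable_limit=2)
node.write("a", 1)
node.write("b", 2)   # memtable is full, so it is flushed to an SSTable
node.write("a", 3)   # newer value lives in the memtable
print(node.read("a"), node.read("b"))
```

Writes never touch existing disk files in place, which is why they are fast: one sequential log append plus one in-memory update.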

Page 11: Cassandra implementation for collecting data and presenting data

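Read

[Diagram: the client sends a query to the Cassandra cluster. The closest replica (Replica A) returns the result, while digest queries go to Replica B and Replica C, which send back digest responses. The result is returned to the client, and a read repair runs if the digests differ.]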
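
A rough sketch of the read flow in the diagram, with replicas modelled as plain dicts that map a key to a `(value, timestamp)` pair. The helper names and the choice of MD5 here are assumptions for illustration; real replicas exchange messages over the network.

```python
import hashlib


def digest(entry):
    """Hash of a (value, timestamp) pair; cheaper to ship than the data itself."""
    return hashlib.md5(repr(entry).encode()).hexdigest()


def read_with_repair(replicas, key):
    """replicas[0] is the closest replica; the others answer with digests only."""
    closest, others = replicas[0], replicas[1:]
    result = closest.get(key)                       # full data from closest replica
    digests = [digest(r.get(key)) for r in others]  # digest responses
    if any(d != digest(result) for d in digests):
        # Digests differ: compare full entries, keep the newest, repair all replicas.
        candidates = [r[key] for r in replicas if key in r]
        result = max(candidates, key=lambda entry: entry[1])
        for r in replicas:
            r[key] = result
    return result


replica_a = {"host1": ("0.02", 1)}   # stale copy on the closest replica
replica_b = {"host1": ("0.04", 2)}
replica_c = {"host1": ("0.04", 2)}
print(read_with_repair([replica_a, replica_b, replica_c], "host1"))
print(replica_a["host1"])  # the stale replica has been repaired
```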

Page 12: Cassandra implementation for collecting data and presenting data

Configuration best practice
• Put the data files on RAID volumes with good performance
• Start with Sun JDK 1.6+
• Configure with Java Native libs
• The clocks on each node must be synchronized to maintain precision across the cluster on inserts.
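
Why clock synchronization matters: Cassandra resolves conflicting versions of a column by timestamp, so the version with the larger timestamp wins. The sketch below assumes writes are stamped by machines with different clocks; the numbers are invented for illustration.

```python
def reconcile(a, b):
    """Return whichever (value, timestamp) pair carries the larger timestamp."""
    return a if a[1] >= b[1] else b


# Machine X's clock runs 5 seconds fast; machine Y's clock is correct.
first_write = ("0.04", 1000 + 5)   # written first, stamped by the fast clock
second_write = ("0.46", 1002)      # written 2 s later in real time

winner = reconcile(first_write, second_write)
print(winner)  # the first write wins, although the second one is newer
```

With synchronized clocks (for example via NTP) the second write would carry the larger timestamp and win as expected.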

Page 13: Cassandra implementation for collecting data and presenting data

Data collection Architecture

Web UI (Highcharts/jQuery)

ActiveMQ (Message Bus)

1. Collected data is sent to ActiveMQ

2. Consumers read the data and save it to Cassandra

3. Filter the data and show it on the plots
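
The three steps above can be sketched in-process. Here a `queue.Queue` stands in for ActiveMQ and a dict stands in for Cassandra; both substitutions, the message fields, and the function names are assumptions made only to keep the example runnable.

```python
import queue

bus = queue.Queue()   # stand-in for the ActiveMQ message bus
store = {}            # stand-in for Cassandra: (host, metric) -> {timestamp: value}


def collect(host, metric, timestamp, value):
    """Step 1: a collector publishes one sample to the bus."""
    bus.put({"host": host, "metric": metric, "ts": timestamp, "value": value})


def consume_all():
    """Step 2: a consumer drains the bus and saves each sample."""
    while not bus.empty():
        msg = bus.get()
        store.setdefault((msg["host"], msg["metric"]), {})[msg["ts"]] = msg["value"]


def points_for_plot(host, metric, stime, etime):
    """Step 3: filter the stored samples to the time window being plotted."""
    series = store.get((host, metric), {})
    return sorted((ts, v) for ts, v in series.items() if stime <= ts <= etime)


collect("host1", "loadAvg1", 1316965546, 0.02)
collect("host1", "loadAvg1", 1316966449, 0.04)
consume_all()
print(points_for_plot("host1", "loadAvg1", 1316966000, 1316967000))
```

The message bus decouples collectors from the database, so loaders can be added or restarted without losing samples that are still queued.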

Page 14: Cassandra implementation for collecting data and presenting data

Data structure

keyspace: settings (eg, partitioner)
  column family: settings (eg, comparator, type [Std])
    column: name, value, clock
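
The nesting in this diagram can be written down as a few small classes. This is only a mental model of the layout; the class names and the default setting values (`RandomPartitioner`, `BytesType`) are example choices, not settings taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    value: float
    clock: int  # write timestamp, used to pick the newest version


@dataclass
class ColumnFamily:
    comparator: str = "BytesType"             # how column names are ordered (example value)
    rows: dict = field(default_factory=dict)  # row key -> {column name -> Column}

    def insert(self, row_key, column):
        self.rows.setdefault(row_key, {})[column.name] = column


@dataclass
class Keyspace:
    partitioner: str = "RandomPartitioner"    # example value
    column_families: dict = field(default_factory=dict)


ks = Keyspace()
ks.column_families["LoadAvg1"] = ColumnFamily()
ks.column_families["LoadAvg1"].insert(
    "host1_131696", Column(name="6449", value=0.04, clock=1316966449)
)
print(ks.column_families["LoadAvg1"].rows["host1_131696"]["6449"].value)
```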

Page 15: Cassandra implementation for collecting data and presenting data


Our Data Model

CoreMetrics (keyspace)
  LoadAvg1 (column family)
    host1_131696 (row)
      Column: 6449, value: 0.04
      Column: 5546, value: 0.02
    host2_131811 (row)
      Column: 8227, value: 0.46
      Column: 9792, value: 1.30

Page 16: Cassandra implementation for collecting data and presenting data


Our Data Model

CoreMetrics (keyspace)
  Primary (column family)
    host1:loadAvg1 (row)
      Column: 1316966449, value: 0.04
      Column: 1316965546, value: 0.02
    host2:loadAvg1 (row)
      Column: 1318118227, value: 0.46
      Column: 1318119792, value: 1.30

Page 17: Cassandra implementation for collecting data and presenting data


Our Meta Data Model

CoreMetrics (keyspace)
  PrimaryMeta (column family)
    host1.com (row)
      Column: loadAvg15:Total, value: 1
      Column: loadAvg15:Total, value: 1
    host2 (row)
      Column: loadAvg15:Total, value: 1
      Column: loadAvg15:Total, value: 1

Page 18: Cassandra implementation for collecting data and presenting data


Our HBase Data Model

Primary (column family)
  host1:loadAvg1:1 (row: host:metric:instance)
    Column: c:1316966449, value: 0.04
    Column: c:1316965546, value: 0.02
  host2:loadAvg1:1 (row: host:metric:instance)
    Column: 1318118227, value: 0.46
    Column: 1318119792, value: 1.30

Page 19: Cassandra implementation for collecting data and presenting data


Our Data Model (II)
• Keyspace: CoreMetrics (database name), one per application
• Column families: one per metric
  • loadAvg1
  • loadAvg5
  • etc. (about 80 server metrics)
• Rows and columns: inspired by the design of HBase and OpenTSDB, we design our rows and columns in a similar way: the timestamp is split between the row key and the column key, which greatly improves read performance
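
A sketch of the timestamp split, reverse-engineered from the example keys in this deck (timestamp 1316966449 on host1 appears as row `host1_131696` with column `6449`). The 10,000-second bucket size is inferred from those examples rather than stated in the slides, and the zero padding of the column name is an assumption.

```python
BUCKET_SECONDS = 10_000  # inferred from the example keys, not stated in the slides


def split_timestamp(host, timestamp):
    """Return (row_key, column_name) for one sample."""
    bucket, offset = divmod(timestamp, BUCKET_SECONDS)
    # Zero padding keeps column names sortable as strings (assumption).
    return f"{host}_{bucket}", f"{offset:04d}"


def join_timestamp(row_key, column_name):
    """Inverse of split_timestamp: recover (host, timestamp)."""
    host, bucket = row_key.rsplit("_", 1)
    return host, int(bucket) * BUCKET_SECONDS + int(column_name)


print(split_timestamp("host1", 1316966449))
print(join_timestamp("host1_131696", "6449"))
```

Each row then holds a bounded slice of time for one host, so a time-range query only has to fetch the few rows whose buckets overlap the range.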

Page 20: Cassandra implementation for collecting data and presenting data

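How we write to Cassandra

Multiple data loaders connect to port 9160 on the Cassandra nodes and insert data like this:

# $PROTOCOL is an already opened Thrift protocol object
$CLIENT = new Cassandra::CassandraClient($PROTOCOL);

# select the keyspace for all following calls
$CLIENT->set_keyspace($keyspace);

# write one column into the row identified by $rowkey
$CLIENT->insert($rowkey, $column_parent, $column, $consistency_level);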

Page 21: Cassandra implementation for collecting data and presenting data

How we read data from Cassandra

We use pycassa to multiget the rows, and aggregate the data when too many data points are returned.

get_coremetrics(metric_name, host, stime, etime, samples = 1000):
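
A hedged sketch of what such a read could look like. The body of `get_coremetrics` is not shown in the slides, so everything below is an assumption: `rows` is a plain dict standing in for the result of a pycassa multiget on one metric's column family, row keys follow a `host_bucket` layout with 10,000-second buckets, and aggregation is a simple average over consecutive points.

```python
def row_keys_for_range(host, stime, etime, bucket_seconds=10_000):
    """Row keys whose time buckets overlap [stime, etime]."""
    first, last = stime // bucket_seconds, etime // bucket_seconds
    return [f"{host}_{b}" for b in range(first, last + 1)]


def downsample(points, samples):
    """Average consecutive points so that at most `samples` points remain."""
    if len(points) <= samples:
        return points
    size = -(-len(points) // samples)  # ceiling division: points per output sample
    out = []
    for i in range(0, len(points), size):
        chunk = points[i:i + size]
        ts = chunk[len(chunk) // 2][0]  # representative timestamp for the chunk
        out.append((ts, sum(v for _, v in chunk) / len(chunk)))
    return out


def get_coremetrics_sketch(rows, host, stime, etime, samples=1000, bucket_seconds=10_000):
    """rows: {row_key: {column offset: value}}, standing in for a multiget result."""
    points = []
    for key in row_keys_for_range(host, stime, etime, bucket_seconds):
        bucket = int(key.rsplit("_", 1)[1])
        for offset, value in rows.get(key, {}).items():
            ts = bucket * bucket_seconds + int(offset)
            if stime <= ts <= etime:
                points.append((ts, value))
    points.sort()
    return downsample(points, samples)


rows = {"host1_131696": {"5546": 0.02, "6449": 0.04}}
print(get_coremetrics_sketch(rows, "host1", 1316960000, 1316969999))
```

Capping the result at `samples` points keeps the payload sent to the browser small, whatever time range the user selects for the plot.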

Page 22: Cassandra implementation for collecting data and presenting data


Demo: data model view

Page 23: Cassandra implementation for collecting data and presenting data


Demo: graphing the data

Page 24: Cassandra implementation for collecting data and presenting data

Cassandra monitoring

1. Nagios plugin for Cassandra
2. JMX

Page 25: Cassandra implementation for collecting data and presenting data

Thoughts and future

1. Migrate more applications to Cassandra
2. Livestat data (Bids/Listings…)
3. Help other teams with data collection and graphing?

Page 26: Cassandra implementation for collecting data and presenting data

Reference URLs

• Thrift (12 language bindings!)
  • http://wiki.apache.org/cassandra/ThriftInterface
  • http://thrift.apache.org/download/
• Pycassa
  • http://pycassa.github.com/pycassa/tutorial.html
