Cassandra

Date post: 14-Dec-2014
Upload: kspichale
Description:
Wide Column Store for BigData
Transcript
Page 1: Cassandra

APACHE CASSANDRA

Wide Column Store for Big Data

Kai Spichale

Page 2: Cassandra

Outline

Motivation

Introduction to Cassandra

Big Data Solution

Page 3: Cassandra

„Must Haves“ for Big Data?

What do modern businesses need for big data?

A scalable, high-performance database that is easy to use and cost-effective:

Scalable

Performance

Cost Effective

Operational Ease

Page 4: Cassandra

„Must Haves“ for Big Data?

“Modern businesses need to be able to manage large volumes of real-time data and run analytic and enterprise search operations on that same data as quickly as possible to make business decisions.”

Real-Time

Databases

Data Movement / ETL Process

Analytic/Search

Databases

Page 5: Cassandra

Legacy RDBMS ≠ Big Data

“Big data is comprised of (1) Velocity: how fast the data is coming in; (2) Variety: all types are now being captured; (3) Volume: TBs to PBs of data; (4) Complexity: multi-location, data center, etc.”

“Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis.”

“Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.”

Page 6: Cassandra

Trends & Challenges in Data Management

Exponential Data Growth

More Connected Data

Semi-Structured Data

Cloud

Key Value

Graph

Document

Wide Column

Page 7: Cassandra

Trends & Challenges in Data Management

Exponential Data Growth

More Connected Data

Semi-Structured Data

Cloud

Key Value

Graph

Document

Apache Cassandra

Page 8: Cassandra

Apache Cassandra

A massively scalable, decentralized, structured data store (aka database).

Project history:

Page 9: Cassandra

Cassandra is…

[Figure: ring of eight nodes, A through H]

O(1) Distributed Hash Table

Sharding, Replication

Elastic

Fault tolerant

No Single Point of Failure

Durable

Node   Token
A      0
B      4
C      8
D      12
E      16
F      20
G      24
H      28
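
The node and token listing above can be read as a lookup structure. The following is a minimal sketch, not Cassandra's actual partitioner. It assumes a token space of 0 to 31 (the slide only lists the node tokens), an MD5-based hash, the convention that a key belongs to the first node whose token is greater than or equal to the key's token (wrapping around the ring), and a replication factor of 3 with replicas on the next nodes clockwise. The names `RING_SIZE`, `token_for`, and `replicas_for` are illustrative.

```python
import bisect
import hashlib

# Node tokens as listed on the slide; the ring size of 32 is an assumption.
RING = [(0, "A"), (4, "B"), (8, "C"), (12, "D"),
        (16, "E"), (20, "F"), (24, "G"), (28, "H")]
RING_SIZE = 32
TOKENS = [token for token, _ in RING]


def token_for(key: str) -> int:
    """Hash a row key onto the ring (illustrative, not Cassandra's partitioner)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def replicas_for(token: int, replication_factor: int = 3) -> list[str]:
    """Return the owner of `token` followed by the next nodes clockwise."""
    # First node whose token is >= the key's token; wrap around past the end.
    start = bisect.bisect_left(TOKENS, token) % len(RING)
    return [RING[(start + i) % len(RING)][1] for i in range(replication_factor)]


if __name__ == "__main__":
    for key in ("alice", "bob"):
        t = token_for(key)
        print(key, t, replicas_for(t))
```

Because every node can hold the full token list, the lookup is a local computation with no routing hops, and losing one node still leaves the other replicas of each key reachable.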

Page 10: Cassandra

Cassandra is…

AP-System (CAP Theorem)

Eventual consistency

Tunable trade-offs: Consistency vs. Latency

Choose between synchronous or asynchronous replication for each update

[Figure: CAP triangle with corners C, A, and P; Cassandra sits on the A-P side]

C = Consistency

A = High Availability

P = Partition Tolerance
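
The consistency versus latency trade-off can be made concrete with a small sketch. It assumes the level names ONE, QUORUM, and ALL (Cassandra consistency levels that the slides do not list explicitly) and uses the standard overlap rule: a read is guaranteed to see the latest acknowledged write when the read and write replica sets must intersect, that is, when R + W > N. The function names are illustrative.

```python
def required_acks(level: str, replication_factor: int) -> int:
    """Number of replica acknowledgements the coordinator waits for."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unknown consistency level: {level}")


def reads_see_latest_write(write_level: str, read_level: str,
                           replication_factor: int) -> bool:
    """True when every read set overlaps every write set (R + W > N)."""
    w = required_acks(write_level, replication_factor)
    r = required_acks(read_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    for w, r in (("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ALL", "ONE")):
        print(w, r, reads_see_latest_write(w, r, replication_factor=3))
```

Waiting for fewer acknowledgements lowers latency and leaves the remaining replicas to be updated asynchronously; waiting for more trades latency for stronger guarantees.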

Page 11: Cassandra

Cassandra is…

A BigTable Clone

No schema

Well suited to:

Semi-structured data

Sparse data

[Figure: data model]

Keyspace
  Column Family
    Key -> Row: Column, Column
    Key -> Row: Column
    Key -> Row: Column, Column, Column
  Column Family (with super columns)
    Row: two SuperColumns holding four Columns
    Row: one SuperColumn holding three Columns
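
The keyspace, column family, row, and column hierarchy above maps naturally onto nested dictionaries. This is only a sketch of the logical model, not of Cassandra's storage format; the column family names, row keys, and column names are all made up for illustration.

```python
# keyspace -> column family -> row key -> {column name: value}
keyspace = {
    "users": {                                   # a column family
        "alice": {"email": "alice@example.org", "city": "Berlin"},
        "bob": {"email": "bob@example.org"},     # sparse: no "city" column
    },
    "timeline": {                                # a column family with super columns
        "alice": {
            "2014-12-14": {"post": "hello", "likes": "3"},  # super column
        },
    },
}


def get_column(ks, column_family, row_key, column, default=None):
    """Look up one column; absent columns simply do not exist in the row."""
    return ks.get(column_family, {}).get(row_key, {}).get(column, default)


if __name__ == "__main__":
    print(get_column(keyspace, "users", "alice", "city"))
    print(get_column(keyspace, "users", "bob", "city"))
```

Rows in the same column family need not share the same columns, which is what makes the model a fit for sparse and semi-structured data.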

Page 12: Cassandra

Cassandra-based Big Data Solution

[Figure: one Cassandra cluster (replication) whose nodes are split by workload: real-time nodes running Cassandra, search nodes running Solr, and analytics nodes running Hadoop]

Real-time queries with Cassandra

Distributed search with Solr

Analytics with Hadoop MapReduce

Page 13: Cassandra

Summary

Apache Cassandra is an elastically scalable, fault-tolerant data store

Tunable consistency levels

Wide Column: flexible data model without schema

Supports: real-time queries, analytics through Hadoop integration, Solr-based full-text search

Page 14: Cassandra

Thank you!

Q&A
