Date posted: 14-Dec-2014
Category: Technology
Uploaded by: kspichale
APACHE CASSANDRA
Wide Column Store for Big Data
Kai Spichale
Outline
Motivation
Introduction to Cassandra
Big Data Solution
„Must Haves“ for Big Data?
What do modern businesses need for big data?
A scalable, high-performance database
that is easy to use and
cost-effective
Scalable, Performance, Cost-Effective, Operational Ease
„Must Haves“ for Big Data?
„Modern businesses need to be able to manage large
volumes of realtime data and run analytic and enterprise
search operations on that same data as quickly as possible
to make business decisions.“
Real-Time Databases
Data Movement / ETL Process
Analytic/Search Databases
Legacy RDBMS ≠ Big Data
„Big data is comprised of (1) Velocity – how fast the data is coming in;
(2) Variety – all types are now being captured; (3) Volume – TBs to
PBs of data; (4) Complexity – multi-location, data center, etc.“
“Big data technologies describe a new generation of technologies and
architectures, designed to economically extract value from very large
volumes of a wide variety of data, by enabling high-velocity capture,
discovery, and/or analysis.”
“Big data is data that exceeds the processing capacity of conventional
database systems. The data is too big, moves too fast, or doesn’t fit the
strictures of your database architectures. To gain value from this data,
you must choose an alternative way to process it.”
Trends & Challenges in Data Management
Exponential Data Growth
More Connected Data
Semi-Structured Data
Cloud
Key Value
Graph
Document
Wide Column
Wide Column: Apache Cassandra
Apache Cassandra
A massively scalable, decentralized, structured
data store (aka database).
Project history:
Cassandra is…
O(1) Distributed Hash Table
Sharding, Replication
Elastic
Fault tolerant
No Single Point of Failure
Durable
Token ring (nodes A to H):
Node  Token
A     0
B     4
C     8
D     12
E     16
F     20
G     24
H     28
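The node/token table above can be read as a small consistent-hashing ring. The following is a minimal sketch, not Cassandra's implementation: it assumes a toy token space of 0 to 31, assumes a key belongs to the first node whose token is greater than or equal to the key's token (wrapping around the ring), and assumes replicas are placed on the next nodes clockwise. The names `RING`, `key_to_token`, `owner_index`, and `replicas` are hypothetical, and the MD5-based hash is only a stand-in for a real partitioner.

```python
import hashlib
from bisect import bisect_left

# Token assignments copied from the slide; the 0..31 token space is an assumption.
RING = [(0, "A"), (4, "B"), (8, "C"), (12, "D"),
        (16, "E"), (20, "F"), (24, "G"), (28, "H")]
TOKENS = [token for token, _ in RING]
TOKEN_SPACE = 32  # assumed size of the toy token space


def key_to_token(key: str) -> int:
    """Map a row key to a token (toy stand-in for a real partitioner)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % TOKEN_SPACE


def owner_index(token: int) -> int:
    """Index of the first node whose token is >= the given token, wrapping around."""
    return bisect_left(TOKENS, token) % len(RING)


def replicas(token: int, replication_factor: int = 3) -> list:
    """Primary owner plus the next nodes clockwise on the ring."""
    start = owner_index(token)
    return [RING[(start + i) % len(RING)][1] for i in range(replication_factor)]


if __name__ == "__main__":
    token = key_to_token("user:42")
    print(token, replicas(token))
```

Under these assumptions, a key with token 5 is owned by node C and replicated to D and E, while a key with token 29 wraps around to node A. Because every node knows the full ring, any node can locate the owner without multi-hop routing.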
Cassandra is…
AP-System (CAP Theorem)
Eventual consistency
Tunable trade-offs:
Consistency vs. Latency
Choose between synchronous or asynchronous
replication for each update
CAP Theorem:
C = Consistency
A = High Availability
P = Partition Tolerance
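The consistency vs. latency trade-off can be illustrated with the usual overlap rule: with N replicas, a read that contacts R replicas is guaranteed to see the latest write acknowledged by W replicas when R + W > N. The sketch below is only an illustration of that arithmetic; the function names are hypothetical, and only the consistency level names ONE, QUORUM, and ALL are taken from Cassandra.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def required_replicas(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for a given consistency level."""
    levels = {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return levels[level]


def reads_see_latest_write(replication_factor: int, write_level: str, read_level: str) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    w = required_replicas(write_level, replication_factor)
    r = required_replicas(read_level, replication_factor)
    return r + w > replication_factor


if __name__ == "__main__":
    for write_level, read_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ALL", "ONE")]:
        print(write_level, read_level, reads_see_latest_write(3, write_level, read_level))
```

Lower levels such as ONE answer faster because fewer replicas have to respond, at the cost of possibly reading stale data until replication catches up.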
Cassandra is…
A BigTable Clone
No schema
Well suited for:
Semi-structured data
Sparse data
Data model:
Keyspace
  Column Family
    Key -> Row: Column, Column
    Key -> Row: Column
    Key -> Row: Column, Column, Column
  Column Family (with super columns)
    Row
      SuperColumn, SuperColumn
        Column, Column, Column, Column
    Row
      SuperColumn
        Column, Column, Column
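One way to picture the keyspace / column family / row / column hierarchy is as nested maps. This is a conceptual sketch only, not how Cassandra stores data; the column family name `users`, the row keys, and the column names are all hypothetical. It shows why sparse, semi-structured data fits: each row carries only the columns it actually has.

```python
# keyspace -> column family -> row key -> {column name: value}
# All names and values below are made up for illustration.
keyspace = {
    "users": {
        "alice": {"email": "alice@example.org", "city": "Berlin"},
        "bob": {"email": "bob@example.org"},
        "carol": {"email": "carol@example.org", "city": "Hamburg", "phone": "555-0100"},
    }
}


def get_column(column_family: str, row_key: str, column: str, default=None):
    """Look up one column of one row; missing columns simply do not exist."""
    return keyspace[column_family].get(row_key, {}).get(column, default)


def get_row_slice(column_family: str, row_key: str) -> list:
    """Return a row's columns ordered by column name, as (name, value) pairs."""
    row = keyspace[column_family].get(row_key, {})
    return sorted(row.items())


if __name__ == "__main__":
    print(get_column("users", "bob", "city"))
    print(get_row_slice("users", "carol"))
```

A super column adds one more level of nesting between the row and its columns, so the row value becomes a map of super column name to a map of columns.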
Cassandra-based Big Data Solution
Real-time queries with Cassandra
Distributed Search with Solr
Analytics with Hadoop MapReduce
Cassandra Cluster (Replication)
Summary
Apache Cassandra is an elastically scalable, fault-tolerant data store
Tunable consistency levels
Wide Column: flexible data model without schema
Supports: real-time queries, analytics through Hadoop integration, Solr-based full-text search
Thank you!
Q&A