SVCCG NoSQL 2011: Sri, Cassandra

Posted on 28-Nov-2014


Silicon Valley Cloud Computing Group, Netflix, Cassandra talk


HowStuffWorks version

Cassandra

SriSatish Ambati, engineer, DataStax

@srisatish

Bigtable, 2006

Dynamo, 2007

Open source (OSS), 2008

Apache Incubator, 2009

Apache top-level project (TLP), 2010

Digital Reasoning: NLP + entity analytics

OpenWave: enterprise messaging

OpenX: largest publisher-side ad network in the world

Cloudkick: performance data & aggregation

SimpleGEO: location-as-API

Ooyala: video analytics and business intelligence

ngmoco: massively multiplayer game worlds

Cassandra in production

Furiously fast writes

• Append-only writes
• Sequential disk access
• No locks in critical path
• Key-based atomicity
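The four bullets above can be illustrated with a toy single-node write path. This is a minimal sketch, not Cassandra's code: the `ToyNode` class, the JSON-lines commit log and the dict-based memtable are hypothetical stand-ins. A write costs one sequential append to the log plus one in-memory update, and each row mutation is logged as a single record, which is the spirit of key-based atomicity.

```python
import json
import os
import tempfile


class ToyNode:
    """Hypothetical single node: an append-only commit log plus an in-memory memtable."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.memtable = {}  # key -> {column: (value, timestamp)}

    def _apply(self, key, columns, ts):
        # Last write wins per column, decided by timestamp; nothing on disk is rewritten.
        row = self.memtable.setdefault(key, {})
        for column, value in columns.items():
            current = row.get(column)
            if current is None or ts >= current[1]:
                row[column] = (value, ts)

    def write(self, key, columns, ts):
        # 1. Append the whole row mutation as one record: sequential I/O, no seeks.
        record = {"key": key, "columns": columns, "ts": ts}
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(record) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. Apply the same mutation to memory.
        self._apply(key, columns, ts)

    def replay(self):
        # Rebuild the memtable from the commit log, as a node would after a crash.
        self.memtable = {}
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                self._apply(record["key"], record["columns"], record["ts"])


if __name__ == "__main__":
    node = ToyNode(os.path.join(tempfile.mkdtemp(), "commitlog.jsonl"))
    node.write("user:1", {"name": "sri", "city": "sf"}, ts=1)
    node.write("user:1", {"city": "nyc"}, ts=2)
    before = node.memtable
    node.replay()
    print(node.memtable == before)          # True
    print(node.memtable["user:1"]["city"])  # ('nyc', 2)
```

Replaying the log reproduces the memtable exactly, which is why the commit log alone is enough for durability before data is flushed to sorted files.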

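[Diagram: write path across nodes n1, n2, n3. The client issues a write, the partitioner finds the node, and that node appends the write to the commit log and then applies it to memory.]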
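The "find node" step is the partitioner mapping a key onto the token ring. The sketch below assumes MD5 hashing of keys (in the spirit of Cassandra's random partitioner), hand-assigned tokens for the slide's nodes `n1`, `n2`, `n3`, and simple clockwise replica placement; `Ring`, `token` and `replicas` are hypothetical names.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # Hash the key to a 128-bit position on the ring (MD5, as an assumption).
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, node_tokens: dict):
        # Sort nodes by token; each node owns the range (previous token, its token].
        self.entries = sorted((t, node) for node, t in node_tokens.items())
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, key: str, rf: int) -> list:
        # Primary = first node whose token >= the key's token, wrapping past the end;
        # the next rf - 1 nodes clockwise hold the other replicas.
        start = bisect.bisect_left(self.tokens, token(key))
        count = min(rf, len(self.entries))
        return [self.entries[(start + i) % len(self.entries)][1] for i in range(count)]

    def find_node(self, key: str) -> str:
        return self.replicas(key, 1)[0]


if __name__ == "__main__":
    space = 2 ** 128
    ring = Ring({"n1": 0, "n2": space // 3, "n3": 2 * space // 3})
    print(ring.find_node("user:42"), ring.replicas("user:42", 3))
```

Because any node can compute this mapping locally, the client can send a write to any node, which then forwards it to the replicas.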

Tunable reads

Read Internals

@r39132 - #netflixcloud

• A feather in the CAP
• Eventual consistency
• R + W > N
  o N is the replication factor (RF)
  o T is the total number of nodes

• Example, an RDBMS with a backup: R=1, W=2, N=2, T=2

Read performance
• R=1, 100s of nodes
• R=1, W=N (consistency)

Write performance
• W=1, R=N
• Quorum (fast writes!)
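The rule R + W > N can be checked mechanically: when it holds, any R replicas that are read must include at least one of the W replicas that were written (pigeonhole), so a read sees the latest write. A small sketch comparing the rule with a brute-force check over all replica subsets; the function names are illustrative, and T (total nodes) does not enter the condition.

```python
from itertools import combinations


def overlap_guaranteed(r: int, w: int, n: int) -> bool:
    # The slide's rule: every read set must share a replica with every write set.
    return r + w > n


def overlap_brute_force(r: int, w: int, n: int) -> bool:
    # Try every read set of size r against every write set of size w among n replicas.
    replicas = range(n)
    return all(
        set(read) & set(write)
        for read in combinations(replicas, r)
        for write in combinations(replicas, w)
    )


def quorum(n: int) -> int:
    # A majority of the N replicas.
    return n // 2 + 1


if __name__ == "__main__":
    print(overlap_guaranteed(1, 2, 2))                  # RDBMS-with-backup example: True
    print(overlap_guaranteed(1, 1, 3))                  # R=1, W=1: False, eventual only
    print(overlap_guaranteed(quorum(3), quorum(3), 3))  # quorum reads and writes: True
```

The trade-off on the slide follows directly: lowering R makes reads cheaper but forces W up (and vice versa), while quorum on both sides keeps R + W > N with neither side waiting for every replica.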

Client Marshal Arts

Roll your own, C

Thrift

pycassa, phpcassa

Ruby, Scala

Ready made, Java: Hector, Pelops

Common patterns of doom:

• Death by a million gets
• Turn off Nagle
• Manage your connections
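The three warnings above translate into client code roughly as follows. This is a generic Python sketch, not any particular Cassandra client's API: `configure_socket`, `open_connection`, `ConnectionPool` and `batched` are hypothetical helpers. Disabling Nagle's algorithm uses the standard `TCP_NODELAY` socket option, the pool reuses a fixed set of connections, and batching lets one multi-key request stand in for many single-key gets.

```python
import queue
import socket


def configure_socket(sock: socket.socket) -> socket.socket:
    # Turn off Nagle's algorithm so small request frames are sent without delay.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def open_connection(host: str, port: int, timeout: float = 5.0) -> socket.socket:
    return configure_socket(socket.create_connection((host, port), timeout=timeout))


class ConnectionPool:
    """Hand out a fixed set of long-lived connections instead of one per request."""

    def __init__(self, factory, size: int):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(factory())

    def acquire(self, timeout=None):
        # Blocks until a connection is free (raises queue.Empty after timeout).
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        self._idle.put(conn)


def batched(keys, batch_size: int):
    # Group keys so each round trip carries many keys rather than one.
    for i in range(0, len(keys), batch_size):
        yield keys[i:i + batch_size]
```

Typical use would be something like `pool = ConnectionPool(lambda: open_connection(host, port), size=8)`, with `host` and `port` standing in for a real node address.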

Adding nodes

New nodes add themselves next to the busiest node and then split its range.

The busy node starts transmitting data to the new node. Bootstrap logic can be initiated from any node, the CLI, or the web interface.
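Splitting the busiest node's range can be sketched as choosing the midpoint of that node's token range. The sketch assumes a 2^127 token space, each node owning (previous token, its token], and a per-node load figure; `bootstrap_token` is hypothetical, and a real bootstrap also has to stream the data for the handed-over half.

```python
RING_SIZE = 2 ** 127  # assumed token space


def bootstrap_token(ring: dict, load: dict) -> int:
    """Token for a new node that takes half of the busiest node's range.

    ring: {node: token}; load: {node: amount of data stored}.
    Each node owns the range (previous token, its token] on the ring.
    """
    busiest = max(load, key=load.get)
    tokens = sorted(ring.values())
    mine = ring[busiest]
    prev = tokens[tokens.index(mine) - 1]  # index -1 wraps to the last token
    # Clockwise width of (prev, mine]; a lone node owns the whole ring.
    width = (mine - prev) % RING_SIZE or RING_SIZE
    # The new node takes (prev, new_token]; the busy node keeps (new_token, mine]
    # and streams the keys in the handed-over half to the new node.
    return (prev + width // 2) % RING_SIZE


if __name__ == "__main__":
    ring = {"a": 0, "b": 100, "c": 1000}
    print(bootstrap_token(ring, {"a": 1, "b": 2, "c": 9}))  # 550
```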

Cassandra on EC2 cloud


*Corey Hulen, EC2

Inter-node communication: gossip protocol

It’s exponential (epidemic algorithm)
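Why gossip is exponential: if every node that already knows a piece of state passes it to one random peer per round, the informed set can at most double each round, and news reaches all N nodes in a number of rounds on the order of log N. A minimal simulation under that push-gossip assumption; Cassandra's actual gossip exchanges state digests with peers, which this does not model, and `push_gossip` is a hypothetical name.

```python
import random


def push_gossip(n: int, seed: int = 0) -> list:
    """Simulate push gossip among n nodes; return the informed count after each round."""
    rng = random.Random(seed)
    informed = {0}  # node 0 starts with the new state
    history = [len(informed)]
    while len(informed) < n:
        # Every node informed at the start of the round contacts one random node.
        for _ in range(len(informed)):
            informed.add(rng.randrange(n))
        history.append(len(informed))
    return history


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, "nodes:", len(push_gossip(n)) - 1, "rounds")
```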

Failure detector: accrual rate (phi)
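An accrual failure detector outputs a suspicion level phi instead of a yes/no verdict. A sketch assuming heartbeat inter-arrival times are exponentially distributed with the observed mean, so that phi = -log10(P(no heartbeat for t)) = t / (mean * ln 10). The window size, the threshold of 8 and the class name are illustrative assumptions.

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Suspicion level from heartbeat history, assuming exponential inter-arrival times."""

    def __init__(self, window: int = 1000, threshold: float = 8.0):
        self.intervals = deque(maxlen=window)  # recent gaps between heartbeats
        self.last = None
        self.threshold = threshold

    def heartbeat(self, now: float) -> None:
        if self.last is not None:
            self.intervals.append(now - self.last)
        self.last = now

    def phi(self, now: float) -> float:
        if not self.intervals:
            return 0.0  # not enough history to judge
        mean = sum(self.intervals) / len(self.intervals)
        # P(no heartbeat for t) = exp(-t / mean), so -log10 of it is t / (mean * ln 10).
        return (now - self.last) / (mean * math.log(10))

    def is_suspect(self, now: float) -> bool:
        return self.phi(now) > self.threshold


if __name__ == "__main__":
    d = PhiAccrualDetector()
    for t in range(11):
        d.heartbeat(float(t))  # one heartbeat per second
    print(round(d.phi(11.0), 3), d.is_suspect(11.0))  # 0.434 False
    print(round(d.phi(30.0), 3), d.is_suspect(30.0))  # 8.686 True
```

Because phi grows smoothly with silence, each caller can pick its own threshold instead of relying on one fixed timeout.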

Anti-entropy: bringing replicas up to date
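Anti-entropy repair in Cassandra compares Merkle trees built by each replica, so only the key ranges whose hashes differ need to be exchanged. A minimal sketch of that comparison; the bucket scheme, SHA-256 and the function names are assumptions for illustration.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def bucket_of(key: str, buckets: int) -> int:
    # Assign each key to one of 2**depth buckets (leaf ranges).
    return int.from_bytes(_h(key.encode()), "big") % buckets


def merkle_tree(replica: dict, depth: int) -> list:
    """Return tree levels, leaves first; each leaf hashes one bucket of sorted key=value pairs."""
    n = 2 ** depth
    buckets = [[] for _ in range(n)]
    for key in sorted(replica):
        buckets[bucket_of(key, n)].append(f"{key}={replica[key]}")
    level = [_h("\n".join(b).encode()) for b in buckets]
    levels = [level]
    while len(level) > 1:
        # Parent hash covers the concatenation of its two children.
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def differing_buckets(a: list, b: list) -> list:
    """Walk both trees from the root; return leaf indexes whose hashes differ."""
    todo = [(len(a) - 1, 0)]  # (level, index), starting at the root
    out = []
    while todo:
        lvl, i = todo.pop()
        if a[lvl][i] == b[lvl][i]:
            continue  # identical subtree: nothing below needs syncing
        if lvl == 0:
            out.append(i)
        else:
            todo.extend([(lvl - 1, 2 * i), (lvl - 1, 2 * i + 1)])
    return sorted(out)


if __name__ == "__main__":
    a = {f"k{i}": str(i) for i in range(100)}
    b = dict(a, k42="stale")
    print(differing_buckets(merkle_tree(a, 4), merkle_tree(b, 4)))  # the bucket holding k42
```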

UDP for control messages, TCP for request routing

Compactions

[Diagram: three sorted files (SSTables) holding keys K1, K2, K3; K2, K10, K30; and K4, K5, K10, each key followed by its serialized data, are loaded in memory and merge-sorted into one sorted data file: K1, K2, K3, K4, K5, K10, K30. An index file records the offsets of K1, K5 and K30, alongside a Bloom filter.]
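The merge sort on the compaction slide can be sketched with Python's `heapq.merge`. Keys are integers here so they sort as the slide does (K1 < K2 < K10); each run entry is a hypothetical `(key, timestamp, value)` triple, and when a key appears in several runs the newer timestamp wins. The sparse-index rule (every `index_every`-th key plus the last one) is chosen only to reproduce the slide's K1/K5/K30 entries, and the Bloom filter sizes are arbitrary; none of this is Cassandra's on-disk format.

```python
import hashlib
import heapq


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, bits: int = 1024, hashes: int = 3):
        self.bits = bits
        self.hashes = hashes
        self.array = bytearray((bits + 7) // 8)

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, key) -> None:
        for p in self._positions(key):
            self.array[p // 8] |= 1 << (p % 8)

    def might_contain(self, key) -> bool:
        return all(self.array[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def compact(sstables, index_every: int = 4):
    """Merge-sort sorted runs of (key, timestamp, value), keeping the newest version per key.

    Returns (data, index, bloom): the merged data file, a sparse key -> offset index
    (every index_every-th key plus the last one), and a Bloom filter over all keys.
    """
    data = []
    for key, ts, value in heapq.merge(*sstables, key=lambda entry: entry[0]):
        if data and data[-1][0] == key:
            # Same key seen in another run: keep whichever version is newer.
            if ts > data[-1][1]:
                data[-1] = (key, ts, value)
            continue
        data.append((key, ts, value))
    index = {
        entry[0]: offset
        for offset, entry in enumerate(data)
        if offset % index_every == 0 or offset == len(data) - 1
    }
    bloom = BloomFilter()
    for key, _, _ in data:
        bloom.add(key)
    return data, index, bloom


if __name__ == "__main__":
    runs = [
        [(1, 1, "a"), (2, 1, "b"), (3, 1, "c")],
        [(2, 2, "b2"), (10, 1, "x"), (30, 1, "z")],
        [(4, 1, "d"), (5, 1, "e"), (10, 2, "x2")],
    ]
    data, index, bloom = compact(runs)
    print([k for k, _, _ in data])  # [1, 2, 3, 4, 5, 10, 30]
    print(index)                    # {1: 0, 5: 4, 30: 6}
```

On a read, the Bloom filter lets a node skip files that certainly lack the key, and the sparse index narrows the scan to a short stretch of the data file.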

[Same compaction diagram, repeated with some entries marked DELETED.]

Availability in Action

[Diagram: ring of nodes A, L, T, W, F, P, Y, U handling key “C”.]

[Diagram, continued: one node is marked down (X) and a hint is stored for it.]
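The “hint” in the availability diagram refers to hinted handoff: when a replica for key “C” is down, the write can still succeed on the live replicas, and a hint is kept so the missed write is replayed when that node returns. A toy sketch; the `Cluster` class is hypothetical, the replica list is passed in by hand, and details such as hint expiry and whether hints count toward the consistency level are not modelled.

```python
class Cluster:
    """Toy hinted handoff: writes for a down replica are kept as hints and replayed later."""

    def __init__(self, nodes):
        self.data = {node: {} for node in nodes}
        self.up = set(nodes)
        self.hints = []  # (target_node, key, value), held by the coordinator

    def mark_down(self, node):
        self.up.discard(node)

    def write(self, key, value, replicas):
        """Write to every live replica; store a hint for each down one. Returns live acks."""
        acks = 0
        for node in replicas:
            if node in self.up:
                self.data[node][key] = value
                acks += 1
            else:
                self.hints.append((node, key, value))
        return acks

    def recover(self, node):
        # The node is back: deliver its hints and drop them from the queue.
        self.up.add(node)
        pending = []
        for target, key, value in self.hints:
            if target == node:
                self.data[node][key] = value
            else:
                pending.append((target, key, value))
        self.hints = pending


if __name__ == "__main__":
    ring = Cluster(["A", "L", "T", "W", "F", "P", "Y", "U"])
    ring.mark_down("F")
    print(ring.write("C", "v1", replicas=["W", "F", "P"]))  # 2
    ring.recover("F")
    print(ring.data["F"]["C"])  # v1
```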

JMX

OpsCenter
