JavaOne 2011: Brisk and High-Order Bits from Cassandra and Hadoop

transcript

Brisk: Truly peer-to-peer Hadoop
High-order bits from Cassandra & Hadoop

SriSatish Ambati, @srisatish

How many in audience…

NoSQL - Know your queries.

points

• Use cases
• Why Cassandra?
• Use case: Hadoop, Brisk
• FUD: Consistency
  – Why Facebook is not using Cassandra?
• Anti-patterns
• Community, Code, Tools
• Q&A

Users. Netflix.
• Key by Customer, read-heavy
• Key by Customer:Movie, write-heavy

TimeSeries: (several customers)
• periodic readings: dev0, dev1…
• deviceID:metric:timestamp -> value
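A minimal sketch of the deviceID:metric:timestamp -> value layout, in plain Java rather than any Cassandra or Brisk API. The class name, the zero-padding width, and the in-memory sorted map standing in for a column family are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: only the key layout comes from the slide.
class TimeSeriesKeys {
    // Builds "deviceID:metric:timestamp". The timestamp is zero-padded so that
    // lexicographic key order matches chronological order (non-negative timestamps only).
    static String key(String deviceId, String metric, long timestampMillis) {
        return String.format("%s:%s:%013d", deviceId, metric, timestampMillis);
    }

    // Returns readings for one device and metric with from <= timestamp < to,
    // relying on the sorted order of the keys.
    static NavigableMap<String, Double> range(NavigableMap<String, Double> store,
                                              String deviceId, String metric,
                                              long from, long to) {
        return store.subMap(key(deviceId, metric, from), true,
                            key(deviceId, metric, to), false);
    }

    public static void main(String[] args) {
        NavigableMap<String, Double> store = new TreeMap<>();
        store.put(key("dev0", "temp", 1000L), 21.5);
        store.put(key("dev0", "temp", 2000L), 21.7);
        store.put(key("dev1", "temp", 1500L), 19.0);
        // Print only the dev0 temperature readings in the window.
        for (Map.Entry<String, Double> e : range(store, "dev0", "temp", 0L, 3000L).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Because the keys sort by device, then metric, then time, a window of readings for one device is a single contiguous range scan.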

Metrics are typically a far larger dataset than users.

Why Cassandra?

Operational simplicity: peer-to-peer

Replication:
• Multi-datacenter
• Multi-region EC2, AWS
• Multi-availability zones

[Diagram: datacenters dc1 and dc2; reads are local.]

“Movie marathons on Netflix awaiting AWS to come back up.” #ec2disabled

4.21.2011, Amazon Web Services outage:

Netflix was running on AWS.


fast durable writes. fast reads.

Writes: sequential, append-only. ~1-5 ms
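A rough sketch of why append-only writes are cheap: each write is one sequential append to a log plus an in-memory update, with no seek and no read-before-write. This is an illustration in plain Java, not Cassandra's commit log; the class and method names and the tab-separated record format are assumptions, and a real durable write would also need to force the data to disk.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch of a log-then-memtable write path.
class AppendOnlyStore {
    private final Path commitLog;
    private final TreeMap<String, String> memtable = new TreeMap<>();

    AppendOnlyStore(Path commitLog) {
        this.commitLog = commitLog;
    }

    // Convenience factory for experiments: backs the log with a temp file.
    static AppendOnlyStore onTempFile() {
        try {
            return new AppendOnlyStore(Files.createTempFile("commitlog", ".txt"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Appends one record to the end of the log, then updates the in-memory table.
    // Existing log records are never rewritten. No fsync is issued here.
    void write(String key, String value) {
        byte[] record = (key + "\t" + value + "\n").getBytes(StandardCharsets.UTF_8);
        try {
            Files.write(commitLog, record, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        memtable.put(key, value);
    }

    // Reads see the latest value from memory.
    String read(String key) {
        return memtable.get(key);
    }

    // Returns every record ever appended, oldest first.
    List<String> logRecords() {
        try {
            return Files.readAllLines(commitLog, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Overwriting a key adds a second log record instead of modifying the first, which is what keeps the disk access pattern sequential.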

On cloud: ephemeral disks rock!

Reads
• Local
• Key & row caches (also JNA-based 0xffheap, i.e. off-heap)
• Indexes, materialized

ssds: improved read performance!

Amortize
• Replication over writes
• Repair over reads

Distribution between nodes
• Gossip
• Anti-entropy
• Failure detector

Lightweight

Clients:
• CQL, Thrift
• pycassa, phpcassa
• Hector, Pelops
• (Scala, Ruby, Clojure)

Use case #3: Hadoop
• HDFS, Cassandra, Hive
• Logs, stats, analytics

Brisk: Truly peer-to-peer Hadoop.

mv computation, not data

word count in MapReduce

map(String key, String value):
    // key: document name
    // value: document contents
    for each word w in value:
        EmitIntermediate(w, "1");

reduce(String key, Iterator values):
    // key: a word
    // values: a list of counts
    int result = 0;
    for each v in values:
        result += ParseInt(v);
    Emit(AsString(result));
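The pseudocode above can be mimicked in plain Java to see the three phases (map, group by key, reduce) in one place. This is a single-process sketch with no Hadoop dependency; the class name and the run method that stands in for the framework's shuffle are assumptions for illustration, and counts are kept as integers instead of the strings used in the pseudocode.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process imitation of the word count job.
class WordCountSketch {
    // map: emit (word, 1) for every whitespace-separated word in the document.
    static List<Map.Entry<String, Integer>> map(String docName, String contents) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String w : contents.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(w, 1));
            }
        }
        return out;
    }

    // reduce: sum all counts emitted for one word.
    static int reduce(String word, List<Integer> counts) {
        int result = 0;
        for (int c : counts) {
            result += c;
        }
        return result;
    }

    // Stand-in for the framework: run map on each document, group the
    // intermediate pairs by word, then run reduce once per word.
    static Map<String, Integer> run(Map<String, String> docs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            for (Map.Entry<String, Integer> kv : map(doc.getKey(), doc.getValue())) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> g : grouped.entrySet()) {
            result.put(g.getKey(), reduce(g.getKey(), g.getValue()));
        }
        return result;
    }
}
```

In a real cluster the map calls run where the data lives and only the small intermediate pairs travel, which is the "mv computation, not data" point.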

Parallel Execution View

Immutable data
Write-once-read-many!
Files once created, written & closed... not changing!

JobTracker, TaskTracker
HDFS: NameNode, DataNode

• Cloudera
• Amazon: Elastic MapReduce
• Hortonworks
• MapR
• Brisk

Tools & Analytics
• Hive, Pig, R
• Karmasphere
• Datameer
• … dozens of stealth startups!

“However, given that there is only a single master, its failure is unlikely;”
The MapReduce paper, 2004. Jeffrey Dean and Sanjay Ghemawat, Google.

Namenode decomposition, explained.

NameNode:
• Single master node
• Single machine address space
• Single point of failure

Use column families (tables): inodes, block

• One kind of node
• No master node, no SPOF
• Peer-to-peer

Near-real-time Hadoop
• Low latency: cassandra_dc nodes
• Batch analytics: brisk_dc nodes

BriskSimpleSnitch.java

if (TrackerInitializer.isTrackerNode) {
    myDC = BRISK_DC;
    logger.info("Detected Hadoop trackers are enabled, setting my DC to " + myDC);
} else {
    myDC = CASSANDRA_DC;
    logger.info("Looks like Vanilla Cassandra nodes, setting my DC to " + myDC);
}
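A self-contained sketch of the same decision, to show how one flag per node splits a single ring into an analytics datacenter and a low-latency datacenter. Only the names BRISK_DC and CASSANDRA_DC come from the excerpt; their string values, the class name, and the assign helper are assumptions and are not taken from the Brisk source.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the snitch's datacenter choice.
class SnitchSketch {
    static final String BRISK_DC = "Brisk";          // assumed value
    static final String CASSANDRA_DC = "Cassandra";  // assumed value

    // Nodes running Hadoop trackers go to the analytics DC; all others serve
    // low-latency traffic in the plain Cassandra DC.
    static String datacenterFor(boolean isTrackerNode) {
        return isTrackerNode ? BRISK_DC : CASSANDRA_DC;
    }

    // Maps each node name to its datacenter, preserving input order.
    static Map<String, String> assign(Map<String, Boolean> trackerEnabledByNode) {
        Map<String, String> dcByNode = new LinkedHashMap<>();
        for (Map.Entry<String, Boolean> e : trackerEnabledByNode.entrySet()) {
            dcByNode.put(e.getKey(), datacenterFor(e.getValue()));
        }
        return dcByNode;
    }
}
```

With replication configured across both datacenters, batch jobs on the tracker nodes read local replicas without competing with real-time reads on the other nodes.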

Hive: SQL-like access
• cli, hwi, jdbc, metastore
• Pushdown predicates (v beta2)

hive> CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);

hive> LOAD DATA LOCAL INPATH '$BRISK_HOME/resources/hive/examples/files/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');

hive> SELECT count(*), ds FROM invites GROUP BY ds;

http://www.datastax.com/docs/0.8/brisk/about_hive

ETL
Real-time

Cassandra CFs
DataCenters

@srisatish

No me in team!

Ben Coverston Ben Werther Brandon Williams Cathy Daw Jackson Chung Jake Luciani Joaquin Casares Jonathan Ellis

Michael Allen Mike Bulman Nate McCall Nick M Bailey Patricio Echague Tyler Hobbs SriSatish Ambati Yewei Zhang

100-node Brisk Cluster on OpsCenter

FUD, acronym: fear, uncertainty, doubt.

Consistency: R + W > N
• Oracle, 2-node: R=1, W=2, N=2 (T=2)
• DNS

* N is replication factor. Not to be confused with T = total # of nodes.

Tunable, flexible.
For High Consistency: read:quorum, write:quorum
For High Availability: high W, low R.
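The R + W > N rule can be checked mechanically: if the replicas read plus the replicas written exceed the replication factor, every read set overlaps every write set in at least one replica. A small sketch, with class and method names chosen here for illustration:

```java
// Hypothetical helper for reasoning about tunable consistency.
class ConsistencyCheck {
    // Quorum size for replication factor n: a strict majority of replicas.
    static int quorum(int n) {
        return n / 2 + 1;
    }

    // True when a read of r replicas must overlap a write to w replicas,
    // given n replicas in total (the replication factor, not cluster size).
    static boolean readsSeeLatestWrite(int r, int w, int n) {
        if (n < 1 || r < 1 || w < 1 || r > n || w > n) {
            throw new IllegalArgumentException("need 1 <= r, w <= n");
        }
        return r + w > n;
    }

    public static void main(String[] args) {
        // The 2-node example from the slide: R=1, W=2, N=2.
        System.out.println(readsSeeLatestWrite(1, 2, 2));
        // Quorum reads and writes at N=3.
        System.out.println(readsSeeLatestWrite(quorum(3), quorum(3), 3));
        // R=1, W=1 at N=3: fast, but a read may miss the latest write.
        System.out.println(readsSeeLatestWrite(1, 1, 3));
    }
}
```

Quorum for both reads and writes always satisfies the rule, since two majorities of the same N replicas must share at least one member.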

Consistency levels in Brisk:
"brisk.consistencylevel.read", "QUORUM";
"brisk.consistencylevel.write", "QUORUM";


Inbox Search: 600+ cores, 120+ TB (2008).
Went from 100m to 500m users.

Average NoSQL deployment size: ~6-12 nodes.

Use case #5: search
Apache Solr + Cassandra = Solandra

Other inbox/file searches: Xobni, C3

github.com/tjake/solandra

“Eventual consistency is harder to program.”
• Mostly immutable data.
• Complex systems at scale.

Miscellaneous
Myth: data loss, partial rows.
Writes are durable.

Anti-Patterns
• Transactions
• Joins
• Read before write

Anti-Patterns for cloud
• EBS
• JVM, virtualized
• Single region

A few more good reasons for Cassandra...

Tools
• AMIs, OpsCenter, DataStax
• AppDynamics

Getting Started with brisk ami

Netflix just builds AMIs for deployment!

Beautiful C0de

= new code(); // less is more
• ~90k.
• java.concurrent.@annotate.
• bloomfilters, merkletrees.
• non-blocking, staged-event-driven.
• bigtable, dynamo.

Current & Future Focus:
• Distributed Counters, CQL.
• Simple client.
• Operational smoothening.
• Compaction.

Community
Robust. Rapid. Brisk.
Professional support from DataStax.
git clone git@github.com:riptano/brisk.git

Engineers: independent, startups, large companies, Rackspace, Twitter, Netflix...

Come join the efforts!

Use case #4: first NoSQL, then scale!
• SimpleDB to Cassandra
• MongoDB to Cassandra

Copyright: xkcd

Copyright: plantoys

… more than one way to do it!

Summary: high-scale peer-to-peer datastore

best friend for multi-region, multi-zone availability.

Hadoop – HDFS engulfing the DataWorld

Brisk – best of both worlds!

Q&A
@srisatish

Cassandra timeline:
• Bigtable, 2006; Dynamo, 2007
• OSS, 2008
• Incubator, 2009
• TLP, 2010

