Post on 22-Dec-2014
Brisk: Truly peer-to-peer Hadoop
High-order bits from Cassandra & Hadoop
SriSatish Ambati, @srisatish
How many in audience…
NoSQL: know your queries.
points
• Usecases
• Why Cassandra?
• Usecase: Hadoop, Brisk
• FUD: Consistency
  – Why is Facebook not using Cassandra?
• Anti-patterns
• Community, Code, Tools
• Q&A
Users: Netflix.
Key by Customer: read-heavy
Key by Customer:Movie: write-heavy
TimeSeries (several customers):
periodic readings: dev0, dev1…
deviceID:metric:timestamp -> value
Metrics are typically a far larger dataset than users.
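A minimal sketch of the deviceID:metric:timestamp -> value key layout above. The class name, the helper names, and the 13-digit zero padding are illustrative assumptions, and the TreeMap is only a stand-in for an ordered store, not a Cassandra API. It assumes non-negative timestamps of at most 13 digits and no ':' inside device or metric names.

```java
import java.util.Locale;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Illustrative only: composite keys of the form deviceID:metric:timestamp. */
class TimeSeriesKeys {

    // Zero-pad the timestamp so lexicographic key order matches time order.
    static String key(String deviceId, String metric, long timestamp) {
        return String.format(Locale.ROOT, "%s:%s:%013d", deviceId, metric, timestamp);
    }

    // All readings for one device and metric with from <= timestamp <= to.
    static NavigableMap<String, Double> range(TreeMap<String, Double> store,
                                              String deviceId, String metric,
                                              long from, long to) {
        return store.subMap(key(deviceId, metric, from), true,
                            key(deviceId, metric, to), true);
    }

    public static void main(String[] args) {
        TreeMap<String, Double> store = new TreeMap<>();
        store.put(key("dev0", "temp", 2000L), 21.7);
        store.put(key("dev0", "temp", 1000L), 21.5);
        store.put(key("dev1", "temp", 1000L), 19.0);
        // Prints only the dev0 readings, oldest first.
        System.out.println(range(store, "dev0", "temp", 0L, 5000L));
    }
}
```

Knowing the query first (one device, one metric, a time window) is what makes the key order matter: the window becomes a single contiguous scan.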
Why Cassandra?
Operational simplicity peer-to-peer
write
read
Replication:
Multi-datacenter
Multi-region EC2, AWS
Multi-availability zones
dc1 dc2
reads local
“Movie marathons on Netflix awaiting AWS to come back up.” #ec2disabled
4.21.2011, Amazon Web Services outage:
Netflix was running on AWS.
fast durable writes. fast reads.
Writes: sequential, append-only. ~1-5 ms
On cloud: ephemeral disks rock!
Reads: local. Key & row caches (also JNA-based off-heap), indexes, materialized
SSDs: improved read performance!
Amortize: replication over writes, repair over reads.
Distribution between nodes: gossip, anti-entropy, failure detector.
Lightweight
Clients: CQL, Thrift
pycassa, phpcassa
Hector, Pelops
(Scala, Ruby, Clojure)
Usecase #3: Hadoop
HDFS, Cassandra, Hive
Logs, stats, analytics
Brisk: truly peer-to-peer Hadoop.
mv computation, not data
word count in MapReduce
map(String key, String value):
  // key: document name
  // value: document contents
  for each word w in value:
    EmitIntermediate(w, "1");

reduce(String key, Iterator values):
  // key: a word
  // values: a list of counts
  int result = 0;
  for each v in values:
    result += ParseInt(v);
  Emit(AsString(result));
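The pseudocode above can be sketched as a single-process Java program. This is an in-memory illustration of the map, shuffle, and reduce steps, not Hadoop's API; the class and method names are assumptions, and counts are carried as integers rather than the "1" strings in the pseudocode.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: word count as map, shuffle (group by key), reduce. */
class WordCountSketch {

    // Map phase: emit (word, 1) for every word in one document.
    static List<Map.Entry<String, Integer>> map(String docName, String contents) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String w : contents.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                out.add(new AbstractMap.SimpleImmutableEntry<>(w, 1));
            }
        }
        return out;
    }

    // Reduce phase: sum all counts emitted for one word.
    static int reduce(String word, List<Integer> counts) {
        int result = 0;
        for (int c : counts) {
            result += c;
        }
        return result;
    }

    // Driver: run map over every document, group by word, then reduce.
    static Map<String, Integer> run(Map<String, String> docs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            for (Map.Entry<String, Integer> kv : map(doc.getKey(), doc.getValue())) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                       .add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new TreeMap<>();
        docs.put("d1", "to be or not to be");
        docs.put("d2", "be brisk");
        System.out.println(run(docs));
    }
}
```

In a real cluster the map calls run in parallel next to the data and the grouping step is the network shuffle; only the shape of the computation carries over from this sketch.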
Parallel Execution View
Immutable data: write-once-read-many!
Files once created, written & closed are not changing!
JobTracker, TaskTracker
HDFS: NameNode, DataNode
Cloudera
Amazon: Elastic MapReduce
Hortonworks
MapR
Brisk
Tools & Analytics: Hive, Pig, R, Karmasphere, Datameer… dozens of stealth startups!
“However, given that there is only a single master, its failure is unlikely;”
The MapReduce paper, 2004. Jeffrey Dean and Sanjay Ghemawat, Google.
Namenode decomposition, explained.
NameNode:
Single master node
Single machine address space
Single point of failure
Use column families (tables):
inode
sblock
One kind of node:
no master node, no SPOF
peer-to-peer
Near-real-time Hadoop
Low latency: cassandra_dc nodes
Batch analytics: brisk_dc nodes
BriskSimpleSnitch.java
if (TrackerInitializer.isTrackerNode) {
    myDC = BRISK_DC;
    logger.info("Detected Hadoop trackers are enabled, setting my DC to " + myDC);
} else {
    myDC = CASSANDRA_DC;
    logger.info("Looks like Vanilla Cassandra nodes, setting my DC to " + myDC);
}
Hive: SQL-like access
cli, hwi, jdbc, metastore
Pushdown predicates (v beta2)
hive> CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);
hive> LOAD DATA LOCAL INPATH '$BRISK_HOME/resources/hive/examples/files/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');
hive> SELECT count(*), ds FROM invites GROUP BY ds;
http://www.datastax.com/docs/0.8/brisk/about_hive
ETL
Real-time
Cassandra CFs
DataCenters
Scale
No me in team!
Ben Coverston Ben Werther Brandon Williams Cathy Daw Jackson Chung Jake Luciani Joaquin Casares Jonathan Ellis
Michael Allen Mike Bulman Nate McCall Nick M Bailey Patricio Echague Tyler Hobbs SriSatish Ambati Yewei Zhang
100-node Brisk cluster on OpsCenter
FUD, acronym: fear, uncertainty, doubt.
Consistency: R + W > N
ORACLE, 2-node: R=1, W=2, N=2 (T=2)
DNS
* N is the replication factor, not to be confused with T, the total number of nodes.
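The R + W > N rule can be checked with a few lines of arithmetic. This is a sketch with hypothetical names (QuorumMath, quorum, overlaps), not Cassandra or Brisk code: when the number of replicas read plus the number written exceeds the replication factor, every read set shares at least one replica with the latest write set.

```java
/** Illustrative only: the overlap condition R + W > N from the slide. */
class QuorumMath {

    // Majority of N replicas, e.g. 2 of 3 or 3 of 5.
    static int quorum(int n) {
        return n / 2 + 1;
    }

    // True when any R replicas read must overlap any W replicas written.
    static boolean overlaps(int r, int w, int n) {
        return r + w > n;
    }

    public static void main(String[] args) {
        // The 2-node example from the slide: R=1, W=2, N=2.
        System.out.println("R=1, W=2, N=2 overlaps: " + overlaps(1, 2, 2));
        // Quorum reads and quorum writes at N=3.
        int q = quorum(3);
        System.out.println("quorum(3) = " + q + ", overlaps: " + overlaps(q, q, 3));
        // R=1, W=1 at N=3 trades the overlap guarantee for latency.
        System.out.println("R=1, W=1, N=3 overlaps: " + overlaps(1, 1, 3));
    }
}
```

Note that the check uses N, the replication factor, and never T: adding nodes to the cluster does not change the condition.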
Tunable, flexible.
For high consistency: read: quorum, write: quorum.
For high availability: high W, low R.
"brisk.consistencylevel.read", "QUORUM";
"brisk.consistencylevel.write", "QUORUM";
Inbox Search: 600+ cores, 120+ TB (2008).
Went from 100m to 500m users.
Average NoSQL deployment size: ~6-12 nodes.
Usecase #5: search
Apache Solr + Cassandra = Solandra
Other inbox/file searches: xobni, c3
github.com/tjake/solandra
“Eventual consistency is harder to program.”
Mostly immutable data.
Complex systems at scale.
Miscellaneous. Myth: data loss, partial rows.
Writes are durable.
Anti-patterns:
Transactions
Joins
Read before write
Anti-patterns for cloud:
EBS
JVM, virtualized
Single region
A few more good reasons for Cassandra...
Tools
AMIs, OpsCenter, DataStax
AppDynamics
Getting started with the Brisk AMI
Netflix just builds AMIs for deployment!
Beautiful C0de
= new code(); // less is more
~90k.
java.concurrent.@annotate.
bloomfilters, merkletrees.
non-blocking, staged-event-driven.
bigtable, dynamo.
Current & Future Focus:
Distributed counters, CQL.
Simple client.
Operational smoothing.
Compaction.
Community
Robust. Rapid. Brisk #
Professional support from DataStax.
git clone git@github.com:riptano/brisk.git
Engineers: independent, startups, large companies, Rackspace, Twitter, Netflix..
Come join the efforts!
Usecase #4: first NoSQL, then scale!
simpledb -> Cassandra
mongodb -> Cassandra
Copyright: xkcd
Copyright: plantoys
… more than one way to do it!
Summary:
High-scale peer-to-peer datastore
Best friend for multi-region, multi-zone availability.
Hadoop: HDFS engulfing the data world
Brisk: best of both worlds!
Q&A
@srisatish
[Timeline figure. Cassandra: Bigtable, 2006 + Dynamo, 2007; OSS, 2008; Incubator, 2009; TLP, 2010; Brisk]