
Scaling Web Applications With Cassandra Presentation

Date post: 24-Sep-2015
Upload: jagannath-jaggu
introduction to cassandra eben hewitt september 29. 2010 web 2.0 expo new york city
Transcript
  • introduction to cassandra
    eben hewitt

    september 29. 2010
    web 2.0 expo
    new york city

  • director, application architecture at a global corp

    focus on SOA, SaaS, Events

    i wrote this

    @ebenhewitt

  • agenda
    context
    features
    data model
    api

  • nosql
    big data
    mongodb
    couchdb
    tokyo cabinet
    redis
    riak
    what about?
    Poet, Lotus, Xindice
    they've been around forever
    rdbms was once the new kid

  • innovation at scale
    google bigtable (2006)
      consistency model: strong
      data model: sparse map
      clones: hbase, hypertable
    amazon dynamo (2007)
      O(1) dht
      consistency model: client tune-able
      clones: riak, voldemort

    cassandra ~= bigtable + dynamo

  • proven
    The Facebook stores 150TB of data on 150 nodes

    web 2.0

    used at Twitter, Rackspace, Mahalo, Reddit, Cloudkick, Cisco, Digg, SimpleGeo, Ooyala, OpenX, others

  • cap theorem
    consistency
      all clients have same view of data
    availability
      writeable in the face of node failure
    partition tolerance
      processing can continue in the face of network failure (crashed router, broken network)

  • daniel abadi: pacelc

  • write consistency

    Level                        Description
    ZERO                         Good luck with that
    ANY                          1 replica (hints count)
    ONE                          1 replica. read repair in bkgnd
    QUORUM (DCQ for RackAware)   (N / 2) + 1
    ALL                          N = replication factor

    read consistency

    Level                        Description
    ZERO                         Ummm
    ANY                          Try ONE instead
    ONE                          1 replica
    QUORUM (DCQ for RackAware)   Return most recent TS after (N / 2) + 1 report
    ALL                          N = replication factor
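
The QUORUM rows above reduce to integer arithmetic on the replication factor N. The sketch below works that out for a few values. The class and method names are made up for illustration, and the R + W > N overlap check is the commonly cited rule for Dynamo-style stores rather than something stated on the slide.

```java
final class QuorumSketch {
    /** Replicas that must respond at QUORUM: (N / 2) + 1, using integer division. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /**
     * Commonly cited overlap rule (an assumption here, not from the slides):
     * a read is guaranteed to touch a replica holding the latest write when R + W > N.
     */
    static boolean readsOverlapWrites(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        int n = 3;
        int q = quorum(n);
        System.out.println("QUORUM for N=" + n + ": " + q);
        System.out.println("QUORUM reads + QUORUM writes overlap: " + readsOverlapWrites(q, q, n));
        System.out.println("ONE reads + ONE writes overlap: " + readsOverlapWrites(1, 1, n));
    }
}
```

With N = 3, QUORUM is 2, so one replica can be down and quorum reads and writes still succeed.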

  • agenda
    context
    features
    data model
    api

  • cassandra properties
    tuneably consistent
    very fast writes
    highly available
    fault tolerant
    linear, elastic scalability
    decentralized/symmetric
    ~12 client languages
    Thrift RPC API
    ~automatic provisioning of new nodes
    O(1) dht
    big data

  • write op

  • Staged Event-Driven Architecture
    A general-purpose framework for high concurrency & load conditioning
    Decomposes applications into stages separated by queues
    Adopts a structured approach to event-driven concurrency
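
"Stages separated by queues" can be pictured with a toy pipeline: each stage is a worker thread that takes events from an inbound queue and hands its result to the next queue. This is only a sketch of the general idea. The stage handlers and the `STOP` sentinel are hypothetical, and none of this is Cassandra's internal code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Function;

final class SedaSketch {
    // Sentinel event that tells a stage to shut down; compared by identity on purpose.
    private static final String STOP = new String("STOP");

    /** Starts one stage: a thread that drains its inbound queue and feeds the next queue. */
    static Thread startStage(BlockingQueue<String> in, BlockingQueue<String> out,
                             Function<String, String> handler) {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    String event = in.take();
                    if (event == STOP) {
                        out.put(STOP); // pass the shutdown signal downstream
                        return;
                    }
                    out.put(handler.apply(event));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        return worker;
    }

    /** Pushes events through two hypothetical stages and collects the results in order. */
    static List<String> run(List<String> events) {
        BlockingQueue<String> inbound = new LinkedBlockingQueue<>();
        BlockingQueue<String> between = new LinkedBlockingQueue<>();
        BlockingQueue<String> outbound = new LinkedBlockingQueue<>();
        startStage(inbound, between, s -> s.trim());
        startStage(between, outbound, s -> "applied:" + s);
        List<String> results = new ArrayList<>();
        try {
            for (String event : events) {
                inbound.put(event);
            }
            inbound.put(STOP);
            for (String r = outbound.take(); r != STOP; r = outbound.take()) {
                results.add(r);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running stages", e);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(" set eben ", " set alison ")));
    }
}
```

Because each stage only sees its own queue, a slow stage shows up as a growing queue, which is where load conditioning can be applied.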

  • instrumentation

  • data replication

  • partitioner smack-down
    Random Preserving
      system will use MD5(key) to distribute data across nodes
      even distribution of keys from one CF across ranges/nodes

    Order Preserving
      key distribution determined by token
      lexicographical ordering
      required for range queries
      scan over rows like cursor in index
      can specify the token for this node to use
      scrabble distribution
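
To make the random case concrete, here is a minimal sketch that hashes a row key with MD5 and maps the resulting token onto a ring of nodes. It assumes the ring is cut into equal ranges, which real clusters do not have to do since each node is assigned its own token, and the class and method names are invented for the example.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class PartitionerSketch {
    /** Token for the random case: the MD5 digest of the key read as a non-negative 128-bit integer. */
    static BigInteger randomToken(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every Java platform", e);
        }
    }

    /** Hypothetical placement: split the 2^128 token space into nodeCount equal ranges. */
    static int nodeFor(BigInteger token, int nodeCount) {
        BigInteger ringSize = BigInteger.ONE.shiftLeft(128);
        return token.multiply(BigInteger.valueOf(nodeCount)).divide(ringSize).intValue();
    }

    public static void main(String[] args) {
        for (String key : new String[] {"eben", "alison", "key123", "key456"}) {
            BigInteger token = randomToken(key);
            System.out.println(key + " -> node " + nodeFor(token, 4));
        }
        // Order-preserving case: the key itself decides placement, so neighbours stay adjacent.
        System.out.println("alison sorts before eben: " + ("alison".compareTo("eben") < 0));
    }
}
```

Hashing spreads adjacent keys across nodes, which is why range scans over rows need the order-preserving partitioner.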

  • agenda
    context
    features
    data model
    api

  • structure

  • keyspace
    ~= database
    typically one per application
    some settings are configurable only per keyspace

  • column family
    group records of similar kind
    not same kind, because CFs are sparse tables
    ex:
      User
      Address
      Tweet
      PointOfInterest
      HotelRoom

  • think of cassandra as row-oriented
    each row is uniquely identifiable by key
    rows group columns and super columns

  • column family
    [diagram: a column family with rows key123 and key456; columns shown: user=eben, user=alison, n=42, icon=, nickname=The Situation]

  • json-like notation
    User {
      123 : { email: [email protected], icon: },
      456 : { email: [email protected], location: The Danger Zone }
    }
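
One way to read the notation above is as a map of maps: row key to a sorted set of columns, where different rows may carry different columns. The sketch below models only that shape. It uses strings where Cassandra uses byte arrays, and the class name and the example.com addresses are placeholders.

```java
import java.util.Collections;
import java.util.SortedMap;
import java.util.TreeMap;

final class ColumnFamilySketch {
    // row key -> (column name -> value); columns stay sorted by name within each row
    private final SortedMap<String, SortedMap<String, String>> rows = new TreeMap<>();

    void set(String rowKey, String columnName, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(columnName, value);
    }

    /** Returns the column value, or null when the row or the column does not exist. */
    String get(String rowKey, String columnName) {
        SortedMap<String, String> row = rows.get(rowKey);
        return row == null ? null : row.get(columnName);
    }

    /** Read-only view of one row's columns; empty when the row does not exist. */
    SortedMap<String, String> row(String rowKey) {
        SortedMap<String, String> row = rows.get(rowKey);
        if (row == null) {
            return Collections.unmodifiableSortedMap(new TreeMap<String, String>());
        }
        return Collections.unmodifiableSortedMap(row);
    }

    public static void main(String[] args) {
        ColumnFamilySketch user = new ColumnFamilySketch();
        user.set("123", "email", "eben@example.com");     // placeholder address
        user.set("456", "email", "alison@example.com");   // placeholder address
        user.set("456", "location", "The Danger Zone");
        // Rows are sparse: 123 has no location column and nothing forces it to.
        System.out.println(user.row("123"));
        System.out.println(user.row("456"));
    }
}
```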

  • 0.6 example
    $ cassandra -f
    $ bin/cassandra-cli
    cassandra> connect localhost/9160

    cassandra> set Keyspace1.Standard1['eben']['age']='29'
    cassandra> set Keyspace1.Standard1['eben']['email']='[email protected]'
    cassandra> get Keyspace1.Standard1['eben']['age']
    => (column=6e616d65, value=39, timestamp=1282170655390000)

  • a column has 3 parts
    name
      byte[]
      determines sort order
      used in queries
      indexed
    value
      byte[]
      you don't query on column values
    timestamp
      long (clock)
      last write wins conflict resolution
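
"Last write wins" means that when two versions of the same column meet, the one with the higher timestamp is kept. A minimal sketch of that rule follows. The class is hypothetical, it uses strings instead of byte arrays, and keeping the first argument on a timestamp tie is an arbitrary choice made for the example.

```java
final class TimestampedColumn {
    final String name;
    final String value;
    final long timestamp;

    TimestampedColumn(String name, String value, long timestamp) {
        this.name = name;
        this.value = value;
        this.timestamp = timestamp;
    }

    /** Keeps the version with the higher timestamp; on a tie the first argument is kept. */
    static TimestampedColumn resolve(TimestampedColumn a, TimestampedColumn b) {
        return b.timestamp > a.timestamp ? b : a;
    }

    public static void main(String[] args) {
        TimestampedColumn older = new TimestampedColumn("age", "29", 1000L);
        TimestampedColumn newer = new TimestampedColumn("age", "30", 2000L);
        // The order in which replicas report does not matter, only the timestamps do.
        System.out.println(resolve(older, newer).value);
        System.out.println(resolve(newer, older).value);
    }
}
```

Since the timestamp is supplied by the client, clients with skewed clocks can make an older write win.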

  • column comparators
    byte
    utf8
    long
    timeuuid
    lexicaluuid

    ex: lat/long
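
The comparator decides how column names sort inside a row, so the same names can come back in a different order depending on which one the column family is configured with. The sketch below imitates a text ordering and a numeric ordering with Java comparators. These are stand-ins written for illustration, not Cassandra's comparator classes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

final class ComparatorSketch {
    // Stand-in for a utf8-style ordering: plain string comparison (matches byte order for ASCII names).
    static final Comparator<String> TEXT_ORDER = Comparator.naturalOrder();

    // Stand-in for a long-style ordering: names are parsed and compared as numbers.
    static final Comparator<String> LONG_ORDER =
            (a, b) -> Long.compare(Long.parseLong(a), Long.parseLong(b));

    /** Stores the names as columns of one row and returns them in the row's sort order. */
    static List<String> sortedNames(Comparator<String> comparator, String... names) {
        TreeMap<String, String> row = new TreeMap<>(comparator);
        for (String name : names) {
            row.put(name, "");
        }
        return new ArrayList<>(row.keySet());
    }

    public static void main(String[] args) {
        System.out.println(sortedNames(TEXT_ORDER, "9", "10", "100")); // [10, 100, 9]
        System.out.println(sortedNames(LONG_ORDER, "9", "10", "100")); // [9, 10, 100]
    }
}
```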

  • super column
    super columns group columns under a common name

  • PointOfInterest (super column family)
    [diagram: row 10017 holds super columns Central Park (desc=Fun to walk in.) and Empire State Bldg (phone=212. 555.11212, desc=Great view from 102nd floor!); row 85255 holds super column Phoenix Zoo]

  • PointOfInterest {
      key: 85255 {
        Phoenix Zoo { phone: 480-555-5555, desc: They have animals here. },
        Spring Training { phone: 623-333-3333, desc: Fun for baseball fans. },
      }, //end phx

      key: 10019 {
        Central Park { desc: Walk around. It's pretty. },
        Empire State Building { phone: 212-777-7777, desc: Great view from 102nd floor. }
      } //end nyc
    }

    [diagram labels: super column family, key, super column, column, flexible schema]

  • about super column families
    sub-column names in a SCF are not indexed
    top level columns (SCF Name) are always indexed
    often used for denormalizing data from standard CFs

  • agenda
    context
    features
    data model
    api

  • slice predicate
    data structure describing columns to return
    SliceRange
      start column name
      finish column name (can be empty to stop on count)
      reverse
      count (like LIMIT)
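
The four SliceRange fields behave much like a bounded, optionally reversed walk over one row's sorted columns. The sketch below imitates that over an in-memory sorted map. It is not the Thrift `SliceRange` class, it uses strings for column names, and it treats an empty string as "no bound".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

final class SliceSketch {
    /**
     * Returns up to count column names between start and finish, both inclusive.
     * An empty start or finish means that end is unbounded. When reversed is true the row
     * is walked from high names to low, so start is expected to be the higher name.
     * A start that sorts after finish in walking order is rejected with IllegalArgumentException.
     */
    static List<String> slice(NavigableMap<String, String> row, String start, String finish,
                              boolean reversed, int count) {
        NavigableMap<String, String> view = reversed ? row.descendingMap() : row;
        if (!start.isEmpty()) {
            view = view.tailMap(start, true);
        }
        if (!finish.isEmpty()) {
            view = view.headMap(finish, true);
        }
        List<String> names = new ArrayList<>();
        for (String name : view.keySet()) {
            if (names.size() >= count) {
                break; // count works like LIMIT
            }
            names.add(name);
        }
        return names;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> row = new TreeMap<>();
        for (String name : new String[] {"a", "b", "c", "d", "e"}) {
            row.put(name, "");
        }
        System.out.println(slice(row, "b", "d", false, 10)); // [b, c, d]
        System.out.println(slice(row, "", "", false, 2));    // [a, b]
        System.out.println(slice(row, "d", "b", true, 2));   // [d, c]
    }
}
```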

  • read api
    get() : Column
    get the Col or SC at given ColPath
    COSC cosc = client.get(key, path, CL);

    get_slice() : List
    get Cols in one row, specified by SlicePredicate:
    List results = client.get_slice(key, parent, predicate, CL);

    multiget_slice() : Map
    get slices for list of keys, based on SlicePredicate
    Map results = client.multiget_slice(rowKeys, parent, predicate, CL);

    get_range_slices() : List
    returns multiple Cols according to a range
    range is startkey, endkey, starttoken, endtoken:
    List slices = client.get_range_slices(parent, predicate, keyRange, CL);

  • write api
    client.insert(userKeyBytes, parent,
        new Column("band".getBytes(UTF8), "Funkadelic".getBytes(), clock), CL);

    batch_mutate
    void batch_mutate(map, CL)

    remove
    void remove(byte[], ColumnPath column_path, Clock, CL)

  • batch_mutate
    //create param
    Map mutationMap = new HashMap();

    //create Cols for Muts
    Column nameCol = new Column("name".getBytes(UTF8),
        "Funkadelic".getBytes("UTF-8"), new Clock(System.nanoTime()));
    //wrap the Column so the Mutation can carry it
    ColumnOrSuperColumn nameCosc = new ColumnOrSuperColumn();
    nameCosc.column = nameCol;
    Mutation nameMut = new Mutation();
    nameMut.column_or_supercolumn = nameCosc; //also phone, etc

    Map muts = new HashMap();
    List cols = new ArrayList();
    cols.add(nameMut);
    cols.add(phoneMut);
    muts.put(CF, cols);
    //outer map key is a row key; inner map key is the CF name
    mutationMap.put(rowKey.getBytes(), muts);
    //send to server
    client.batch_mutate(mutationMap, CL);

  • raw thrift: for masochists only

    pycassa (python)
    fauna (ruby)
    hector (java)
    pelops (java)
    kundera (JPA)
    hectorSharp (C#)

  • what about
    SELECT WHERE
    ORDER BY
    JOIN ON
    GROUP

    ?

  • rdbms: domain-based model what answers do I have?

    cassandra: query-based model what questions do I have?

  • SELECT WHERE
    cassandra is an index factory

    USER
    Key: UserID
    Cols: username, email, birth date, city, state

    How to support this query?

    SELECT * FROM User WHERE city = 'Scottsdale'

    Create a new CF called UserCity:

    USERCITY
    Key: city
    Cols: IDs of the users in that city.
    Also uses the Valueless Column pattern
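
The UserCity idea amounts to writing every user twice: once into User, and once into an index row keyed by city whose column names are user IDs and whose values are empty. The sketch below shows that shape with in-memory maps. The class and method names are invented, and it ignores updates and deletes, which a real index would also have to handle.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

final class UserCityIndexSketch {
    // User CF: user ID -> (column name -> value)
    private final Map<String, Map<String, String>> user = new HashMap<>();
    // UserCity CF: city -> (column name = user ID -> empty value), the valueless column pattern
    private final Map<String, SortedMap<String, String>> userCity = new HashMap<>();

    /** Writes the user row and the matching index column in one step. */
    void insertUser(String userId, String username, String city) {
        Map<String, String> columns = new HashMap<>();
        columns.put("username", username);
        columns.put("city", city);
        user.put(userId, columns);
        userCity.computeIfAbsent(city, c -> new TreeMap<>()).put(userId, "");
    }

    /** Answers "WHERE city = ?" by reading one index row instead of scanning every user. */
    List<String> userIdsInCity(String city) {
        List<String> ids = new ArrayList<>();
        SortedMap<String, String> indexRow = userCity.get(city);
        if (indexRow != null) {
            ids.addAll(indexRow.keySet());
        }
        return ids;
    }

    public static void main(String[] args) {
        UserCityIndexSketch store = new UserCityIndexSketch();
        store.insertUser("u1", "eben", "Scottsdale");
        store.insertUser("u2", "alison", "Phoenix");
        store.insertUser("u3", "george", "Scottsdale");
        System.out.println(store.userIdsInCity("Scottsdale")); // [u1, u3]
    }
}
```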

  • SELECT WHERE pt 2
    Use an aggregate key
    state:city: { user1, user2}

    Get rows between AZ: & AZ; for all Arizona users

    Get rows between AZ:Scottsdale & AZ:Scottsdale1 for all Scottsdale users
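
The `AZ:` to `AZ;` trick works because `;` is the character right after `:` in ASCII, so every key that starts with `AZ:` sorts between the two bounds. The sketch below shows the same range scan over a sorted in-memory map, which stands in for rows kept in key order by an order-preserving partitioner. The class is hypothetical, and the end bound is treated as exclusive here for simplicity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class AggregateKeySketch {
    // Rows sorted by the aggregate key "state:city"; each row lists user IDs.
    private final TreeMap<String, List<String>> rows = new TreeMap<>();

    void addUser(String state, String city, String userId) {
        rows.computeIfAbsent(state + ":" + city, k -> new ArrayList<>()).add(userId);
    }

    /** Collects users from every row with startKey <= key < endKey, in key order. */
    List<String> usersBetween(String startKey, String endKey) {
        List<String> users = new ArrayList<>();
        for (List<String> ids : rows.subMap(startKey, true, endKey, false).values()) {
            users.addAll(ids);
        }
        return users;
    }

    public static void main(String[] args) {
        AggregateKeySketch index = new AggregateKeySketch();
        index.addUser("AZ", "Scottsdale", "user1");
        index.addUser("AZ", "Phoenix", "user2");
        index.addUser("CA", "Fresno", "user3");
        index.addUser("AZ", "Scottsdale", "user4");
        System.out.println(index.usersBetween("AZ:", "AZ;"));                      // all Arizona users
        System.out.println(index.usersBetween("AZ:Scottsdale", "AZ:Scottsdale1")); // Scottsdale only
    }
}
```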

  • ORDER BY
    Rows are placed according to their Partitioner:

    Random: MD5 of key
    Order-Preserving: actual key

    are sorted by key, regardless of partitioner
    Columns are sorted according to CompareWith or CompareSubcolumnsWith

  • is cassandra a good fit?
    you need really fast writes
    you need durability
    you have lots of data
      > GBs
      >= three servers
    your app is evolving
      startup mode, fluid data structure
    loose domain data
      points of interest

    your programmers can deal
      documentation
      complexity
      consistency model
      change
      visibility tools
    your operations can deal
      hardware considerations
      can move data
      JMX monitoring

  • thank you!
    @ebenhewitt
