Apache Cassandra
Clients and Transports
Thursday, February 28, 13
Hi Folks! I’m Nate, @zznate
API Management / API Analytics / API Tools
Clients and transports for Cassandra
But first... some questions
But first: Architectural stuff
Cassandra: “Sparsely Columnar”
An RDBMS table
Cassandra Style
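The key difference from the RDBMS table is that each Cassandra row owns its own sorted set of columns, and only the columns actually written to a row exist. A minimal Java sketch of that layout, assuming a plain in-memory map rather than a real cluster (the class `SparseTable` and the sample rows are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory model of a "sparsely columnar" column family:
// row key -> (column name -> value). Not Cassandra's storage engine.
class SparseTable {
    private final Map<String, SortedMap<String, String>> rows = new HashMap<>();

    // Writing a column creates the row on demand; no columns are declared up front.
    void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    // A row holds only the columns written to it: no NULL placeholders for the rest.
    // Columns come back sorted by name, similar to a row using a text comparator.
    SortedMap<String, String> row(String rowKey) {
        return rows.getOrDefault(rowKey, new TreeMap<>());
    }

    public static void main(String[] args) {
        SparseTable users = new SparseTable();
        users.put("user1", "state", "TX");
        users.put("user1", "email", "user1@example.com");
        users.put("user2", "twitter", "@user2"); // a column no other row has
        System.out.println(users.row("user1")); // {email=user1@example.com, state=TX}
        System.out.println(users.row("user2")); // {twitter=@user2}
    }
}
```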
Cassandra data modelling
Four common patterns
- Simple object to simple row
- Sparse object to rows
- Materialized view
- Manual index
Simple objects to simple rows
“static” column family
Sparse Objects
“dynamic” column family
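The two column family styles can be illustrated with a rough Java sketch, again modelling rows as sorted maps. In a static column family the column names are fixed and known in advance; in a dynamic one the column names themselves carry data, such as timestamps. `ColumnFamilyPatterns` and its methods are hypothetical names for illustration only:

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical examples of the two column family styles, modelled as sorted maps.
class ColumnFamilyPatterns {
    // "Static" column family: every row uses the same small, known set of column names,
    // much like the columns of an RDBMS table (simple object -> simple row).
    static SortedMap<String, String> userRow(String name, String email) {
        SortedMap<String, String> row = new TreeMap<>();
        row.put("name", name);
        row.put("email", email);
        return row;
    }

    // "Dynamic" column family: the column *names* are data (here, event timestamps), so
    // rows differ in which columns they hold and can grow wide. Columns stay sorted by name.
    static void appendEvent(SortedMap<Long, String> timelineRow, long timestampMillis, String eventId) {
        timelineRow.put(timestampMillis, eventId);
    }

    public static void main(String[] args) {
        System.out.println(userRow("Nate", "nate@example.com")); // {email=nate@example.com, name=Nate}
        SortedMap<Long, String> timeline = new TreeMap<>();
        appendEvent(timeline, 1362000000000L, "login");
        appendEvent(timeline, 1361990000000L, "signup");
        // Sorted by column name (the timestamp), not by insertion order:
        System.out.println(timeline.firstKey()); // 1361990000000
    }
}
```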
Materialized Views
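A materialized view is maintained by the application at write time: each write also updates a second column family shaped for one specific query. A minimal in-memory sketch, assuming a hypothetical `users` column family and a `users_by_state` view (neither comes from the talk):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical schema: "users" (row key = username) and a "users_by_state" view
// (row key = state, column name = username, value = email copied from the user row).
class MaterializedViewSketch {
    final Map<String, SortedMap<String, String>> users = new HashMap<>();
    final Map<String, SortedMap<String, String>> usersByState = new HashMap<>();

    void saveUser(String username, String email, String state) {
        SortedMap<String, String> row = users.computeIfAbsent(username, k -> new TreeMap<>());
        row.put("email", email);
        String previousState = row.put("state", state);
        // If the user moved, drop the stale view entry. Against a real cluster this needs
        // the old value, i.e. a read before the write; the in-memory map gives it for free.
        if (previousState != null && !previousState.equals(state)) {
            SortedMap<String, String> stale = usersByState.get(previousState);
            if (stale != null) {
                stale.remove(username);
            }
        }
        // Denormalize: copy the email into the view row so reads need no second lookup.
        usersByState.computeIfAbsent(state, k -> new TreeMap<>()).put(username, email);
    }

    // "Design for read": a single row read answers "who is in <state>, and their emails?"
    SortedMap<String, String> usersIn(String state) {
        return usersByState.getOrDefault(state, new TreeMap<>());
    }
}
```

Copying the email into the view is the denormalization; the price is that every write path has to keep the view up to date.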
Regardless of the approach used, there are four overall goals:
1. Denormalize
2. Eliminate seeks
3. Design for reads
4. Optimize for blind writes
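Goal 4 works because Cassandra reconciles conflicting writes by timestamp: the cell with the newest timestamp wins, so a client can write without first reading the current value. A simplified Java sketch of that last-write-wins rule, assuming client-supplied microsecond timestamps; `BlindWriteSketch` is a hypothetical name:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of timestamp-based "last write wins" reconciliation for a single row.
// Simplification: ties keep the existing cell, which may differ from Cassandra's own
// tie-break rule; tombstones and TTLs are ignored.
class BlindWriteSketch {
    static final class Cell {
        final String value;
        final long timestampMicros;

        Cell(String value, long timestampMicros) {
            this.value = value;
            this.timestampMicros = timestampMicros;
        }
    }

    private final Map<String, Cell> cells = new HashMap<>();

    // No read-modify-write in the caller: a write is only "keep whichever cell is newer".
    void write(String column, String value, long timestampMicros) {
        cells.merge(column, new Cell(value, timestampMicros),
                (current, incoming) -> incoming.timestampMicros > current.timestampMicros ? incoming : current);
    }

    String read(String column) {
        Cell cell = cells.get(column);
        return cell == null ? null : cell.value;
    }
}
```

Because the outcome depends only on timestamps, writes can arrive in any order and still converge on the same value.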
Now... let’s talk about protocols
Thrift
- RPC-based
- Mature Apache project
- Supports lots of languages
- Extensible!
CQL
- Well-defined protocol
- Supports compression
- Netty/NIO-based
Storage Mechanics (but quickly)
get_slice
Workhorse of Cassandra selection methods
get_slice: key
The row key
get_slice: ColumnParent
The column family (a.k.a. table)
get_slice: SlicePredicate
Defines either a column range or a list of specific column names
get_slice: ConsistencyLevel
The level of consistency we want for this read
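To make the get_slice arguments concrete, here is an in-memory Java model of what the call selects. It only mirrors the concepts: the class and method names are hypothetical, not the generated Thrift API; the range parameters follow the Thrift `SliceRange` fields (start, finish, reversed, count); and ConsistencyLevel is left out because the model has a single "replica".

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model of get_slice: (key, column family, predicate) -> ordered columns.
class GetSliceModel {
    // column family -> row key -> columns sorted by name
    private final Map<String, Map<String, NavigableMap<String, String>>> data = new TreeMap<>();

    void insert(String columnFamily, String key, String column, String value) {
        data.computeIfAbsent(columnFamily, cf -> new TreeMap<>())
            .computeIfAbsent(key, k -> new TreeMap<>())
            .put(column, value);
    }

    private NavigableMap<String, String> row(String columnFamily, String key) {
        Map<String, NavigableMap<String, String>> rows = data.getOrDefault(columnFamily, Collections.emptyMap());
        return rows.getOrDefault(key, Collections.emptyNavigableMap());
    }

    // SlicePredicate with column names: returns the named columns that exist, in column order.
    List<Map.Entry<String, String>> sliceByNames(String columnFamily, String key, Collection<String> names) {
        NavigableMap<String, String> row = row(columnFamily, key);
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (String name : new TreeSet<>(names)) {
            String value = row.get(name);
            if (value != null) {
                out.add(new AbstractMap.SimpleImmutableEntry<>(name, value));
            }
        }
        return out;
    }

    // SlicePredicate with a slice range: an empty start/finish means unbounded, count caps
    // the result, and reversed walks the row backwards (start is then the high end).
    // An out-of-order start/finish pair throws IllegalArgumentException in this model.
    List<Map.Entry<String, String>> sliceByRange(String columnFamily, String key,
            String start, String finish, boolean reversed, int count) {
        NavigableMap<String, String> view = row(columnFamily, key);
        if (reversed) {
            view = view.descendingMap();
        }
        if (!start.isEmpty()) {
            view = view.tailMap(start, true);
        }
        if (!finish.isEmpty()) {
            view = view.headMap(finish, true);
        }
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (Map.Entry<String, String> column : view.entrySet()) {
            if (out.size() >= count) {
                break;
            }
            out.add(column);
        }
        return out;
    }
}
```

For example, after inserting columns `a` through `d` into one row, `sliceByRange(cf, key, "b", "", false, 2)` returns `b` and `c`, while a reversed slice with count 1 returns only `d`.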
Obtuse at first glance, but nothing is hidden
So...
But one person’s abstraction leakage is another’s preferred model
How closely do you want to interact with the underlying storage engine?
Client APIs
Benefits of Thrift
- Mature selection of clients
- Multiple languages
- Well documented
- Can be used in other places
Drawbacks of Thrift
- Several objects are required for any request
- Clients differ in implementation
- Upstream dependency issues
- Schema changes and cluster health must be checked proactively by the client
Benefits of the CQL API
- Prepared statements
- Common operations are straightforward
- Cluster health and schema changes are pushed by the server
- Awesome client available
Drawbacks of CQL APIs
- Still need idiomatic, per-language clients
- Still a binary protocol
- Default storage model imposes substantial restrictions*

* See the gotchas section later
Considerations for your app
Stick with Thrift if...
Heavy update workloads
Large, dynamic batch insertions
Hadoop integration (CASSANDRA-4421)
Commonly deal with very wide rows (CASSANDRA-4176)
CASSANDRA-4176: “Pick your shard keys carefully”
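Bucketing is one common way to keep very wide rows under control: the row key combines a source id with a time bucket, so no single row grows without bound. A minimal Java sketch, assuming UTC day buckets and a hypothetical `<sourceId>:<yyyyMMdd>` key format; the right bucket size depends on write volume:

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical key scheme "<sourceId>:<yyyyMMdd>" with UTC day buckets.
class ShardKeys {
    // Writers: every event goes to the row for its source and UTC day, so a row's
    // width is capped by one day's worth of events instead of growing forever.
    static String rowKey(String sourceId, Instant eventTime) {
        return bucketKey(sourceId, eventTime.atZone(ZoneOffset.UTC).toLocalDate());
    }

    // Readers: a time-range query must fan out over every bucket the range touches.
    static List<String> rowKeysBetween(String sourceId, Instant from, Instant to) {
        LocalDate day = from.atZone(ZoneOffset.UTC).toLocalDate();
        LocalDate last = to.atZone(ZoneOffset.UTC).toLocalDate();
        List<String> keys = new ArrayList<>();
        while (!day.isAfter(last)) {
            keys.add(bucketKey(sourceId, day));
            day = day.plusDays(1);
        }
        return keys;
    }

    private static String bucketKey(String sourceId, LocalDate day) {
        return sourceId + ":" + day.format(DateTimeFormatter.BASIC_ISO_DATE); // e.g. 20130228
    }
}
```

For instance, `rowKey("sensor1", Instant.parse("2013-02-28T23:59:59Z"))` yields `sensor1:20130228`. The trade-off is on the read side: a query spanning three days touches three row keys.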
Consider CQL if...
Static column family model: take advantage of prepared statements for common reads
Despite the shard key jab, CQL makes good use of the storage model
You can replace some custom serialization with collections
Integration with JDBC and/or BI tools
Wire efficient: does not return timestamp or TTL by default
Larger, potentially more transient environments
But CQL is:
- new
- an abstraction
In some cases, CQL might not do what you think
Most common CQL pitfalls
Collections can only be retrieved in their entirety
Can’t mix static and dynamic data in a column family
“Keys only” range slices don’t work (CASSANDRA-4536)
Range ghosts will not be returned
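With Thrift range slices (`get_range_slices`), a recently deleted row can still come back as a key with an empty column list, a so-called range ghost, until its tombstones are purged. Thrift clients usually skip such rows themselves, while CQL does not return them at all. A small Java sketch of that client-side filtering, assuming a hypothetical page of already-fetched results keyed by row key:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedMap;

// Hypothetical helper: drop "range ghosts" (row keys that came back with no live columns)
// from one page of range-slice results, keeping the original key order.
class RangeGhosts {
    static Map<String, SortedMap<String, String>> dropGhosts(Map<String, SortedMap<String, String>> page) {
        Map<String, SortedMap<String, String>> live = new LinkedHashMap<>();
        for (Map.Entry<String, SortedMap<String, String>> row : page.entrySet()) {
            // A ghost is a deleted row whose tombstones have not been purged yet:
            // the key is still present, but the column list is empty.
            if (!row.getValue().isEmpty()) {
                live.put(row.getKey(), row.getValue());
            }
        }
        return live;
    }
}
```

One consequence: after filtering, a page can hold fewer rows than were requested, so paging code should not treat a short page as the end of the range.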
Batch inserts are clunky (CASSANDRA-4693)
With non-compact storage, the whole row must be read every time.
The takeaway is that you have options, particularly good ones for Java.
BUT
there is a larger, more fundamental problem to discuss
“If [they] think that CQL is the answer to usability then I just won. We at least know where our problems are.” - 10gen exec.
The market has spoken and we missed the boat.
POST /endpoint {json}
A Cassandra-MVP actually maintains a REST front-end
So we’ve taken this and gone further
What if...
Coming soon...
- Intravert: Vert.x + Cassandra
- ASF-licensed
- Driven by real-world requirements