Posted: 11-Apr-2017
Uploaded by: datastax-academy
CASSANDRA DAY DALLAS 2015
SOFTWARE DEVELOPMENT WITH CASSANDRA: A WALKTHROUGH
Nate McCall
@zznate
Co-Founder & Sr. Technical Consultant
http://www.slideshare.net/zznate/soft-dev-withcassandraawalkthrough
Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
About The Last Pickle.
Work with clients to deliver and improve Apache Cassandra based solutions.
Based in New Zealand, Australia & USA.
OVERVIEW
DATA MODELING
WRITING CODE
TESTING
REVIEWING
MANAGING ENVIRONMENTS
Overview:
What makes a software development
project successful?
Overview: Successful Software Development
- it ships
- maintainable
- good test coverage
- check out and build
Overview:
Impedance mismatch: distributed systems development on a laptop.
DATA MODELING
Data Modeling:
… a topic unto itself. But quickly:
Data Modeling - Quickly
• It’s hard
• Do research
• #1 performance problem
• Don’t “port” your schema!
Data Modeling - Using CQL:
• tools support
• easy tracing (and trace discovery)
• documentation*
*Maintained in-tree:
https://github.com/apache/cassandra/blob/cassandra-1.2/doc/cql3/CQL.textile
https://github.com/apache/cassandra/blob/cassandra-2.0/doc/cql3/CQL.textile
https://github.com/apache/cassandra/blob/cassandra-2.1/doc/cql3/CQL.textile
Data Modeling - DevCenter:
Tools: DataStax DevCenter
http://www.datastax.com/what-we-offer/products-services/devcenter
WRITING CODE
Writing Code:
use CQL
Writing Code - Java Driver :
Use the Java Driver
• Reference implementation
• Well written, extensive coverage
• Open source
• Dedicated development resources
https://github.com/datastax/java-driver/
Writing Code - Java Driver :
Existing Spring Users: Spring Data Integration
http://projects.spring.io/spring-data-cassandra/
Writing Code - Java Driver :
Four rules for Writing Code
• one Cluster per physical cluster
• one Session per app per keyspace
• use PreparedStatements
• use Batches to reduce network IO
Writing Code - Java Driver :
Configuration is Similar to Other DB Drivers (with caveats**)
http://www.datastax.com/documentation/developer/java-driver/2.1/common/drivers/reference/clusterConfiguration_c.html
Writing Code - Java Driver - Configuration:
Major Difference: it’s a Cluster!
Writing Code - Java Driver - Configuration:
Two groups of configurations
• policies• connections
Writing Code - Java Driver - Configuration:
Three Policy Types:
• load balancing
• reconnection
• retry
Writing Code - Java Driver - Configuration:
Connection Options:
• protocol*
• pooling**
• socket
*https://github.com/apache/cassandra/blob/cassandra-2.1/doc/native_protocol_v3.spec
**https://github.com/datastax/java-driver/tree/2.1/features/pooling
Writing Code - Java Driver - Configuration:
Code sample for building a Cluster
https://github.com/datastax/java-driver/tree/2.1/features/compression
https://github.com/datastax/java-driver/tree/2.1/features/logging
Writing Code - Java Driver - Pagination:
Simple result iteration
CREATE TABLE IF NOT EXISTS transit.vehicle_data (
  vehicle_id text,
  speed double,
  time timeuuid,
  PRIMARY KEY ((vehicle_id), time)
);
Writing Code - Java Driver - Pagination:
Simple result iteration: Java 8 style
Writing Code - Java Driver - Async
Async! (not so) Simple result iteration
Writing Code - Java Driver - Pagination:
Not much to it:
PreparedStatement prepStmt = session.prepare(CQL_STRING);
BoundStatement boundStmt = new BoundStatement(prepStmt);
// Ask the driver to fetch results in pages of 100 rows
boundStmt.setFetchSize(100);
https://github.com/datastax/java-driver/tree/2.1/features/paging
Writing Code - Java Driver - Inserts and Updates:
About Inserts (and updates)
Writing Code - Java Driver - Inserts and Updates:
Batches: three types
- logged
- unlogged
- counter
Writing Code - Java Driver - Inserts and Updates:
unlogged batch
Writing Code - Java Driver - Inserts and Updates:
LWT:
INSERT INTO vehicle (vehicle_id, make, model, vin)
VALUES ('VHE-101', 'Toyota', 'Tercel', '1234f')
IF NOT EXISTS;
Writing Code - Java Driver - Inserts and Updates:
LWT:
UPDATE vehicle
SET vin = '123fa'
WHERE vehicle_id = 'VHE-101'
IF vin = '1234f';
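Both LWT statements above are compare-and-set operations: the write is applied only if the condition holds at the time of the write. The sketch below is an in-memory analogy using a plain ConcurrentHashMap, not driver code and not how Cassandra implements LWT. The class and method names are hypothetical, and the map only stands in for the vehicle table.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** In-memory analogy for LWT conditions (illustration only, no Cassandra involved). */
class LwtAnalogy {
    // vehicle_id -> vin; a stand-in for the vehicle table
    static final ConcurrentMap<String, String> vinById = new ConcurrentHashMap<>();

    /** Analogous to INSERT ... IF NOT EXISTS: applied only when the key is absent. */
    static boolean insertIfNotExists(String vehicleId, String vin) {
        return vinById.putIfAbsent(vehicleId, vin) == null;
    }

    /** Analogous to UPDATE ... IF vin = expected: applied only when the current value matches. */
    static boolean updateIfVinEquals(String vehicleId, String expectedVin, String newVin) {
        return vinById.replace(vehicleId, expectedVin, newVin);
    }

    public static void main(String[] args) {
        System.out.println(insertIfNotExists("VHE-101", "1234f"));          // first insert is applied
        System.out.println(insertIfNotExists("VHE-101", "9999z"));          // row exists, not applied
        System.out.println(updateIfVinEquals("VHE-101", "1234f", "123fa")); // condition holds, applied
    }
}
```

As with the boolean returned here, the result of a real LWT statement should be checked to see whether the write was applied.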
Writing Code:
ORM? Great for basic CRUD operations
http://www.datastax.com/documentation/developer/java-driver/2.1/java-driver/reference/crudOperations.html
https://github.com/datastax/java-driver/blob/2.1/driver-mapping/src/test/java/com/datastax/driver/mapping/MapperTest.java
Writing Code - Java Driver :
A note about User Defined Types (UDTs)
Writing Code - Java Driver - Using UDTs:
Wait.
- serialized as blobs !!?!
- new version already being discussed*
- will be a painful migration path
* https://issues.apache.org/jira/browse/CASSANDRA-7423
TESTING
Testing:
Use a Naming Scheme
• *UnitTest.java: no external resources
• *ITest.java: uses external resources
• *PITest.java: safely parallel “ITest”
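A minimal sketch of how a build script or test runner might sort test files by the suffixes above. The class name, enum, and sample file names are hypothetical. One detail worth noting: every name ending in PITest.java also ends in ITest.java, so the more specific suffix has to be checked first.

```java
import java.util.Arrays;

/** Hypothetical helper: maps a test file name to the category implied by its suffix. */
class TestNaming {
    enum Category { UNIT, INTEGRATION, PARALLEL_INTEGRATION, UNKNOWN }

    static Category classify(String fileName) {
        if (fileName.endsWith("UnitTest.java")) return Category.UNIT;
        // Check PITest before ITest: "FooPITest.java" also ends with "ITest.java".
        if (fileName.endsWith("PITest.java")) return Category.PARALLEL_INTEGRATION;
        if (fileName.endsWith("ITest.java")) return Category.INTEGRATION;
        return Category.UNKNOWN;
    }

    public static void main(String[] args) {
        for (String name : Arrays.asList("VehicleDaoUnitTest.java", "VehicleDaoITest.java",
                                         "VehicleDaoPITest.java", "Helper.java")) {
            System.out.println(name + " -> " + classify(name));
        }
    }
}
```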
Testing:
Tip: wildcards on the CLI are not a naming scheme.
Testing:
Group tests into
logical units (“suites”)
Testing - Suites:
Benefits of Suites:
• share test data
• share Cassandra instance(s)
• build profiles
<profile>
  <id>short</id>
  <properties>
    <env>default</env>
  </properties>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>2.16</version>
        <configuration>
          <groups>unit,short</groups>
          <useFile>false</useFile>
          <systemPropertyVariables>
            <cassandra.version>${cassandra.version}</cassandra.version>
            <ipprefix>${ipprefix}</ipprefix>
          </systemPropertyVariables>
        </configuration>
      </plugin>
    </plugins>
  </build>
</profile>
Testing - Suites:
Using annotations for suites in code
Testing - Suites:
Interesting test plumbing
• [Before|After]Suite
• [Before|After]Group
• Listeners
Testing:
Use Mocks where possible
Testing:
scassandra: not quite integration
http://www.scassandra.org/
Testing:
Unit Integration Testing
Testing:
Verify Assumptions: test failure scenarios explicitly
Testing - Integration:
Runtime Integrations:
• local
• in-process
• forked-process
Testing - Integration - Runtime:
EmbeddedCassandra
https://github.com/jsevellec/cassandra-unit/
Testing - Integration - Runtime:
ProcessBuilder to fork Cassandra(s)
Testing - Integration - Runtime:
CCMBridge: delegate to CCM
https://github.com/datastax/java-driver/blob/2.1/driver-core/src/test/java/com/datastax/driver/core/CCMBridge.java
Testing - Integration:
Best Practice: Jenkins should be able to manage your cluster
Testing:
Load Testing Goals
• reproducible metrics
• catch regressions
• test to breakage point
Testing - Load Testing:
Stress.java (lots of changes recently)
https://www.datastax.com/documentation/cassandra/2.1/cassandra/tools/toolsCStress_t.html
http://www.datastax.com/dev/blog/improved-cassandra-2-1-stress-tool-benchmark-any-schema
Testing - Load Testing:
Workload recording and playback coming soon
one day
https://issues.apache.org/jira/browse/CASSANDRA-8929
Testing:
Primary testing goal: Don’t let cluster behavior surprise you.
REVIEWING
Writing Code:
Metrics API for your own code
https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/metrics/ColumnFamilyMetrics.java
https://dropwizard.github.io/metrics/3.1.0/
Writing Code - Instrumentation via Metrics API:
Run Riemann locally
http://riemann.io/
Reviewing Said Code:
Using Trace (and doing so frequently)
Writing Code - Tracing:
Trace per query via DevCenter
http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/tracing_r.html
Writing Code - Tracing:
Trace per query via cqlsh
http://www.datastax.com/documentation/cql/3.0/cql/cql_reference/tracing_r.html
Writing Code - Tracing:
Trace per query via Java Driver
http://docs.datastax.com/en/drivers/java/2.0/com/datastax/driver/core/Statement.html#enableTracing()
cqlsh> tracing on;
Now tracing requests.
cqlsh> SELECT doc_version FROM data.documents_by_version
   ... WHERE application_id = myapp
   ... AND document_id = foo
   ... AND chunk_index = 0
   ... ORDER BY doc_version ASC
   ... LIMIT 1;

 doc_version
-------------
       65856
Tracing session: 46211ab0-2702-11e4-9bcf-8d157d448e6b
Preparing statement | 18:05:44,845 | 192.168.1.197 | 22337
Enqueuing data request to /192.168.1.204 | 18:05:44,845 | 192.168.1.197 | 22504
Sending message to /192.168.1.204 | 18:05:44,847 | 192.168.1.197 | 24498
Message received from /192.168.1.197 | 18:05:44,854 | 192.168.1.204 | 872
Executing single-partition query on documents_by_version | 18:05:44,888 | 192.168.1.204 | 35183
Acquiring sstable references | 18:05:44,888 | 192.168.1.204 | 35459
Merging memtable tombstones | 18:05:44,889 | 192.168.1.204 | 35675
Key cache hit for sstable 2867 | 18:05:44,889 | 192.168.1.204 | 35792
Seeking to partition beginning in data file | 18:05:44,889 | 192.168.1.204 | 35817
…
…
Merging data from memtables and 8 sstables | 18:05:44,892 | 192.168.1.204 | 38605
Read 1 live and 2667 tombstoned cells | 18:05:54,135 | 192.168.1.204 | 9282428
Enqueuing response to /192.168.1.197 | 18:05:54,136 | 192.168.1.204 | 9283423
Sending message to /192.168.1.197 | 18:05:54,138 | 192.168.1.204 | 9284753
Message received from /192.168.1.204 | 18:05:54,155 | 192.168.1.197 | 9332505
Processing response from /192.168.1.204 | 18:05:54,158 | 192.168.1.197 | 9335372
Request complete | 18:05:54,158 | 192.168.1.197 | 9335592
!!?! Reading 1 live and 2667 tombstoned cells took roughly 9.2 seconds.
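Spotting the slow step in a trace by eye gets tedious. As a rough sketch, the helper below takes rows in the "activity | timestamp | source | elapsed" layout shown above and reports the step that followed the largest jump in elapsed microseconds. The class and method names are hypothetical. Elapsed values are compared only between rows from the same source node, because each node reports its own elapsed time.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for finding the slowest step in pasted cqlsh trace rows. */
class TraceGaps {
    static final class Gap {
        final String activity;
        final String source;
        final long micros;
        Gap(String activity, String source, long micros) {
            this.activity = activity;
            this.source = source;
            this.micros = micros;
        }
    }

    /** Returns the activity preceded by the longest wait on its own node, or null if none. */
    static Gap largestGap(List<String> rows) {
        Map<String, Long> lastElapsedBySource = new HashMap<>();
        Gap worst = null;
        for (String row : rows) {
            String[] cols = row.split("\\|");
            if (cols.length != 4) continue; // skip lines that are not trace rows, such as "…"
            String activity = cols[0].trim();
            String source = cols[2].trim();
            long elapsed = Long.parseLong(cols[3].trim());
            Long previous = lastElapsedBySource.put(source, elapsed);
            if (previous != null) {
                long delta = elapsed - previous;
                if (worst == null || delta > worst.micros) {
                    worst = new Gap(activity, source, delta);
                }
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        Gap gap = largestGap(Arrays.asList(
            "Seeking to partition beginning in data file | 18:05:44,889 | 192.168.1.204 | 35817",
            "Merging data from memtables and 8 sstables | 18:05:44,892 | 192.168.1.204 | 38605",
            "Read 1 live and 2667 tombstoned cells | 18:05:54,135 | 192.168.1.204 | 9282428",
            "Enqueuing response to /192.168.1.197 | 18:05:54,136 | 192.168.1.204 | 9283423"));
        System.out.println(gap.activity + " (" + gap.micros + " us on " + gap.source + ")");
    }
}
```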
Writing Code - Tracing:
Enable traces in the driver
http://www.datastax.com/documentation/developer/java-driver/2.0/java-driver/tracing_t.html
Writing Code - Tracing:
`nodetool settraceprobability`
Writing Code - Tracing:
…then make sure you try it again with a node down!
Writing Code - Tracing:
Final note on tracing: do it sparingly
Writing Code - Tracing:
Enable query latency logging
https://github.com/datastax/java-driver/tree/2.1/features/logging
Writing Code:
Logging Verbosity can be changed dynamically**
** since 0.4rc1
http://www.datastax.com/documentation/cassandra/2.0/cassandra/configuration/configLoggingLevels_r.html
Writing Code:
nodetool for developers
• cfstats
• cfhistograms
• proxyhistograms
Writing Code - nodetool - cfstats:
cfstats: per-table statistics about size and performance (single most useful command)
Writing Code - nodetool - cfhistograms:
cfhistograms: column count and partition size vs. latency distribution
Writing Code - nodetool - proxyhistograms:
proxyhistograms: performance of inter-cluster requests
MANAGING ENVIRONMENTS
Managing Environments:
Configuration Management is Essential
Managing Environments:
Laptop to Production with NO Manual Modifications!
Managing Environments:
Running Cassandra during development
Managing Environments - Running Cassandra:
Local Cassandra
• easy to set up
• you control it
• but then you control it!
Managing Environments - Running Cassandra:
CCM
• supports multiple versions
• clusters and datacenters
• up/down individual nodes
https://github.com/pcmanus/ccm
Managing Environments - Running Cassandra:
Docker:
• Official image available with excellent docs*
• Docker Compose for more granular control**
*https://hub.docker.com/_/cassandra/
**https://docs.docker.com/compose/
Managing Environments - Running Cassandra:
Vagrant
• isolated, controlled environment
• configuration mgmt integration
• same CM for production!
http://www.vagrantup.com/
server_count = 3
network = '192.168.2.'
first_ip = 10

servers = []
seeds = []
cassandra_tokens = []
(0..server_count-1).each do |i|
  name = 'node' + (i + 1).to_s
  ip = network + (first_ip + i).to_s
  seeds << ip
  servers << {'name' => name,
              'ip' => ip,
              'initial_token' => (2**64 / server_count * i) - 2**63}
end
chef.json = {
  :cassandra => {'cluster_name' => 'VerifyCluster',
                 'version' => '2.0.8',
                 'setup_jna' => false,
                 'max_heap_size' => '512M',
                 'heap_new_size' => '100M',
                 'initial_token' => server['initial_token'],
                 'seeds' => "192.168.2.10",
                 'listen_address' => server['ip'],
                 'broadcast_address' => server['ip'],
                 'rpc_address' => server['ip'],
                 'concurrent_reads' => "2",
                 'concurrent_writes' => "2",
                 'memtable_flush_queue_size' => "2",
                 'compaction_throughput_mb_per_sec' => "8",
                 'key_cache_size_in_mb' => "4",
                 'key_cache_save_period' => "0",
                 'native_transport_min_threads' => "2",
                 'native_transport_max_threads' => "4"
  },
}
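The initial_token expression in the Vagrantfile spaces the nodes evenly around the token ring, which for the Murmur3Partitioner runs from -2^63 to 2^63 - 1. Below is a small sketch of the same arithmetic in Java, mirroring the Ruby expression term by term. The class and method names are hypothetical, and BigInteger is used because 2^64 does not fit in a long.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper mirroring (2**64 / server_count * i) - 2**63 from the Vagrantfile. */
class InitialTokens {
    static List<BigInteger> tokensFor(int serverCount) {
        BigInteger ringSize = BigInteger.ONE.shiftLeft(64);          // 2^64 tokens in the ring
        BigInteger minToken = BigInteger.ONE.shiftLeft(63).negate(); // -2^63, the lowest token
        BigInteger step = ringSize.divide(BigInteger.valueOf(serverCount)); // integer division, as in Ruby
        List<BigInteger> tokens = new ArrayList<>();
        for (int i = 0; i < serverCount; i++) {
            tokens.add(step.multiply(BigInteger.valueOf(i)).add(minToken));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // For three nodes this prints:
        // -9223372036854775808, -3074457345618258603, 3074457345618258602
        for (BigInteger token : tokensFor(3)) {
            System.out.println(token);
        }
    }
}
```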
Managing Environments - Running Cassandra:
Mesos? Compelling features, but not quite there (though it won't be long)
http://mesosphere.github.io/cassandra-mesos/docs/
http://www.datastax.com/2015/08/a-match-made-in-heaven-cassandra-and-mesos
Summary:
• Cluster-level defaults, override in queries
• Follow existing patterns (it's not that different)
• Segment your tests and use build profiles
• Monitor and Instrument
• Use reference implementation drivers
• Control your environments
• Verify any assumptions about failures
Thanks.
Nate McCall
@zznate
Co-Founder & Sr. Technical Consultant
www.thelastpickle.com
#CassandraDays