Reporting from the Trenches – How Intuit Uses Cassandra Effectively to Improve Customer Experiences
Rekha Joshi, Staff Engineer, Intuit, Inc.
Thank you for joining. We will begin shortly.
Webinar Housekeeping
© 2015 DataStax, All Rights Reserved. 2
All attendees placed on mute
Input questions at any time using the online interface
Speaker Bio
Rekha Joshi
Staff Engineer at Intuit, Inc.
O’Reilly Certified Apache Cassandra Architect
1 About Intuit
2 Use Case: Personalized A/B Testing
3 Database Requirements
4 Cassandra: Intuit NoSQL Standard
5 Using Cassandra Effectively
Intuit On Mission
Intuit Data Platforms
50M+ customers to handle
Manage all of the data
6+ petabytes of data
Complex compliance
Public and private cloud
45M+ Customers
Manage all of the data
6+ Petabytes of data
Complex compliance
Use Case: Personalized A/B Testing
No Personalized A/B Testing?
Opinion-vs-Opinion Wars
Huge Investment
Angry Customer
With Personalized A/B Testing!!
Experiment, experiment, experiment!
Let Data Be The Decision Maker!
Use Case: Personalized A/B Testing
To Continuously Improve User Experience, Data Is Better Than Guesswork!
Personalized A/B Testing Platform
User Assignment
Personalization Service
Segmentation Filters and Sampling
Personalization Engine
Analytics
Set up and administration
Profile Store
User Actions
A/B Testing Service
Deployment
Amazon Cloud, Jenkins, Coopr, Chef, CloudFormation, ECS/Docker
Monitoring: CloudWatch, Splunk, Graphite, Grafana, Logstash, Prometheus, New Relic
Alerting: Sensu, New Relic Alerts, HipChat, PagerDuty
Database Requirements
• High Data Security
• No Data Loss
• No Downtime
• Linear Scalability
• Tunable Consistency
• Performance Under Workloads
All This Data!!!!!
Can I Lift This Alone?
Need for Speed
Cassandra, Who?
Cassandra is a Java-based, linearly scalable, fault-tolerant, distributed, masterless NoSQL time series database with best-in-class tunable performance.
Cassandra: The Hybrid Kid has the Edge!
Dynamo (Amazon): Cassandra inherits data distribution
Bigtable (Google): Cassandra inherits data model
Cassandra: Masterless Architecture, Linear Scalability, Tunable Consistency/Performance
Influencing: Application, Query Access Patterns
Cassandra and DataStax Enterprise
Advanced Security
Integrated Analytics (Spark)
Advanced Tools
24/7 Support
A Truly Successful Software
• Solves A Real Need
• Is A Building Block for Platforms
• Becomes Open Source
• Gets Commercial Backing
• Tools Ecosystem Builds Around It
• Establishes Strong User Base
• Companies in Critical Domains Use It!!
Database Options
Intuit and Cassandra
Cassandra = Intuit Technology Standard of Choice for NoSQL Distributed Database
High Data Security
No Data Loss
No Downtime
Linear Scalability
Tunable Consistency
Performance Under Workloads
Other NoSQL variants
Did You Use Cassandra Effectively?
Garbage Collection Issue
New objects are created faster than they are garbage collected, which can cause STOP-THE-WORLD GC pauses!
• Configure Heap size, MAX_HEAP_SIZE
• Set up GC logging CASSANDRA_HEAP_DIR
• Configure CMS GC/G1GC
• Automated Heap Dump
• Upgrade System
Cassandra is a Java-based, linearly scalable, fault-tolerant, distributed NoSQL time series database.
Clock Issue
When you move setups or do upgrades, ensure the NTP server is set correctly.
Cassandra is a linearly scalable, fault-tolerant, distributed NoSQL time series database.
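Why the NTP setting matters: Cassandra resolves conflicting writes to the same cell by timestamp (last write wins), so a machine with a skewed clock can make an older value win. The following is a minimal Python sketch of that rule, not Cassandra code; the `Write` type, the `resolve` function, and all timestamps are hypothetical illustrations, and tie handling is simplified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    value: str
    timestamp_us: int  # write timestamp in microseconds, taken from a wall clock


def resolve(a: Write, b: Write) -> Write:
    """Last write wins: keep the write with the higher timestamp.

    On a tie this sketch simply returns b.
    """
    return a if a.timestamp_us > b.timestamp_us else b


# Real order of events: "old" is written first, "new" one second later.
true_time_old = 1_000_000
true_time_new = 2_000_000

# Healthy clocks: the later write wins.
in_sync = resolve(Write("old", true_time_old), Write("new", true_time_new))

# The writer of "old" has a clock running 5 seconds fast.
skew_us = 5_000_000
skewed = resolve(Write("old", true_time_old + skew_us), Write("new", true_time_new))

print(in_sync.value)  # new
print(skewed.value)   # old
```

With the skewed clock, the stale value silently overwrites the fresh one, which is why clock settings deserve a check after every move or upgrade.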
Understand the Node Ring
Repeat after me: Cassandra is a Java-based, linearly scalable, fault-tolerant, distributed, masterless NoSQL time series database with best-in-class tunable performance.
nodetool status
nodetool ring
nodetool info
nodetool cfstats
nodetool tpstats
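To make the ring idea concrete, here is a simplified Python sketch of how a partition key could map to nodes on a token ring. It rests on several assumptions that differ from a real cluster: MD5 stands in for Cassandra's partitioner, each node gets exactly one evenly spaced token, the token space is shrunk to 32 bits, and the node names and the helper functions `token_for`, `build_ring`, and `replicas_for` are hypothetical.

```python
import bisect
import hashlib

RING_SIZE = 2**32  # hypothetical small token space, chosen for readability


def token_for(key: str) -> int:
    """Hash a partition key to a token (MD5 here is an assumption)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def build_ring(nodes):
    """Give each node one evenly spaced token; return sorted (token, node) pairs."""
    step = RING_SIZE // len(nodes)
    return [(i * step, node) for i, node in enumerate(nodes)]


def replicas_for(key, ring, replication_factor):
    """Walk clockwise from the key's token and take the next RF nodes.

    Assumes replication_factor <= number of nodes, so the nodes are distinct.
    """
    tokens = [t for t, _ in ring]
    # First node whose token is >= the key's token; wrap around past the end.
    start = bisect.bisect_left(tokens, token_for(key)) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(replication_factor)]


names = ["node-a", "node-b", "node-c", "node-d"]
ring = build_ring(names)
owners = replicas_for("user:42", ring, replication_factor=3)
print(owners)
```

Because placement depends only on the key's hash and the token list, any node can compute the owners of any key, which is what lets the cluster work without a master.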
What If A Node Goes Down?
Replication
Consistency
nodetool repair
nodetool decommission
nodetool snapshot
Cassandra is a linearly scalable, fault-tolerant, distributed, masterless NoSQL time series database.
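How replication and consistency interact when a node goes down can be worked out with simple arithmetic: a quorum is a majority of the replicas, and a read is guaranteed to overlap the latest write when read acknowledgements plus write acknowledgements exceed the replication factor. The Python sketch below illustrates that arithmetic only; the function names are hypothetical and it does not talk to a cluster.

```python
def quorum(replication_factor: int) -> int:
    """Majority of the replicas."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int, required_acks: int) -> int:
    """Replicas that can be down while a request still collects required_acks."""
    return replication_factor - required_acks


def overlaps(replication_factor: int, write_acks: int, read_acks: int) -> bool:
    """True when every read set must intersect every write set."""
    return write_acks + read_acks > replication_factor


rf = 3
q = quorum(rf)
print(q)                          # 2
print(tolerated_failures(rf, q))  # 1
print(overlaps(rf, q, q))         # True
print(overlaps(rf, 1, 1))         # False
```

With a replication factor of 3 and quorum reads and writes, one replica can be down without losing availability or read-your-writes behavior; acknowledging on a single replica for both reads and writes trades that guarantee for latency.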
Tuning The Application
Cassandra is a Java-based, linearly scalable, fault-tolerant, distributed, masterless NoSQL time series database with best-in-class tunable performance.
Refactor data model
Revisit the usage access patterns
Paranoid Monitoring
Tuning For Reads
• Caching Layer – Key Cache/Row Cache
• SSTable Compaction Frequency
• Reading From Multiple SSTables Is Inefficient
Cassandra is a Java-based, linearly scalable, fault-tolerant, distributed NoSQL time series database with best-in-class tunable performance.
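Why multiple SSTables slow reads, and why compaction helps, can be sketched in a few lines of Python. This is a toy model under stated assumptions: each SSTable is a plain dict mapping a key to a `(timestamp, value)` pair, the `read` and `compact` helpers are hypothetical, and real optimizations such as bloom filters and the caches listed above are left out.

```python
def read(key, sstables):
    """Check every SSTable for the key; return (newest value, tables touched)."""
    newest = None
    touched = 0
    for table in sstables:
        touched += 1
        cell = table.get(key)
        if cell is not None and (newest is None or cell[0] > newest[0]):
            newest = cell
    value = newest[1] if newest else None
    return value, touched


def compact(sstables):
    """Merge SSTables into one, keeping the newest cell per key."""
    merged = {}
    for table in sstables:
        for key, cell in table.items():
            if key not in merged or cell[0] > merged[key][0]:
                merged[key] = cell
    return [merged]


sstables = [
    {"user:1": (10, "variant-A")},
    {"user:1": (20, "variant-B"), "user:2": (15, "variant-A")},
    {"user:2": (30, "variant-C")},
]
print(read("user:1", sstables))           # ('variant-B', 3)
print(read("user:1", compact(sstables)))  # ('variant-B', 1)
```

The answer is the same before and after compaction; only the number of tables consulted changes, which is the cost that compaction frequency controls.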
Tuning For Writes
Cassandra is a Java-based, linearly scalable, fault-tolerant, distributed NoSQL time series database with best-in-class tunable performance.
• Memtable – Fast Writes
• CommitLog – Separate Dedicated Disk
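The two bullets above describe the write path: each write is appended to the commit log for durability and applied to an in-memory memtable, which is later flushed to an immutable SSTable. A rough Python sketch follows, assuming a hypothetical `WritePathSketch` class in which a list stands in for the commit log file and a key-count threshold stands in for the real flush triggers.

```python
class WritePathSketch:
    def __init__(self, flush_threshold: int):
        self.commit_log = []  # stands in for the append-only file on its own disk
        self.memtable = {}    # in-memory, holds the latest value per key
        self.sstables = []    # flushed tables, never modified after the flush
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. sequential append for durability
        self.memtable[key] = value            # 2. in-memory update, no disk seek
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the memtable out sorted by key, then start fresh.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log = []  # flushed data no longer needs replay


store = WritePathSketch(flush_threshold=2)
store.write("user:1", "variant-A")
store.write("user:1", "variant-B")  # overwrite: memtable still holds one key
store.write("user:2", "variant-A")  # second key triggers a flush
print(len(store.sstables), store.memtable)  # 1 {}
```

Since the commit log is written strictly sequentially, giving it a dedicated disk keeps those appends from competing with SSTable reads and compaction I/O.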
Tuning the System
EXT4 Filesystem
System Memory, CPU, Disk
Paranoid Monitoring
Cassandra is a linearly scalable, fault-tolerant, distributed, masterless NoSQL time series database.
A Little-Discussed Aspect Of The Pareto Principle!
Heavy Lifting? Easy!
Thank you!
Input questions at any time using the online interface
Q & A
https://www.linkedin.com/in/rekhajoshm
https://twitter.com/rekhajoshm