Date post: | 22-Aug-2015 |
Category: |
Technology |
Upload: | planet-cassandra |
View: | 246 times |
Download: | 4 times |
About Me Joseph Streeky Manager, Search Framework Development ● Joined Careerbuilder in 2005 ● BS Computer Science - Georgia Tech ● Natural Language Processing - Columbia University ● Software Engineering for SaaS - University of California, Berkeley
About Me Joshua Smith
Database Administrator III [email protected]
● Joined Careerbuilder in 2011 ● Took over management of Cassandra in 2013 ● BS Computer Science - Georgia Tech
About Careerbuilder is the global leader in human capital solutions, helping companies target and attract their most important assets - their people. ● More than 22 Million unique visitors a month ● More than 300,000 employers post more than 1 million
jobs on Careerbuilder ● Careerbuilder operates in the United States, Europe,
Canada and Asia. Its sites, combined with partnership and acquisitions, give Careerbuilder a presence in more than 55 countries worldwide.
About Search @ • 1 million ac*ve jobs each month • 60 million ac*vely searchable resumes • 500 globally distributed search servers (in the U.S., Europe, & the cloud)
• Thousands of unique, dynamically generated search indexes • 1.5 billion search documents • 2-‐3 million searches an hour
Feeding Platform Requirements ● Volume Requirements
o 1000+ documents / second ● Able to scale linearly ● Highly available ● Easily able to deploy to multiple location
(Private datacenter vs AWS)
Feeding Steps ● Content Creation - Translate to Solr Indexing Format (we use XML) ● Shard - Determine Routing Rules related to this document ● Batch - Group together documents that have the same routing rules for
batch feeding ● Send - Send the batch to Solr ● Verify - Verify that Solr received the batch ● Reprocess - For any set of documents that failed during any step of
the process we place the document(s) here for reprocessing
Storage in Cassandra
● Two column families per pool o Initial data o Translated for Solr
● Both have a DocumentID as the key ● The initial data column family is the key and some
number of columns based on user data ● The Translated column family is just a DocumentID and
a single content field
Read and Wri6ng ● Quorum Read ● Quorum Write ● Needed for our specific use case, if we fail to read the
newest data we have to automated way to recover
Cassandra Ring Specs ● 3 Node ring test ● 21 Nodes Production
o 56 TB o Datastax Cassandra o Version 2.0.5 o Vnodes o RF = 3 o 4K write/s - 3K read/s
Cassandra Node Specs ● Dell R620 ● 2 x E5-2630 V2 ● 2.60 GHZ CPU ● 128 GB RAM ● 3 x 1.6TB SAS SSD in RAID 5
Compaction Fun ● Cassandra version 1.0.8 ● Single threaded compaction ● Eating up heap space until OOM
error ● JNA not installed ● Reduced memtable and cache size ● increased the heap size to 12 GB
Cold Read Problem ● Refeeding involves all documents ● Each row will be read multiple times ● Cold reads means lots of seeks ● Spinning disks HATE seeks
SSD ● Nightly repair times decreased from 23
hours to less than 3 hours ● Write latency decreased from 15 ms to 2.4
ms ● cassandra.yaml
o concurrent_reads: 96 o concurrent_writes: 192
AWS vs Private ● Combination of AMI and chef to configure ● R3.XLarge with EBS optimized
o 4 vCPU, 30 GB RAM, Provisioned IOPS ● RF = 3 ● 2 Availability zones for high availability and
local quorum ● Comparable performance to local datacenter ● Currently deploying version 2.0.12