Big data at CallFire

Date post: 05-Dec-2014
Upload: vijesh-mehta
Transcript
Page 1: Big data at CallFire

Big Data at CallFire

Vijesh Mehta (Co-Founder and CTO)

Page 2: Big data at CallFire

Agenda

•  A little about CallFire
•  CallFire’s technical challenges
•  How CallFire deals with data
•  Summary

Page 3: Big data at CallFire

Some background about myself

•  I am one of the founders of CallFire.
   –  Started in 2005 in a small apartment
   –  Now 28 people
   –  Bootstrapped and profitable

•  I’ve been writing software, primarily in the Java space, for 12 years. CallFire is all Java.
   –  We use: Wicket, Guice, Hibernate, MySQL, Cassandra, ActiveMQ, Xen, Puppet

Page 4: Big data at CallFire

About CallFire

•  We are a cloud telephony provider.
   –  Outbound phone calls
   –  Phone numbers
   –  SMS through long and short codes
   –  IVR (Interactive Voice Response)
   –  Power dialing

•  CallFire’s call volume can get large very quickly.
   –  Hurricane Sandy: 1.9 million emergency calls

•  4 engineers and 1 system admin manage operations and new features.

•  We hired 7 more engineers this year, and we are still hiring!

Page 5: Big data at CallFire

Technical Challenges by Numbers

•  1.4 billion calls and texts
   –  Growing exponentially
•  Over 50,000 accounts
•  Over 6 million campaigns
•  80 million sound files
•  14 TB in storage (NFS)
•  MySQL: over 10,000 qps at peak

Big data isn’t always a big-company problem!

Page 6: Big data at CallFire

[Chart: Campaigns over Time; vertical axis from 0 to 7,000,000 campaigns]

Growing faster each day

Page 7: Big data at CallFire

The first challenge

•  Problem: We outgrew our datacenter. New systems need access to central storage, and replication has to run across a 1 Gb/s interconnect.

•  Needed solution:
   –  Must work across datacenters
   –  Must scale as demand increases
   –  Must be fault tolerant
   –  Must deal with over 80 million sound files
   –  The cheaper the better

Page 8: Big data at CallFire

Solutions Considered (2010)

Criterion                      | NFS                                        | Gluster                   | HDFS                   | Cassandra
-------------------------------+--------------------------------------------+---------------------------+------------------------+----------
Fault tolerant                 | Yes, if configured                         | Yes                       | Yes                    | Yes
Datacenter replication         | Maybe. Rsync isn’t fun with lots of files. | Not at the time           | Yes                    | Yes
Easy to add storage            | No                                         | Not at the time           | Yes                    | Yes
No single point of failure     | No                                         | Yes                       | Not exactly, NameNode. | Yes
Data always accessible easily  | No, hard to sort through file systems.     | No, same as a file system | Yes                    | Yes

Notes

•  NFS: Not working for us. Too much management and downtime.
•  Gluster: Looks good, tried it for a while. Easy at first because it was a file system.
•  HDFS: Didn’t like the NameNode issue. May have been a good way to go.
•  Cassandra: Everything we need, quick to learn. We went all in!

*  Only LAN solutions considered. Calls had too much latency in the cloud, or even across datacenters.

Page 9: Big data at CallFire

Cassandra

•  Storage isn’t the best use of Cassandra.

•  Do not exceed 50% of drive space.
   –  Compaction needs the space. Hard lesson learned.

•  Fault tolerance: replication factor of 3.

•  Result
   –  1 TB of data = 6 TB of storage needed!
   –  CallFire has a 74 TB Cassandra cluster
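
The 1 TB to 6 TB figure follows from the two rules on this slide: three replicas of every byte, and disks kept no more than half full. A minimal sketch of that arithmetic is below. The function names are hypothetical, and the estimate of unique data for a 74 TB cluster is derived from the slide's rules rather than stated in the deck.

```python
def raw_storage_needed_tb(data_tb, replication_factor=3, max_fill=0.5):
    """Raw disk (TB) needed to hold data_tb of unique data.

    replication_factor: copies kept of each piece of data (3 on the slide).
    max_fill: fraction of each drive allowed to fill up (0.5 on the slide,
              leaving the rest free for compaction).
    """
    if not 0 < max_fill <= 1:
        raise ValueError("max_fill must be in (0, 1]")
    return data_tb * replication_factor / max_fill


def unique_data_capacity_tb(raw_tb, replication_factor=3, max_fill=0.5):
    """Unique data (TB) that fits in raw_tb of disk under the same rules."""
    if not 0 < max_fill <= 1:
        raise ValueError("max_fill must be in (0, 1]")
    return raw_tb * max_fill / replication_factor


print(raw_storage_needed_tb(1))                # 6.0
print(round(unique_data_capacity_tb(74), 1))   # 12.3
```

Under these assumptions a 74 TB cluster has room for roughly 12 TB of unique data.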

Page 10: Big data at CallFire

Extending the scope

•  We like SQL and Hibernate.
   –  Pros: easy, flexible, ad-hoc queries, locks
   –  Cons: scaling

•  Solution: sharding, with Cassandra for universal data

[Diagram: Shard 1, Shard 2, Shard 3; Cassandra Cluster]
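
One way to picture the split between sharded SQL data and universal data is the sketch below. It is an illustration under stated assumptions, not CallFire's implementation: the shard names, the hash-modulo routing rule, and the dictionary standing in for the shared Cassandra cluster are all hypothetical.

```python
import hashlib

# Hypothetical names for the three SQL shards in the diagram.
SHARDS = ["shard1", "shard2", "shard3"]

# Stand-in for the shared Cassandra cluster: data every shard needs,
# such as authentication records, keyed by login.
universal_store = {
    "alice@example.com": {"account_id": 42},
    "bob@example.com": {"account_id": 7},
}


def shard_for(account_id):
    """Map an account to a shard with a stable hash (assumed routing rule)."""
    digest = hashlib.sha256(str(account_id).encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


def locate(login):
    """Look up the account in the universal store, then pick its shard."""
    account_id = universal_store[login]["account_id"]
    return account_id, shard_for(account_id)


account_id, shard = locate("alice@example.com")
print(account_id, shard)
```

Hash-modulo routing is the simplest choice for a sketch; adding a shard later would remap most accounts, so a lookup table stored with the universal data is a common alternative.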

Page 11: Big data at CallFire

Sharding + Big Data

•  Cassandra makes sharding easier
   –  Easy to store universal data (authentication)
   –  Performs very well

•  Tungsten Replicator (Big Data with SQL)
   –  Sharding makes cross-shard joins impossible, so fan your data into central places.
   –  NoSQL can’t handle ad-hoc queries. No worries, you can still have SQL.
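
The fan-in idea can be sketched with in-memory SQLite databases standing in for the MySQL shards and the central reporting database. This is a one-shot copy for illustration only; Tungsten Replicator does the equivalent continuously through replication, and none of the table names or sample rows below come from the deck.

```python
import sqlite3


def make_shard(rows):
    """Create an in-memory stand-in for one SQL shard with a calls table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE calls (account_id INTEGER, duration_s INTEGER)")
    db.executemany("INSERT INTO calls VALUES (?, ?)", rows)
    return db


# Three shards, each holding the calls of different accounts (sample data).
shards = [
    make_shard([(1, 30), (1, 45)]),
    make_shard([(2, 10)]),
    make_shard([(3, 60), (3, 5)]),
]

# Central database that receives rows from every shard.
central = sqlite3.connect(":memory:")
central.execute(
    "CREATE TABLE calls (shard INTEGER, account_id INTEGER, duration_s INTEGER)"
)

# Fan in: copy each shard's rows into the central table, tagged by shard.
for shard_no, db in enumerate(shards, start=1):
    rows = db.execute("SELECT account_id, duration_s FROM calls").fetchall()
    central.executemany(
        "INSERT INTO calls VALUES (?, ?, ?)",
        [(shard_no, account_id, duration_s) for account_id, duration_s in rows],
    )

# An ad-hoc query across all shards, which no single shard could answer.
report = central.execute(
    "SELECT account_id, COUNT(*), SUM(duration_s) "
    "FROM calls GROUP BY account_id ORDER BY account_id"
).fetchall()
print(report)  # [(1, 2, 75), (2, 1, 10), (3, 2, 65)]
```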

Page 12: Big data at CallFire

Big Data Summary

•  Not just for big companies: data grows rapidly in today’s environment.
   –  Nice article about Obama’s data crunchers:
   –  http://swampland.time.com/2012/11/07/inside-the-secret-world-of-quants-and-data-crunchers-who-helped-obama-win/

•  NoSQL systems have easier scaling and fault tolerance mechanisms.
   –  Not uncommon to see small teams with 10-20 node clusters.

•  SQL is still a big part of the equation. (Tungsten)
   –  Fan in information across partitions
   –  Replicate across datacenters
   –  Keep your ad-hoc dreams alive!

Page 13: Big data at CallFire

Passive / Archived Storage

http://www.protocase.com/products/index.php?e=Backblaze

Backblaze – $5,300 for empty case. Holds 45 drives (117 TB usable space)

