Big data at CallFire

Vijesh Mehta (Co-Founder and CTO)

•  A little about CallFire

•  CallFire’s technical challenges

•  How CallFire deals with data

•  Summary

Agenda

•  I am one of the founders of CallFire.
   –  Started in 2005 in a small apartment
   –  Now 28 people
   –  Bootstrapped and profitable

•  I’ve been writing software, primarily in the Java space, for 12 years. CallFire is all Java.
   –  We use: Wicket, Guice, Hibernate, MySQL, Cassandra, ActiveMQ, Xen, Puppet

Some background about myself

•  We are a cloud telephony provider.
   –  Outbound phone calls
   –  Phone numbers
   –  SMS through long and short codes
   –  IVR (Interactive Voice Response)
   –  Power dialing

•  CallFire’s call volume can get large very quickly.
   –  Hurricane Sandy: 1.9 million emergency calls

•  4 engineers and 1 system administrator manage operations and new features.

•  We hired 7 more engineers this year, and we are still hiring!

About CallFire

•  1.4 billion calls and texts
   –  Growing exponentially

•  Over 50,000 accounts
•  Over 6 million campaigns
•  80 million sound files
•  14 TB of storage (NFS)
•  MySQL: over 10,000 qps at peak

Big data isn’t always a big-company problem!

Technical Challenges by Numbers

[Chart: Campaigns over Time. Vertical axis: number of campaigns, 0 to 7,000,000.]

Growing faster each day

The first challenge

•  Problem: We outgrew our datacenter. New systems need access to central storage. Replication has to run across a 1 Gb/s interconnect.

•  Needed solution:
   –  Must work across datacenters
   –  Must scale as demand increases
   –  Must be fault tolerant
   –  Must deal with over 80 million sound files
   –  The cheaper the better

Solutions Considered (2010)

                                NFS                                         GLUSTER                     HDFS                     CASSANDRA
Fault tolerant                  Yes, if configured                          Yes                         Yes                      Yes
Datacenter replication          Maybe. Rsync isn’t fun with lots of files.  Not at the time             Yes                      Yes
Easy to add storage             No                                          Not at the time             Yes                      Yes
No single point of failure      No                                          Yes                         Not exactly, NameNode.   Yes
Data always accessible easily   No, hard to sort through file systems.      No, same as a file system   Yes                      Yes

Notes
   –  NFS: Not working for us. Too much management and downtime.
   –  Gluster: Looks good, tried it for a while. Easy at first because it was a file system.
   –  HDFS: Didn’t like the NameNode issue. May have been a good way to go.
   –  Cassandra: Everything we need, quick to learn. We went all in!

* Only LAN solutions considered. Calls had too much latency in the cloud, or even across datacenters.

•  Storage isn’t the best use of Cassandra.

•  Do not exceed 50% of drive space.
   –  Compaction needs the space. Hard lesson learned.

•  Fault Tolerance: Replication factor of 3.

•  Result:
   –  1 TB of data = 6 TB of storage needed! (3 replicas, with each drive kept at most half full)
   –  CallFire has a 74 TB Cassandra cluster

Cassandra
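The 1 TB to 6 TB figure on the Cassandra slide follows from two of its rules: a replication factor of 3 and keeping drives at most 50% full. A minimal sketch of that arithmetic is below. The function names are made up for illustration, and treating the 74 TB figure as raw disk is an assumption, not something the slides state.

```python
def raw_storage_needed_tb(data_tb, replication_factor=3, max_disk_utilization=0.5):
    """Raw disk (in TB) needed to hold `data_tb` of logical data.

    Defaults mirror the slide: 3 replicas, drives no more than 50% full
    so that compaction has room to work.
    """
    if data_tb < 0:
        raise ValueError("data_tb must be non-negative")
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    if not 0 < max_disk_utilization <= 1:
        raise ValueError("max_disk_utilization must be in (0, 1]")
    return data_tb * replication_factor / max_disk_utilization


def logical_capacity_tb(raw_tb, replication_factor=3, max_disk_utilization=0.5):
    """Inverse calculation: logical data (in TB) that fits in `raw_tb` of raw disk."""
    return raw_tb * max_disk_utilization / replication_factor


print(raw_storage_needed_tb(1))            # 6.0
print(round(logical_capacity_tb(74), 1))   # 12.3, assuming 74 TB is raw disk
```

Under these assumptions, every terabyte of application data costs six terabytes of disk, which is why the slide warns that storage isn’t the best use of Cassandra.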

•  We like SQL and Hibernate.
   –  Pros: Easy, flexible, ad-hoc queries, locks
   –  Cons: Scaling

•  Solution: Sharding with Cassandra for universal data

Extending the scope

[Diagram: Shard 1, Shard 2, Shard 3 and a Cassandra Cluster]
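One way to picture the shards plus a shared cluster is a router that looks up which shard owns an account in a universal directory. The sketch below is illustrative only: the slides do not describe CallFire's actual routing scheme, the `ShardRouter` class and shard names are hypothetical, and a plain dict stands in for a table that would live in Cassandra.

```python
import hashlib


class ShardRouter:
    """Maps an account to a SQL shard via a universal directory.

    `directory` is a stand-in (a dict) for a lookup table that every
    application server can reach, such as one stored in Cassandra.
    """

    def __init__(self, shards, directory=None):
        self._shards = list(shards)
        self._directory = {} if directory is None else directory

    def _default_shard(self, account_id):
        # Stable hash so that every server picks the same shard for a new account.
        digest = hashlib.sha256(str(account_id).encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._shards)
        return self._shards[index]

    def shard_for(self, account_id):
        shard = self._directory.get(account_id)
        if shard is None:
            # First sighting: choose a shard and pin the choice in the directory.
            shard = self._default_shard(account_id)
            self._directory[account_id] = shard
        return shard


router = ShardRouter(["shard1", "shard2", "shard3"])
first = router.shard_for(42)
print(first == router.shard_for(42))  # True: the assignment is pinned
```

Pinning the assignment in the directory, rather than hashing on every request, means existing accounts stay where they are if a shard is added later.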

•  Cassandra makes sharding easier
   –  Easy to store universal data (e.g. authentication)
   –  Performs very well

•  Tungsten Replicator (Big Data with SQL)
   –  Sharding makes cross-shard joins impossible, so fan your data in to central places.
   –  NoSQL can’t handle ad-hoc queries. No worries, you can still have SQL.

Sharding + Big Data
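The fan-in idea can be sketched in a few lines: copy rows from each shard into one central SQL database, then run ad-hoc queries there. This is not Tungsten Replicator's API. In-memory SQLite databases stand in for the MySQL shards and the central store, and the `calls` table, its columns, and the sample rows are invented for illustration.

```python
import sqlite3


def make_shard(rows):
    """Create an in-memory database standing in for one SQL shard."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE calls (account_id INTEGER, duration_s INTEGER)")
    db.executemany("INSERT INTO calls VALUES (?, ?)", rows)
    db.commit()
    return db


def fan_in(shards):
    """Copy every shard's rows into one central database, tagged by shard name."""
    central = sqlite3.connect(":memory:")
    central.execute(
        "CREATE TABLE calls (shard TEXT, account_id INTEGER, duration_s INTEGER)"
    )
    for name, db in shards.items():
        rows = db.execute("SELECT account_id, duration_s FROM calls").fetchall()
        central.executemany(
            "INSERT INTO calls VALUES (?, ?, ?)",
            [(name, account_id, duration_s) for account_id, duration_s in rows],
        )
    central.commit()
    return central


shards = {
    "shard1": make_shard([(1, 30), (1, 45)]),
    "shard2": make_shard([(2, 10)]),
    "shard3": make_shard([(3, 60), (3, 20)]),
}
central = fan_in(shards)

# An ad-hoc query that no single shard could answer on its own.
totals = central.execute(
    "SELECT account_id, SUM(duration_s) FROM calls "
    "GROUP BY account_id ORDER BY account_id"
).fetchall()
print(totals)  # [(1, 75), (2, 10), (3, 80)]
```

A real replicator streams changes continuously instead of copying whole tables, but the shape is the same: many partitions feed one place where joins and aggregates work again.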

•  Not just for big companies: data grows rapidly in today’s environment.
   –  Nice article about Obama’s data crunchers:
   –  http://swampland.time.com/2012/11/07/inside-the-secret-world-of-quants-and-data-crunchers-who-helped-obama-win/

•  NoSQL systems have easier scaling and fault tolerance mechanisms.
   –  Not uncommon to see small teams with 10-20 node clusters.

•  SQL is still a big part of the equation. (Tungsten)
   –  Fan in information across partitions
   –  Replicate across datacenters
   –  Keep your ad-hoc dreams alive!

Big Data Summary

Passive / Archived Storage

http://www.protocase.com/products/index.php?e=Backblaze

Backblaze: $5,300 for an empty case. Holds 45 drives (117 TB usable space).

Date post:	05-Dec-2014
Category:	Technology
Upload:	vijesh-mehta