Webinar: 2 Billion Data Points Each Day

Date posted: 26-Jan-2015
Upload: datastax
Description:
This webinar follows the process of evaluating different big data platforms against varying use cases and business requirements, and explains how big data professionals can choose the right technology to transform their business. During this session, Ooyala CTO Sean Knapp will discuss why Ooyala selected DataStax as the big data platform powering its business, and how Ooyala provides real-time video analytics that help media companies create deeply personalized viewing experiences for more than 1/4 of all Internet video viewers each month.
Transcript
Page 1: Webinar: 2 Billion Data Points Each Day

Data Points += 2 billion ... daily

July 24th, 2013

Page 2: Webinar: 2 Billion Data Points Each Day

ABOUT ME

Sean Knapp (@seanknapp)

•Co-Founder, EVP & Chief Product Officer (formerly CTO)

•Senior Software Engineer @ Google
  •Built & launched iGoogle
  •Led Google’s Frontend Web Search and Ads UX teams, who drove a $1B increase in revenue for Google in 18 months

•B.S. & M.S. in Computer Science from Stanford University

Page 3: Webinar: 2 Billion Data Points Each Day

OOYALA OVERVIEW

Suite of products and services providing white-label management, hosting, and distribution of video online

Hundreds of customers including ESPN, Bloomberg, Disney, Miramax, Univision, Dell, Pac-12 Networks, and more

100M+ unique users streaming more than 1B videos monthly, generating more than 2B analytics events daily

280 employees located in Silicon Valley, NYC, London, Tokyo, Sydney, Singapore, Seoul & Guadalajara

Page 4: Webinar: 2 Billion Data Points Each Day

EVOLVING INSIGHTS

• Insights circa ’07
  • How many videos did I show this week?
  • What were my monthly uniques?

• Insights circa ’09
  • How many ad impressions did I receive from users in each Designated Market Area (DMA)?

• Insights circa ’11
  • How many users do I have right now?

• Insights circa ’13
  • How does the revenue from iPad users age 25-34 compare to those on Xbox?

Slide labels: Weekly, Instant, Summary, Detailed, Complex

Page 5: Webinar: 2 Billion Data Points Each Day

BIG DATA @ OOYALA

• 1st Gen (circa ’07)
  • Process: Hadoop MapReduce
  • Language: Ruby
  • Store: MySQL

• 2nd Gen (circa ’09)
  • Process: Hadoop MapReduce
  • Language: Ruby
  • Store: Cassandra 0.5+

• 3rd Gen (circa ’11)
  • Process: MapReduce, Storm
  • Language: Ruby, Scala
  • Store: DataStax Enterprise (300TB disk, 1TB RAM)

• 4th Gen (circa ’13)
  • Process: MapReduce, Storm, Spark, Hive
  • Language: Scala
  • Store: DataStax Enterprise (1.5PB disk, 14TB RAM)

Slide labels: Batch, Realtime, Summary, Granular, Queryable

Page 6: Webinar: 2 Billion Data Points Each Day

OUR GOALS

•Evolve our Analytics product from a time-delayed, static reporting system to a realtime, granular, and dynamic query engine

•Launch our Content Recommendation engine, an entirely new product offering

•Scale to billions of user events on a daily basis

•Support an ever expanding set of global customers

•Deliver a 5-9’s platform
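To make the "5-9’s" goal concrete, the sketch below converts an availability target expressed as a number of nines into a yearly downtime budget. It assumes "5-9’s" means 99.999% availability measured over a 365-day year; the function name and the measurement window are illustrative and do not come from the slides.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds in a non-leap year


def downtime_budget_seconds(nines: int, period_seconds: float = SECONDS_PER_YEAR) -> float:
    """Maximum downtime allowed in a period for an availability of N nines."""
    unavailability = 10.0 ** -nines  # five nines -> 0.00001
    return period_seconds * unavailability


# Assumption: "5-9's" is read as 99.999% availability.
five_nines_per_year = downtime_budget_seconds(5)
print(f"Five nines allows about {five_nines_per_year / 60:.2f} minutes of downtime per year")
```

Under these assumptions the budget is a little over five minutes per year, which helps explain why a very small ops team weighs so heavily in the challenges that follow.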

Page 7: Webinar: 2 Billion Data Points Each Day

OUR CHALLENGES

•Very small ops team supporting global infrastructure
  •Not enough capacity for performance tuning
  •Routinely fell behind the latest releases
  •Didn’t know which releases were stable enough

•Unforeseen product requirements beyond the next 12 months

•Existing solution would have cost nearly $1M to scale to just 100TB

Page 8: Webinar: 2 Billion Data Points Each Day

SELECTION PROCESS

•Key Criteria
  •Scalability: PB+, 100k+ operations per second
  •Cost / price-performance
  •Availability: 5-9’s
  •Flexibility: schemaless

•Alternative Technologies
  •Other RDBMS systems
  •HBase
  •Voldemort
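To put the 100k+ operations per second criterion in context, the sketch below converts the "more than 2B analytics events daily" figure from the Ooyala overview into an average per-second rate. It assumes one operation per event and a load spread evenly across the day, neither of which the slides state; the function and variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate_per_second(events_per_day: float) -> float:
    """Average events per second, assuming load is spread evenly over the day."""
    return events_per_day / SECONDS_PER_DAY


# Figures quoted on the slides.
events_per_day = 2_000_000_000   # "more than 2B analytics events daily"
target_ops_per_second = 100_000  # "100k+ operations per second"

avg_rate = average_rate_per_second(events_per_day)
headroom = target_ops_per_second / avg_rate

print(f"Average ingest rate: {avg_rate:,.0f} events/s")
print(f"Target is {headroom:.2f}x the average rate")
```

Real traffic is bursty and each event may trigger several reads and writes, so the gap between the average rate and the target should be read as headroom for peaks rather than as spare capacity.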

Page 9: Webinar: 2 Billion Data Points Each Day

WHY CASSANDRA

•First learned about C* in Nov 2008
•First deployed C* in Sep 2009

•Compelling Features
  •Scalability: PB+, high ops/sec, billions of rows and columns
  •Performance: designed specifically for heavy workloads similar to Ooyala’s
  •Cost: could run on commodity hardware
  •Availability: multi-datacenter with no single point of failure
  •Community: strong, unified direction

Page 10: Webinar: 2 Billion Data Points Each Day

RESULTS

•Business
  •Launched the next-gen of our Analytics in ’09 that solidified Ooyala as the leader in our industry
  •Launched our Content Recommendation engine in ’12 that again separated us from the industry

•Technical
  •1,000x the scale of just 5 years ago
  •Much higher ROI: 1PB+ for < $500k in hardware
  •No more 3am pager alerts
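The ROI claim can be checked against the earlier challenge that the existing solution "would have cost nearly $1M to scale to just 100TB". The sketch below compares cost per terabyte for the two figures. It assumes decimal units (1 PB = 1,000 TB) and treats both quoted amounts as round upper bounds; the slides do not say whether the $1M figure covers hardware only, so the comparison is indicative at best.

```python
TB_PER_PB = 1000  # assumption: decimal units, not stated on the slides


def cost_per_tb(total_cost_usd: float, capacity_tb: float) -> float:
    """Cost in US dollars per terabyte of capacity."""
    return total_cost_usd / capacity_tb


# "nearly $1M to scale to just 100TB" (existing solution)
previous = cost_per_tb(1_000_000, 100)
# "1PB+ for < $500k in hardware" (Cassandra-based platform)
cassandra = cost_per_tb(500_000, 1 * TB_PER_PB)

print(f"Existing solution: about ${previous:,.0f} per TB")
print(f"Cassandra-based platform: under ${cassandra:,.0f} per TB")
print(f"Ratio: roughly {previous / cassandra:.0f}x")
```

With these round numbers the cost per terabyte drops from about $10,000 to under $500, a difference of roughly 20x.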

Page 11: Webinar: 2 Billion Data Points Each Day

Q&A

Page 12: Webinar: 2 Billion Data Points Each Day

THANK YOU

