Big Data + Social + Games @Is Cool
16/03/2012
Who is IsCool Entertainment?
Social game publisher based in Paris, France
#1 French publisher in terms of audience (450k Daily Active Users) & revenue
2.8 million fans
80 employees
9.1 million € revenue in 2010
4 live applications on Facebook
Florian Douetteau CTO @fdouetteau
Agenda
• What do we do? Social Gaming
• What kind of (Big) Analytics do we do? Lots
• How do we do it? Hadoop, Python, R, Tableau, Gephi and stuff…
Is Cool Games
IsCool, Delirious Collectible Game
Absolute Solitaire, The best solitaire game available online
Temple Of Mahjong, Collect, Play, Exchange
Belote Multijoueur, Play, Win, Meet
Games & Virtual Goods
Play the Game & Gain some virtual goods
Play again & Gain more
Collaborate with other players & Gain More
….
Possibly buy: to grow quicker, to help others
Virtual Goods Virtual Economy
Virtual goods must not be too easy to get: the game would not be fun, and there would be no monetization!
Virtual goods must not be too hard to get: people would churn out of frustration!
Virtual Goods can be usually traded between players
Virtual and actual “Price” of a good
Let’s trade 1 Watch against 3 Hammers
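The gap between the virtual and the actual "price" of a good can be illustrated with a minimal sketch. It assumes trades are logged as (good given, quantity, good received, quantity) tuples; that log format, the `implied_prices` helper, and the choice of the hammer as the reference unit are hypothetical and not taken from IsCool's systems.

```python
from collections import defaultdict

def implied_prices(trades, unit="hammer"):
    """Average price of each good, expressed in `unit`, from direct trades against `unit`."""
    # good -> [reference units exchanged, quantity of the good exchanged]
    totals = defaultdict(lambda: [0.0, 0.0])
    for given, given_qty, received, received_qty in trades:
        if received == unit and given != unit:
            totals[given][0] += received_qty
            totals[given][1] += given_qty
        elif given == unit and received != unit:
            totals[received][0] += given_qty
            totals[received][1] += received_qty
    return {good: paid / qty for good, (paid, qty) in totals.items()}

trades = [
    ("watch", 1, "hammer", 3),  # the trade from the slide: 1 watch against 3 hammers
    ("hammer", 5, "watch", 1),  # hypothetical second trade at a different rate
]
prices = implied_prices(trades)
print(prices)
```

Comparing such observed rates with the price the game designers intended is one way to spot goods that are too easy or too hard to get.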
Why is this Big Data ?
Number of object transactions per day
NYSE 3,600,000,000
IsCool 2,150,000,000
Nasdaq 1,600,000,000
Nikkei 1,500,000,000
Footsie 860,000,000
CAC 40 142,500,000
9.8 TB of data to analyze
18 million user-generated actions per day, about 7 billion per year.
The Real Big Data Challenge Collaborate for collective insights
data scientist?
what metrics?
Realtime?
Game Designer Perspective : Nice Charts ?
Programmers’ Perspective : Log Files & Work ?
Business Guy Perspective: Revenue Forecast ?
BI Veteran: Schema Definition ?
Specifics of Game Analytics
Virtual Goods: we are the factory AND the shop, and most of the products are free.
Social Networks: network effects are key.
Games: the product changes EVERY day! Sudden waves of unexpected players from Guatemala! People try to cheat!
Use Case 1 : Understanding Users
1: Defining engagement
Tenure length
Visit frequency
Virality
Paying user conversion
ARPPU
Score
Use of feature A,B,C…
Key drivers??? Traffic
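Several of the engagement metrics listed above reduce to simple ratios. The sketch below is only an illustration: the `users` records, their field names, and the `engagement_summary` helper are hypothetical, and the definitions used (ARPPU as revenue divided by paying users, conversion as paying users divided by all users, tenure as days between first and last visit) are common conventions rather than IsCool's confirmed formulas.

```python
from datetime import date

# Hypothetical per-user records; field names are assumptions for illustration.
users = [
    {"first_seen": date(2012, 1, 1), "last_seen": date(2012, 1, 10), "visits": 5, "revenue": 0.0},
    {"first_seen": date(2012, 1, 1), "last_seen": date(2012, 1, 20), "visits": 20, "revenue": 10.0},
    {"first_seen": date(2012, 2, 1), "last_seen": date(2012, 2, 10), "visits": 10, "revenue": 0.0},
    {"first_seen": date(2012, 3, 1), "last_seen": date(2012, 3, 20), "visits": 25, "revenue": 30.0},
]

def engagement_summary(users):
    """Compute a few population-level engagement metrics."""
    paying = [u for u in users if u["revenue"] > 0]
    # Tenure counts both the first and the last day.
    tenures = [(u["last_seen"] - u["first_seen"]).days + 1 for u in users]
    return {
        "conversion": len(paying) / len(users),
        "arppu": sum(u["revenue"] for u in paying) / len(paying) if paying else 0.0,
        "mean_tenure_days": sum(tenures) / len(users),
        "visits_per_day": sum(u["visits"] for u in users) / sum(tenures),
    }

summary = engagement_summary(users)
print(summary)
```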
Case Study 1 - Segment User Behaviours
2: Describing engagement patterns: Running a segment analysis
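The slides do not say which method was used for the segment analysis. As one possible illustration, the sketch below runs plain k-means on two hypothetical features per user (visits per week, tenure in weeks); the `kmeans` helper, the data, and the choice of initial centroids are all assumptions. In practice the features would need to be scaled first so that no single one dominates the distance.

```python
import math

def kmeans(points, centroids, iterations=20):
    """Plain k-means: assign points to the nearest centroid, then recompute centroids."""
    centroids = [tuple(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        assignment = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its previous centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return assignment, centroids

# Hypothetical users: (visits per week, tenure in weeks)
points = [(1, 2), (2, 1), (1, 1), (10, 30), (12, 28), (11, 32)]
assignment, centroids = kmeans(points, centroids=[points[0], points[3]])
print(assignment, centroids)
```

Each resulting cluster is a candidate segment (for example casual newcomers versus long-tenured regulars) to be described and named by hand.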
Use Case 2 : Understanding Users as a whole
10 Million Nodes
Around 1 000 Billion Edges
How does the graph evolve in time ?
What are the communities?
Understanding Users as a Whole
A very large community
Some mid size communities
Lots of small clusters (mostly 2 players)
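A size distribution like the one above (one giant group, a few mid-size ones, many pairs) can be approximated with connected components. This is a minimal sketch under stated assumptions: the edge list is hypothetical, `component_sizes` is an illustrative helper, and connected components are a coarser notion than the community detection a tool such as Gephi performs.

```python
from collections import Counter

def component_sizes(edges):
    """Return the sizes of the connected components, largest first, using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two components

    roots = Counter(find(x) for x in list(parent))
    return sorted(roots.values(), reverse=True)

# Hypothetical friendship edges between players
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("e", "f"), ("g", "h")]
sizes = component_sizes(edges)
print(sizes)
```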
Use Case 3 : Analyze Long-Term Effects of a Feature
A/B Tests: some features can be A/B tested… and some cannot! How to measure the uplift?
Are players using the new feature… more engaged? Generating more virality? etc.
Complexity: multiple variables to observe (other features, history)
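For the features that can be A/B tested, uplift on a binary outcome is often measured with a two-proportion z-test. The sketch below assumes that setup; the `uplift_and_z` helper and the counts are hypothetical, and it does not address the harder cases raised above (features that cannot be split, or several interacting variables).

```python
import math

def uplift_and_z(conv_a, n_a, conv_b, n_b):
    """Relative uplift of B over A, with a two-sided two-proportion z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    uplift = (p_b - p_a) / p_a
    pooled = (conv_a + conv_b) / (n_a + n_b)  # conversion rate under the null hypothesis
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return uplift, z, p_value

# Hypothetical counts: players who sent at least one invite, control (A) vs new feature (B)
uplift, z, p_value = uplift_and_z(conv_a=200, n_a=10_000, conv_b=250, n_b=10_000)
print(f"uplift={uplift:.1%} z={z:.2f} p={p_value:.3f}")
```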
… How
over the last 3 years
• Tools changed
• Scale changed
• Focus Changed
Analyzing the Offer
• Online Analytics Platform
• Commercial / Open Source ETL
• Commercial BI Visualization Software
• Commercial / Open Source databases (column stores)
• …
What we learned
Diversity
• There's no Hadoop+R Magic (Expertise, Entry Costs, Maintenance)
• There’s no XYZ Magical Product
Relativity
• Windows / Linux ? Cloud or on-premise ?
• Do you have internal data mining experts (yes/no) ?
• Do you have internal scalability experts (yes/no) ?
• What is the _real_ budget ? 0K ? 10K ? 100K ? 1000K ?
Superficiality
• Ability to display is more important than the result.
Mixed Approach
SaaS Analytics Platforms
• For common business metrics (virality, traffic, engagement)
• Corporate-level visibility
• Day-to-day
Internal Datawarehousing
• Detailed business metrics
• Virtual economy modeling
• Long-term behaviours
• Business-level visibility
• Week-to-week
Datamining tools
• Ad-hoc analytics
• Graph analytics
Datawarehouse for the Big Data era
Hadoop/Hive (through Amazon’s Elastic MapReduce)
• Used to reduce the amount of information : 10 GB a day => 1GB a day
• High cost of development for "business" related processing
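The reduction step above amounts to collapsing raw events into aggregates. As a rough illustration in plain Python rather than Hive, the sketch below counts events per (day, user, action); the tab-separated log format, the field order, and the `reduce_events` helper are assumptions, not the actual IsCool pipeline.

```python
from collections import Counter

def reduce_events(lines):
    """Collapse raw event lines into one count per (day, user, action)."""
    counts = Counter()
    for line in lines:
        timestamp, user_id, action = line.rstrip("\n").split("\t")
        day = timestamp[:10]  # assumes ISO 8601 timestamps, e.g. 2012-03-16T10:00:01
        counts[(day, user_id, action)] += 1
    return counts

# Hypothetical raw log lines: timestamp, user id, action
raw = [
    "2012-03-16T10:00:01\tu1\ttrade",
    "2012-03-16T10:00:05\tu1\ttrade",
    "2012-03-16T11:30:00\tu2\tplay",
    "2012-03-17T09:00:00\tu1\ttrade",
]
daily = reduce_events(raw)
for key, count in sorted(daily.items()):
    print(key, count)
```

The aggregated rows, not the raw events, are what then gets loaded into the data warehouse.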
Open Source ETL (PyBabe)
• Pure Python ETL
• Good integration with AWS/ S3
• Easy to integrate in our development environment
Columnar Database (Infinidb, Open Source)
• Free (as beer)
• Good performance for analytics tasks on a few hundred million rows ( SELECT … GROUP BY … ORDER … )
• Limited features and performance compared to commercial column stores
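The query shape mentioned above can be shown on a toy table. The sketch uses Python's built-in `sqlite3` purely as a stand-in (SQLite is not a column store and this is not InfiniDB); the `transactions` table, its columns, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (day TEXT, good TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        ("2012-03-16", "hammer", 3),
        ("2012-03-16", "watch", 1),
        ("2012-03-17", "hammer", 5),
    ],
)

# Typical analytics query: aggregate per good, most traded first.
rows = conn.execute(
    "SELECT good, SUM(quantity) AS total "
    "FROM transactions GROUP BY good ORDER BY total DESC"
).fetchall()
print(rows)
conn.close()
```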
Dashboarding (Tableau Software)
• +Direct connection to the database
• +Excel-loving business users can use it with no training !
Questions ?