A bit about King
King in numbers
• 356 million MAU
• 1.5 billion game plays per day
• 9 game studios, 1700 employees
And lots and lots of data...
• 32 billion rows per day
• 1.5 TB per day new
• > 9 PB stored
Studios in Stockholm, London, Barcelona, Malmö, Berlin, Singapore and Seattle. Offices in San Francisco, New York, Malta, Tokyo, Seoul and Shanghai.
And for fun:
• 100000s of hours played
• Trillions of candies matched
A bit about Activision Blizzard
Activision Blizzard in numbers
• Headquartered in Santa Monica, California
• 9000 employees
• Focused on games for Xbox, PS, computer, etc.
• Call of Duty, Guitar Hero, Diablo, Warcraft, etc.
• Offices pretty much all over the US
Players are different
We have more players (356 m) than the entire US population (320 m)
What is Big Data?
Big data is…
What's your definition of Big Data?
Big data is…
We predict player behaviour…
Good stuff
Effective, Actionable, Predictable
Our data is… growing
Our data
Our data is… not that useful raw
20130117T060000.142+0100 23102 1387107022 1137497977 0 0 fb notif giveGoldToUser
20130117T060000.277+0100 2310101 1000524045 1 2 5107
20130117T060000.281+0100 2321 1025951084 0 134 1358388857
20130117T060000.282+0100 2369 1025951084 0 134 0 1358398800 facebook bookmark_favorites 0fb_source=bookmark_favorites&ref=bookmarks&count=3&fb_bmpos=9_3
20130117T060000.285+0100 2338 1025951084 ad1c792b WINDOWS_XP CHROME 24.0.1312.52 Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.52 Safari/537.17
20130117T060000.287+0100 2310101 1140113442 -1 4 5101
20130117T060000.288+0100 2310005 1140113442 4 3 1358398800288
20130117T060000.305+0100 2310005 1111576364 5 2 1358398800305
20130117T060000.306+0100 2310006 1031413225 7 13 0 0 8 1358398598520 -1
20130117T060000.350+0100 2310101 1151246251 -1 0 5101
20130117T060000.351+0100 2310005 1151246251 5 7 1358398800351
20130117T060000.358+0100 2310006 1376461814 4 3 0 0 72 1358398575940 -10001
System architecture
[Diagram components: Game servers; Log server; TSV log files; Raw data; ETL; Data Warehouse; Data Mart; Reports; Data scientists]
Why build a dimensional model?
• Ease of use
• Flexible framework
• Huge bag of techniques & tricks
• Structures thinking
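To make the idea of a dimensional model concrete, here is a minimal sketch of a star schema using Python's built-in sqlite3 module. The table names, column names and rows are hypothetical and chosen only for illustration; the slides do not describe the actual schema.

```python
import sqlite3

# Hypothetical star schema: one fact table of game rounds plus two dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_player (player_id INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE dim_game   (game_id INTEGER PRIMARY KEY, game_name TEXT);
CREATE TABLE fact_game_round (
    player_id INTEGER REFERENCES dim_player(player_id),
    game_id   INTEGER REFERENCES dim_game(game_id),
    level     INTEGER,
    succeeded INTEGER  -- 1 = level completed, 0 = failed
);
-- Made-up rows, for illustration only.
INSERT INTO dim_player VALUES (1, 'SE'), (2, 'GB');
INSERT INTO dim_game VALUES (10, 'Candy Crush Saga');
INSERT INTO fact_game_round VALUES
    (1, 10, 65, 0), (1, 10, 65, 1), (2, 10, 65, 0);
""")


def success_rate_by_country(connection):
    """Join the fact table to the player dimension and average the outcome."""
    rows = connection.execute("""
        SELECT p.country, AVG(f.succeeded)
        FROM fact_game_round AS f
        JOIN dim_player AS p ON p.player_id = f.player_id
        GROUP BY p.country
        ORDER BY p.country
    """).fetchall()
    return dict(rows)


rates = success_rate_by_country(conn)
print(rates)
```

The point of the layout is that every question becomes "filter and group the fact table by attributes of a dimension", which is what makes the model easy to use and explore.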
Our data is… actually well structured
TSV
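Because the log is tab-separated, each line can be split into typed fields with very little code. The sketch below assumes that the columns are separated by tab characters (the extract above shows them as spaces), that the first column is a timestamp with a UTC offset, and that the second column is a numeric event type; the function name and the interpretation of the columns are assumptions, not something the slides confirm.

```python
from datetime import datetime


def parse_event(line):
    """Split one tab-separated log line into timestamp, event type and payload."""
    fields = line.rstrip("\n").split("\t")
    # Example value: 20130117T060000.277+0100
    timestamp = datetime.strptime(fields[0], "%Y%m%dT%H%M%S.%f%z")
    return {
        "timestamp": timestamp,
        "event_type": int(fields[1]),  # assumed meaning of the second column
        "payload": fields[2:],         # remaining columns depend on the event type
    }


sample = "20130117T060000.277+0100\t2310101\t1000524045\t1\t2\t5107"
event = parse_event(sample)
print(event["timestamp"].isoformat(), event["event_type"], event["payload"])
```

The payload is left as a list of strings because its layout varies per event type, as the differing line lengths in the extract suggest.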
Hadoop strengths and weaknesses
Strengths:
• Scalability
• Resiliency
• Flexibility
• Low cost accessible storage
• Unstructured / semi-structured data
Weaknesses:
• Structured data performance
• Ease of use
• Maintenance
• Fast data exploration
• JOINs
Data platform 1.0
[Diagram components: Games; Event data; Hive; ETL; Reports; Data scientists]
Data platform 1.5
[Diagram components: Games; Event data; Hive; DB?; ETL; Reports; Data scientists]
Benefits of a column-oriented database
• Optimised for structured data
• Good for dimensional model
• Fast data exploration
• More friendly / productive environment
• Faster queries = happier users!
Why ExaSolution?
• Speed
• Efficiency
• Tuning free
• Scalability (170 TB and counting...)
• ExaSol the company
[Chart: database grade servers vs Hadoop grade servers, compared on performance / price and on price / TB of usable storage; axis from 0x to 7x]
Hybrid architecture: best of both worlds
Hadoop:
• Scalability
• Resiliency
• Flexibility
• Low cost accessible storage
• Unstructured / semi-structured data
Analytics database:
• Structured data performance
• Ease of use
• Low maintenance
• Fast data exploration
• JOINs
Data platform 2.0
[Diagram components: Games; Event data; Hive; ExaSolution; ETL; Reports; Data scientists]
Cool! But… what kind of analysis can I do with that?
• Fairly deep thinking about the players and their motivation, frustration, achievements, persistence, etc
• Carefully designed experiments (AB tests) to run in the games, which integrate a hypothesis about players' behaviour with a nicely designed game feature
• Continuing to introduce entirely new challenges as the levels unfold (Candy Crush Saga has 1,280 Reality levels and 665 Dreamworld levels)
• The right analysis
Machine learning and predictive analytics
We have >9 petabytes of player data. Mostly of the form:
• “player ‘x’ tried level ‘y’ and succeeded / failed / spent”
A fairly large space of opportunity to predict…
• Is this player going to stop playing?
• Is this player going to start spending?
• What product should I recommend to this player?
• What other game might they enjoy?
• Is it a good time to recommend they play another game?
• But also segmentation, recommendation, etc.
Candy Crush Saga has been at the top of the charts since January 2013
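As an illustration of how events of the form “player ‘x’ tried level ‘y’ and succeeded / failed / spent” could feed a prediction, the sketch below aggregates them into per-player counts that a model might use as inputs. The tuple layout, the feature names and the sample events are all hypothetical; the slides do not say which features or models are actually used.

```python
from collections import defaultdict


def player_features(events):
    """Aggregate (player, level, outcome) events into per-player counts."""
    features = defaultdict(
        lambda: {"attempts": 0, "failed": 0, "spent": 0, "max_level": 0}
    )
    for player, level, outcome in events:
        row = features[player]
        row["attempts"] += 1
        if outcome == "failed":
            row["failed"] += 1
        elif outcome == "spent":
            row["spent"] += 1
        row["max_level"] = max(row["max_level"], level)
    return dict(features)


# Made-up events, for illustration only.
events = [
    ("x", 64, "succeeded"),
    ("x", 65, "failed"),
    ("x", 65, "failed"),
    ("y", 12, "spent"),
]
features = player_features(events)
print(features["x"])
```

A table like this, one row per player, is the usual starting point for questions such as "is this player going to stop playing?", whichever algorithm is then applied.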
Candy Crush Saga: Can a level be too hard?
[Chart labels: First Episode Unlock; Level 35; Level 65]
Super hard level 65
• 120+ attempts on average
• 50% drop out rate
• Very high revenue
• Very high conversion
• Super happy players when they eventually complete it
Should it be easier?
The long term value of our players is higher if we make it easier
• We get significantly more direct revenue (all those future levels)
• More players stay active in our network (= more players trying out other games, more players helping & competing with their friends)
At King we optimise for the long term!
Pet Rescue Saga. Which of these is better?
• Clear
• Simple
• Obvious button to buy
• No confusion
• Low price point
or?
• Complex
• Choices to make
• Varied price points
• Chance for more revenue, but does it put people off?
Pet Rescue Saga. Which of these is better?
Results of a nice AB test:
• Total revenue up significantly, driven almost entirely by our “medium” and “high” spend segments.
• No negative impact (zero/low spend segments are unaffected).
• We should think of how to target the zero spend and low spend segments in other ways.
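A per-segment reading of an AB test like this one can be sketched as follows. The segment names echo the slide, but the variant labels "A" and "B", the record layout and all revenue figures are made up for illustration; a real analysis would also need a significance test before drawing conclusions.

```python
from collections import defaultdict


def revenue_per_player(records):
    """Return mean revenue per player for each (segment, variant) pair."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for segment, variant, revenue in records:
        totals[(segment, variant)] += revenue
        counts[(segment, variant)] += 1
    return {key: totals[key] / counts[key] for key in totals}


def uplift_by_segment(records, control="A", treatment="B"):
    """Difference in mean revenue (treatment minus control) per segment."""
    means = revenue_per_player(records)
    segments = {segment for segment, _ in means}
    return {
        segment: means[(segment, treatment)] - means[(segment, control)]
        for segment in segments
        if (segment, control) in means and (segment, treatment) in means
    }


# Made-up records: (spend segment, variant, revenue for one player).
records = [
    ("zero", "A", 0.0), ("zero", "B", 0.0),
    ("medium", "A", 2.0), ("medium", "B", 3.0),
    ("high", "A", 10.0), ("high", "B", 14.0),
]
uplift = uplift_by_segment(records)
print(uplift)
```

Splitting the result by spend segment is what allows a statement such as "driven almost entirely by the medium and high spend segments" rather than only a single overall number.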
Where next?
Challenges
• Upstream and downstream throughput and flexibility
• Greater variety of game genres
• Keep on scaling
• Technology innovation
• Evolving data model
• Microbatch ETL
• Real(er) time…
Bridging the latency canyon
Data platform 4.0
[Diagram components: Data; Real time system; VoltDB?; Microbatch ETL; Batch ETL; ExaSolution; Hadoop]
[Axis: increasing latency, quality, context; marks at 0 ms, 200 ms, 15 minutes?, hourly, daily]
In detail
Some numbers
• Hadoop with 330+ nodes, adding 2 racks / month
• 32 billion events per day, more than Twitter
  I. If an event had a weight of 1 gram, this would be as heavy as 53 fully loaded Airbus A380s.
  II. If an event was a grain of salt, this would mean about 30 bathtubs of salt.
• 64 nodes in the in-memory column store DB
• Hive, Impala, Spark and YARN in place
• 9 PB of data in HDFS, 170 TB+ in Exasol
• In 12 months' time, these numbers will double
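A little arithmetic puts these figures in perspective. The sketch below uses only numbers quoted in the slides (32 billion events per day, 1.5 TB of new data per day, 9 PB stored) and assumes decimal units (1 TB = 10^12 bytes) and that "rows" and "events" mean the same thing; whether the 1.5 TB figure is compressed is not stated, so the bytes-per-event value is only indicative.

```python
EVENTS_PER_DAY = 32e9        # "32 billion events per day"
NEW_BYTES_PER_DAY = 1.5e12   # "1.5 TB per day new", decimal terabytes assumed
STORED_PB = 9                # "9 PB of data in HDFS"

SECONDS_PER_DAY = 24 * 60 * 60

# Average ingest rate over a full day.
events_per_second = EVENTS_PER_DAY / SECONDS_PER_DAY

# Average size of one stored event.
bytes_per_event = NEW_BYTES_PER_DAY / EVENTS_PER_DAY

# "If an event had a weight of 1 gram": 1 tonne = 1,000,000 grams.
tonnes_at_one_gram = EVENTS_PER_DAY / 1e6

# "In 12 months' time, these numbers will double".
stored_pb_after_doubling = STORED_PB * 2

print(f"{events_per_second:,.0f} events per second on average")
print(f"{bytes_per_event:.1f} bytes per event on average")
print(f"{tonnes_at_one_gram:,.0f} tonnes per day at 1 gram per event")
print(f"{stored_pb_after_doubling} PB stored after doubling")
```

So the platform absorbs roughly 370,000 events every second on average, each taking under 50 bytes of new storage.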
In conclusion
• What are your requirements?
• There’s not one tool for the job
• Hybrid architectures give the best of multiple worlds
• 9 PB of data opens up a new set of challenges:
  • A medium table in King has about 300 billion records;
  • Having that much data on this architecture allows you to do any kind of analysis you want, using the algorithm you want (NLP, AI, machine learning, etc.)
A few words about our people
About 1700 employees today
• Many 100s of software engineers
• Lots of graphic designers, artists, musicians, business managers, producers, marketers,…
• In the data area:
  • 60+ data scientists
  • 30 data engineers building and maintaining our data and reporting platforms
Great roles
Data Scientists and Data Engineers working
• in our games
• on our network
• on our systems
• on our testing/optimisation frameworks
• …
And we like people to rotate around over time
Between 6 and 11 interviews before joining
https://www.youtube.com/watch?v=V9y21zPw4MY
Working @King
In the office, we have:
• Unlimited food & drinks, gym, wine & whisky tasting, many different beers
• Boxing, krav maga and yoga classes
• Nap rooms, running clubs
• Movie nights, board games, and football tournaments
• Everyone's ideas matter, no matter the seniority
• You get to travel as often as you like
• You can work from home
• Really cool parties & events
• Freedom to work on what you like
• You keep learning all the time
• And much more...
Thank you