Tuomas Rinta, Development Director, Everyplay / Unity Technologies
FROM BIG DATA TO ACTIONABLE ANALYTICS
So what is Everyplay?
Everyplay in numbers
• Live in about 1,000 games across iOS and Android
• Nearly 100 million game sessions recorded daily
• About 2 billion events of usage data generated every week
Why do we care about big data?
• Mobile games, especially free-to-play, live and die by their metrics
• A service provided to game developers must have proven value, and every optimization counts
So let’s talk about how we use big data, and how we got started
Our goal: “How do we create a metrics-driven product based on big data?”
This needs to be as quick as possible
Collect data → Analyze → Create A/B tests → Improve product
Challenges
• We ship an SDK, and the normal update cycle for clients can be as long as 6-12 months, which is not very dynamic
  – This conflicts with the fast improvement cycle
  – Technology must adapt to supporting big data
• The product evolves constantly
  – Analytics requirements change constantly
The SDK is instrumented to send everything the user does to the servers
[Diagram: data pipeline with Scribe, Amazon S3, the real-time production system, and batch data processing with Apache Pig]
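As a rough illustration of "send everything the user does", here is a minimal client-side tracker sketch. It assumes the SDK queues each action with a timestamp and sends events in batches; the class name, the batch size, and the `send` transport are hypothetical and not taken from the Everyplay SDK.

```javascript
// Hypothetical client-side tracker (not the Everyplay SDK API):
// records every user action and hands events to a transport in batches.
class EventTracker {
  constructor(send, batchSize = 50, now = () => Date.now()) {
    this.send = send; // assumed transport, e.g. a POST to a collection endpoint
    this.batchSize = batchSize; // hypothetical batch size
    this.now = now; // injectable clock, useful for testing
    this.queue = [];
  }

  // Record one user action; flush automatically once a batch is full.
  track(type, properties = {}) {
    this.queue.push({ type, ts: this.now(), properties });
    if (this.queue.length >= this.batchSize) this.flush();
  }

  // Send whatever is queued (e.g. when the app goes to the background).
  flush() {
    if (this.queue.length === 0) return;
    const batch = this.queue;
    this.queue = [];
    this.send(batch);
  }
}
```

Batching keeps the number of network requests low on mobile devices; the server side (Scribe, S3, Pig in the diagram) would then receive the raw event stream.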
Tackling evolving analytics
Issues with big data and analytics
• Analytics requirements change
• Redshift is based on PostgreSQL, so there needs to be a schema
  – Schemas are the most restrictive factor with Redshift
• How does that work with evolving analytics?
• Everything would be easy if there weren’t billions of rows of data…
How should data be reported?
• Choosing how the end-user instrumentation sends events is crucial
• A bad event format can make analytics on big data nearly impossible
• You don’t always know beforehand what you will need
Two possible approaches

Separate events
Example of video sharing:
  openVideoEditor
  trimButtonPressed
  undoTrimPressed
  activateFacecamRecording
  finishFacecamRecording
  shareButtonPressed
• More flexible with a schema-based database
• Requires much more data processing
• Combining events can be a hassle
Conversions with properties
Example of video sharing:
  {
    "event": "videoShareComplete",
    "properties": {
      "didTrimVideo": true,
      "isVideoTrimmed": false,
      "didUseFacecam": true,
      "isFacecamEnabled": true,
      "totalDuration": 1241
    }
  }
• Problematic with a schema-based database
• Easier and faster to process
• All relevant data is pre-aggregated
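To show what "combining events" involves, here is a sketch that folds the separate-event stream from the first approach into the single conversion event from the second. The event names come from the two examples; the `ts` field, the rule that only sessions ending in `shareButtonPressed` become a conversion, and the exact meaning of each property are assumptions made for illustration.

```javascript
// Hypothetical aggregation: separate events -> one "videoShareComplete" conversion.
// Property semantics below are illustrative assumptions, not Everyplay's actual rules.
function toConversionEvent(events) {
  const last = events[events.length - 1];
  // Assumption: only a session that ends in a share produces a conversion.
  if (!last || last.type !== "shareButtonPressed") return null;

  const count = (type) => events.filter((e) => e.type === type).length;

  return {
    event: "videoShareComplete",
    properties: {
      // The user pressed trim at least once.
      didTrimVideo: count("trimButtonPressed") > 0,
      // The trim is still applied only if it was not undone as often as it was applied.
      isVideoTrimmed: count("trimButtonPressed") > count("undoTrimPressed"),
      didUseFacecam: count("activateFacecamRecording") > 0,
      // Assumption: facecam counts as enabled if a facecam recording was finished.
      isFacecamEnabled: count("finishFacecamRecording") > 0,
      // Assumption: duration from first to last event, in whatever unit `ts` uses.
      totalDuration: last.ts - events[0].ts,
    },
  };
}
```

Doing this on the client (or once at ingestion) is what makes the conversion format "pre-aggregated"; doing it in SQL over billions of separate rows is the extra processing the first approach requires.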
“What about PostgreSQL and JSON?”
• Yes, PostgreSQL can parse JSON documents, which allows an arbitrary format for event data
• However, when your data gets big, this comes with a warning…
Comparing queries on plain columns and on JSON
Normal query:
    select count(*) from events where created > '2014-09-01' and event_type = 'recordSessionClosed';
Vs. JSON-based:
    select count(*) from events where created > '2014-09-01' and json_extract_path_text(event_json, 'event_type') = 'recordSessionClosed';
Results
[Chart: execution time (in seconds) of the normal query vs. the JSON-based query; y-axis from 0 to 1400]
So what’s the best solution?
• Combine single-event sending with extra JSON properties
• Querying the JSON properties is slow, so we store there only information that is rarely needed (drill-down information)
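The hybrid layout can be sketched in memory: frequently filtered fields stay as plain columns, and drill-down details go into a JSON string that is parsed only for rows that already passed the column filter. The column names `created`, `event_type` and `event_json` come from the queries above; the function names are hypothetical, and this is an analogy for the access pattern, not a model of how Redshift executes queries.

```javascript
// Hypothetical hybrid row: hot fields as columns, drill-down data as a JSON string.
function toRow(created, eventType, properties) {
  return { created, event_type: eventType, event_json: JSON.stringify(properties) };
}

// Fast path (like the "normal" query): filter on plain columns only, no JSON parsing.
// ISO date strings compare correctly as plain strings.
function countByType(rows, since, eventType) {
  return rows.filter((r) => r.created > since && r.event_type === eventType).length;
}

// Drill-down: parse the JSON only for rows that already matched the column filter.
function drillDown(rows, since, eventType, key) {
  return rows
    .filter((r) => r.created > since && r.event_type === eventType)
    .map((r) => JSON.parse(r.event_json)[key]);
}
```

The design choice mirrors the slide: the common aggregate never touches the JSON, so the parsing cost is paid only on the comparatively rare drill-down path.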
How do we then analyse the data?
• Most solutions on the market fell short due to:
  – Pricing
  – Features
  – Availability
• It turned out to be easier to “roll our own”
Solving an actual problem: “What are the worst drop-off points when uploading a replay?”
Tools
• SQL
• JavaScript
• Google Charts visualisation library
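A minimal sketch of the JavaScript step: given per-step user counts (for example, one SQL `count(*)` per funnel step), compute the drop-off between consecutive steps and rank the worst ones. The step names are hypothetical placeholders, not the real upload flow. The chart rows use the `[header, ...data]` array shape that Google Charts' `google.visualization.arrayToDataTable` accepts.

```javascript
// funnel: ordered array of { step, users }, e.g. built from SQL count(*) results.
// Step names used with it are hypothetical.
function dropOffs(funnel) {
  const result = [];
  for (let i = 1; i < funnel.length; i++) {
    const prev = funnel[i - 1];
    const cur = funnel[i];
    result.push({
      from: prev.step,
      to: cur.step,
      // Fraction of users who reached `from` but never reached `to`.
      dropOff: prev.users === 0 ? 0 : 1 - cur.users / prev.users,
    });
  }
  // Worst drop-off points first.
  return result.sort((a, b) => b.dropOff - a.dropOff);
}

// Rows in the [header, ...data] shape accepted by google.visualization.arrayToDataTable.
function toChartRows(funnel) {
  return [["Step", "Users"], ...funnel.map((s) => [s.step, s.users])];
}
```

Keeping the heavy counting in SQL and only this small post-processing in JavaScript matches the split in the tool list: the database does the scanning, the browser does the ranking and the chart.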
Why JavaScript for processing?
• Dynamic, fast, relatively well-known
• Excellent libraries for data visualisation
  – Highcharts, Google Charts, D3.js, Dygraphs
• Good for visualizing data, but that’s it
Keys to a successful data-driven product
• Plan ahead for analytics and leave room for an evolving product
• If metrics and analytics are not easily accessible to decision makers, they are worthless
  – Self-updating dashboards are one of the main keys to success
• Build A/B testing and data-driven behaviour directly into your product; don’t hack it on later
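One common way to build A/B testing into the product itself is deterministic bucketing: hash the user ID together with the experiment name and map the hash to a variant, so no assignment needs to be stored or fetched. The sketch below uses 32-bit FNV-1a as the hash; that choice, the function names and the experiment name are assumptions for illustration, not a description of Everyplay's implementation.

```javascript
// 32-bit FNV-1a over the string's UTF-16 code units (sufficient for bucketing).
function fnv1a(str) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep unsigned 32 bits
  }
  return hash >>> 0;
}

// Same user + same experiment always yields the same variant.
function assignVariant(userId, experiment, variants) {
  // Scale the hash to [0, 1) so the high-order bits choose the bucket.
  const x = fnv1a(`${experiment}:${userId}`) / 0x100000000;
  return variants[Math.floor(x * variants.length)];
}
```

For example, `assignVariant("user-42", "shareButtonColor", ["A", "B"])` returns the same variant on every call. Including the experiment name in the hashed key keeps assignments independent across experiments, and because no server round trip is needed, the logic can live in an SDK that clients update only every few months.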
Thank you!
Questions, comments?
Email: [email protected]
Twitter: @trinta
developers.everyplay.com
Q&A