Date post: | 11-Apr-2017 |
Category: |
Technology |
Upload: | memsql |
Building a Real-time Analytics Engine with Kafka and Spark for Mobile Advertising
Mobile Advertising? - Social & Game
Authentic to Consumers Authentic to Entertainment
Authentic to Engagement
Mobile Games
eMarketer, “US Mobile Phone Content Usage Metrics, 2013-2019.” February, 2015.
People Spend A Lot of Time Gaming
Over 55 minutes a day on average is spent playing mobile games
Minutes Spent in Mobile
Innovate Advertising as Reward Ads
● Free-to-Play (Freemium) apps
● Only 2~5% of users make in-app purchases
● Publishers can give a “reward” to users who engage with ads
● Video + Game Economics + Reward
Mobile Video App Advertising
Advertiser Pays on Video-View
Pub Paid
Tapjoy Profit
User Earn Reward
Video to Install
Video Install
reward No reward
Video to Install to Event
Video Install
reward
No reward
Event
- Level N
- Registration
- In-app-purchase
- First Booking
Mobile Video App Advertising - Data Science
Video Views
Installs
Early Retention
Life Time Value
“Event”
Look-alike Model
Real-time Bidding Engine
Advertiser’s Return
“Investment”
Building a Data Science Platform
Bigger in Scale
Faster Serving
Smart and Smarter!
Data Product
Tapjoy’s Data Platform
Algo Serving Infrastructure
Data Warehousing
300,000 RPM throughput
Bidding & Targeting &
Personalization
<10 ms response time
20 TB daily addition
2.3 PB DUM
Cloud & On-Premise
In-house & SaaS
Batch-based & real-time
The Logic Stack
Data warehousing
HDFS / S3 / GS
Reporting
MPPs (BigQuery)
Algo Service
Batch + Streaming Hadoop / Spark
• Collect data, set rules
• Reduce data friction
• Improve signal-to-noise ratio
• Model training & iteration
• Deliver business insights
• Drive data awareness
• Apply ideas to product (online)
• Serve model output
• Drive revenue
Data Viz
A/B Testing
Data Viz
The Data Flow
Tapjoy’s Algo Service Engine (SOA)
● SOA (algo service) in Natty
● 320,000 lines of Java
● 99% response time < 20 ms @ 200k - 400k RPM
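To put the RPM figures in per-second terms, here is a small back-of-the-envelope sketch. It only converts the slide's numbers and applies Little's law (average requests in flight = arrival rate × time in system). Treating every request as taking the full 20 ms is an assumption made here, so the in-flight figure is a rough upper estimate rather than a measured value. The function names are illustrative.

```python
def per_second(rpm: float) -> float:
    """Convert requests per minute to requests per second."""
    return rpm / 60.0


def in_flight(rpm: float, latency_ms: float) -> float:
    """Little's law: average concurrent requests = arrival rate x latency.

    Assumes (pessimistically) that every request takes latency_ms.
    """
    return per_second(rpm) * (latency_ms / 1000.0)


if __name__ == "__main__":
    # 200k - 400k RPM is the range quoted on the slide; 20 ms is its p99 bound.
    for rpm in (200_000, 400_000):
        print(f"{rpm} RPM ~ {per_second(rpm):.0f} req/s, "
              f"at most ~{in_flight(rpm, 20):.0f} requests in flight")
```

Under these assumptions the service handles a few thousand requests per second, with on the order of a hundred requests in flight at any moment.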
Ad Request
A/B test classification
Main Algo & pre-filters
Apply Logic Pipe
Response (offer list)
Video Bidding
Targeting
Persona
Lookalike
...
Biz logic filters
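The request flow on this slide (A/B test classification, main algo and pre-filters, logic pipe with business filters, offer list response) could be sketched roughly as follows. This is a minimal illustration in Python, not Tapjoy's Java service: the `Offer` fields, the hash-based bucketing, the two ranking functions, and the blocked-offer filter are all hypothetical stand-ins for the real logic.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    offer_id: str
    kind: str     # e.g. "video" (hypothetical field)
    bid: float    # hypothetical bid/score
    min_os: int   # hypothetical eligibility constraint


def ab_bucket(device_id: str, n_buckets: int) -> int:
    """A/B test classification: stable hash of the device id into a bucket."""
    digest = hashlib.md5(device_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets


def pre_filter(offers, os_version):
    """Pre-filter: drop offers the device is not eligible for."""
    return [o for o in offers if o.min_os <= os_version]


def rank_by_bid(offers):
    """Main algo, variant A: highest bid first."""
    return sorted(offers, key=lambda o: o.bid, reverse=True)


def video_first(offers):
    """Main algo, variant B: video offers first, then by bid."""
    return sorted(offers, key=lambda o: (o.kind != "video", -o.bid))


ALGOS = [rank_by_bid, video_first]  # one main algo per A/B bucket


def biz_logic_filters(offers, blocked_ids):
    """Business-logic filter: remove offers blocked for this request."""
    return [o for o in offers if o.offer_id not in blocked_ids]


def handle_ad_request(device_id, os_version, offers, blocked_ids=frozenset(), limit=3):
    """Run one ad request through the pipe and return (bucket, offer id list)."""
    bucket = ab_bucket(device_id, len(ALGOS))
    candidates = pre_filter(offers, os_version)
    ranked = ALGOS[bucket](candidates)
    final = biz_logic_filters(ranked, blocked_ids)
    return bucket, [o.offer_id for o in final[:limit]]
```

Hashing the device id keeps a device in the same experiment bucket across requests without storing any assignment state, which is one common way to do the classification step; the slide does not say how the real service assigns buckets.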
Algo Service’s Data Components
Component | What’s in there | Purpose
Kafka | Raw activity logs | Everything starts here
Spark Streaming | ETL | ETL & algo feature updates
Aerospike | User Big Table (User DNA) | Real-time k-v lookups, i.e. LookALike
MemSQL | Stripped-down raw user activity data!! | Device-level real-time aggregations; hot data sink; real-time reporting
Elasticsearch | Aggregates or unstructured logs | Cube aggregates or full-text search
Mobile Video App Advertising - Data Science
Video Views
Installs
Early Retention
Life Time Value
“event”
Look-alike Model
Real-time Bidding Engine
Advertiser’s Return
“Investment”
Big Table / MemCache
Use Case 1 - Ad-Request Level Decision
Video Bid
# CVR
Spending History
max(views) > T(n)
...
User app usages
Kafka+
Spark Streaming
S - App 1
S - App 2
S - App 3
S - App ..
S - App N
Lambda Batch
Use Case 1 - Ad-Request Level Decision
Video Bid
Kafka OR Spark Streaming
S - App
RAW DATA
Use Case 1 - Ad-Request Level Decision
High-throughput, low-latency queries run over 30 days of device-level data streamed into MemSQL.
The calculations are done on the fly and served as decision features.
Reference Join Subquery
Reference Join
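As a rough illustration of computing decision features on the fly from raw device-level rows, the sketch below uses Python's built-in `sqlite3` as a stand-in for MemSQL. The `events` and `apps` tables, their columns, and the feature names are hypothetical, and MemSQL's own SQL dialect (date arithmetic, reference tables) differs from SQLite's, so treat this as the shape of the query rather than the actual one from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (          -- raw activity rows (hypothetical schema)
    device_id TEXT, app_id TEXT, event_type TEXT, spend REAL, ts TEXT
);
CREATE TABLE apps (            -- small lookup table joined at query time
    app_id TEXT PRIMARY KEY, category TEXT
);
""")
conn.executemany("INSERT INTO apps VALUES (?, ?)",
                 [("app1", "game"), ("app2", "social")])
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", [
    ("d1", "app1", "view",     0.0,  "2017-04-01 10:00:00"),
    ("d1", "app1", "view",     0.0,  "2017-04-02 10:00:00"),
    ("d1", "app1", "install",  0.0,  "2017-04-02 10:05:00"),
    ("d1", "app2", "purchase", 4.99, "2017-04-05 09:00:00"),
    ("d1", "app1", "view",     0.0,  "2017-02-01 10:00:00"),  # outside the window
    ("d2", "app2", "view",     0.0,  "2017-04-09 10:00:00"),
])

# Aggregate one device's last 30 days at request time; nothing is precomputed.
FEATURES_SQL = """
SELECT SUM(e.event_type = 'view')    AS views,
       SUM(e.event_type = 'install') AS installs,
       SUM(e.spend)                  AS spend,
       SUM(a.category = 'game')      AS game_events
FROM events e
JOIN apps a ON a.app_id = e.app_id
WHERE e.device_id = ?
  AND e.ts >= datetime(?, '-30 days')
"""


def device_features(device_id: str, now: str) -> dict:
    """Return 30-day features for one device; `now` is 'YYYY-MM-DD HH:MM:SS'."""
    views, installs, spend, game_events = conn.execute(
        FEATURES_SQL, (device_id, now)).fetchone()
    views, installs = views or 0, installs or 0   # SUM over no rows is NULL
    return {
        "views": views,
        "installs": installs,
        "cvr": installs / views if views else 0.0,  # installs per view
        "spend": spend or 0.0,
        "game_events": game_events or 0,
    }


if __name__ == "__main__":
    print(device_features("d1", "2017-04-10 00:00:00"))
```

Because the features are derived from raw rows at query time, changing a feature definition means changing a query rather than redeploying and backfilling a dedicated streaming job.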
In Fact - One Fits All
Algo Serving
Kafka OR Spark Streaming
Real-Time Dashboard
Data Warehouse Hot Batch
Data Sink
Hot Batch
Realtime Query
Realtime Query
Conclusion
❖ Mobile Advertising is all about knowing your audience
❖ Fast & accurate data is key to Data Science as a Service
❖ But, “Realtime” is a relative word
❖ Try to simplify moving parts when it comes to streaming
  ➢ Difficult to debug
  ➢ Hard to backfill
❖ Generalized hot-data sink for stability and multi-purpose data storage