Date posted: 09-May-2015
Category: Technology
Uploaded by: rocket-fuel-inc
Proprietary & Confidential. Copyright © 2014.
Behavioral Targeting @ Scale: How did we know that this ad was relevant for you?
Savin Goyal, Sivasankaran Chandrasekar
Advertiser
Rocket Fuel
▪ 200+ RTB advertising supply partners
▪ 50+ Mn websites
▪ 50+ Bn daily impressions
▪ 3B worldwide consumers
▪ 100,000+ devices
Display Advertising Ecosystem
Diagram components: Agencies, Data Partners, Exchanges (AdExchange), Rocket Fuel Platform (Auto Optimization, Real-Time Bidding)
Programmatic Buying
Diagram (steps numbered 1-10): Web Browser, Publishers, Exchange Partners, Data Partners, and the Rocket Fuel Platform (Smart Ad Servers, Response Prediction Models, Campaign & Audience Data). Step labels: Page Request, Ad Request, Bid Request, User Data, Qualify Campaign, Calculate Propensity Score, Bid on Ad, Rocket Fuel Winning Ad, Ad Served to User, User Engages with Ad, User Engagement Recorded, Refresh learning.
Real Time Auction
Figure: bid prices across many impressions, with the signals Site/Page, Geo/Weather, Time of Day, Brand Affinity, and User.
Real Time Auction
Campaign goals: Leads & sales; Coupon downloads; Brand awareness
Signals: Site/Page, Geo/Weather, Time of Day, Brand Affinity, Demo
Impression Scorecard (points per signal and predicted response):
Scorecard | Demo | Brand Affinity | Time of Day | Geo/Weather | Site/Page | Ad Position | In-Market | Behavior | Response
1 | +100 | +40 | -20 | +20 | +15 | +10 | +40 | +35 | +9.7%
2 | +40 | -70 | -20 | +10 | +15 | -25 | -40 | -18 | +0.7%
3 | +10 | -10 | -20 | +20 | +10 | -35 | -25 | +10 | +1.4%
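The scorecard suggests that each signal adds points to an impression's predicted response, and the auction then chooses among campaigns. A minimal sketch, assuming a logistic link from summed points to a probability and a per-campaign value per response; `bias`, `scale`, the values, and the function names are hypothetical, and the output is not meant to reproduce the slide's percentages:

```python
import math


def propensity(points, bias=-4.0, scale=0.01):
    """Map additive scorecard points to a response probability.

    ASSUMPTION: logistic link with hypothetical calibration constants
    `bias` and `scale`; the deck does not state how points become a rate.
    """
    logit = bias + scale * sum(points)
    return 1.0 / (1.0 + math.exp(-logit))


def pick_campaign(scorecards, value_per_response):
    """Return (campaign, expected value) maximizing p(response) * value."""
    best = None
    for campaign, points in scorecards.items():
        expected = propensity(points) * value_per_response[campaign]
        if best is None or expected > best[1]:
            best = (campaign, expected)
    return best


# Point values from the Impression Scorecard (Demo through Behavior).
cards = {
    "scorecard_1": [100, 40, -20, 20, 15, 10, 40, 35],
    "scorecard_2": [40, -70, -20, 10, 15, -25, -40, -18],
    "scorecard_3": [10, -10, -20, 20, 10, -35, -25, 10],
}
print(pick_campaign(cards, {name: 1.0 for name in cards}))
```

With equal value per response the highest-propensity card wins; raising one campaign's value per response can flip the outcome even at a lower propensity, which is one way a campaign's goal could enter the auction.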
Scalable Predictive Models
Features: Age/Gender, Occupation, Income, Ethnicity, Purchase Intent, Online Purchases, Offline Purchases, Browsing Behavior, Site Actions, Zip Code, City/DMA, Search Sites, Search Categories, Recency, Search Keywords, Web Site/Page, Referral URL, Site Category, Bizographics, Social, Interests, Lifestyle
Figure: per-feature lift values (legend: Positive Lift, Marginal Impact, Negative Lift)
Automated Feature Selection
▪ Practically unlimited number of models
▪ Determine the right model size
▪ Balance fit to past data against generalization to future data
Learn-Test-Refine
▪ Automatically learn from each response
▪ Cross-validation and A/B testing infrastructure
▪ Training pipeline
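A Learn-Test-Refine loop needs a rule for deciding whether a refined model actually beats the incumbent in an A/B test. A minimal sketch using a standard two-proportion z-test on response rates; the deck does not name its test, so this is a swapped-in textbook technique, and the impression and response counts below are invented:

```python
import math


def two_proportion_z_test(responses_a, impressions_a, responses_b, impressions_b):
    """Two-sided z-test for H0: response rate of A equals that of B.

    Returns (z, p_value); z > 0 means B's observed rate is higher.
    """
    rate_a = responses_a / impressions_a
    rate_b = responses_b / impressions_b
    pooled = (responses_a + responses_b) / (impressions_a + impressions_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / impressions_a + 1 / impressions_b))
    if se == 0:
        return 0.0, 1.0  # no variation at all: nothing to distinguish
    z = (rate_b - rate_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def should_promote(responses_a, impressions_a, responses_b, impressions_b, alpha=0.05):
    """Promote candidate model B only if it is significantly better than A."""
    z, p_value = two_proportion_z_test(responses_a, impressions_a, responses_b, impressions_b)
    return z > 0 and p_value < alpha


# Incumbent A vs. refined model B, one million impressions each (made-up counts).
print(two_proportion_z_test(1000, 1_000_000, 1150, 1_000_000))
print(should_promote(1000, 1_000_000, 1150, 1_000_000))
```

Requiring `z > 0` keeps a significantly worse candidate from being promoted just because its p-value is small.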
Throughput
Rocket Fuel Scale
▪ 34,474 CPU processor cores
▪ 2,655 servers
▪ 187.4 teraflops of computing
▪ 188 terabytes of memory
  ▪ 13X the memory of Jeopardy-winning IBM Watson
▪ 42 petabytes of storage
  ▪ 106X the data volume of the entire Library of Congress
Data Warehouse Growth
▪ Over 1 year: 200 servers → 1,400 servers
▪ 5 PB → 41 PB (8x)
Behavioral Targeting
▪ Leverage online activities on the web to learn about a user’s:
  ▪ Long-term interests (e.g. the user is interested in luxury cars)
  ▪ Short-term interests (e.g. the user is looking for a pizza right now)
▪ Expand the user set beyond retargeting
▪ Explore vs. exploit
  ▪ Identify relevant users even if they have never been targeted before
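One way to separate long-term from short-term interests is to score the same event history with two different decay rates. A minimal sketch, assuming exponential decay with hypothetical half-lives (1 hour for short-term, 30 days for long-term) and timestamps in hours; the deck does not say how Rocket Fuel weights recency:

```python
import math
from collections import defaultdict


def interest_scores(events, now_hours, half_life_hours):
    """Exponentially decayed interest per category.

    events: iterable of (category, timestamp_in_hours).
    Each event contributes 2 ** (-age / half_life), so a short half-life
    tracks short-term intent and a long one tracks long-term interests.
    """
    rate = math.log(2) / half_life_hours
    scores = defaultdict(float)
    for category, ts in events:
        age = now_hours - ts
        if age < 0:
            continue  # ignore events from the future (clock skew)
        scores[category] += math.exp(-rate * age)
    return dict(scores)


def top_interest(events, now_hours, half_life_hours):
    scores = interest_scores(events, now_hours, half_life_hours)
    return max(scores, key=scores.get) if scores else None


# Hypothetical history: car browsing about three weeks ago, a pizza visit 12 minutes ago.
events = [("Cars", 0.0), ("Cars", 24.0), ("Cars", 48.0), ("Food", 499.8)]
print(top_interest(events, 500.0, half_life_hours=1.0))    # short-term view
print(top_interest(events, 500.0, half_life_hours=720.0))  # long-term view
```

The same history yields different "top" interests depending on the horizon, which is why long-term and short-term signals are worth keeping as separate features.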
Behavioral Targeting @ Rocket Fuel
Diagram stages:
▪ Modeling: Training Events → Label Data → Train Model → Back Test → Calibrate → Model
▪ Features: Pixel Stream and Ad Logs → Feature Generation → BT Features (HBase)
▪ Profiles: Profile Generation → Scoring (Score Profiles with the Model) → Ad Serving Data Centers
Hadoop/HBase @ Rocket Fuel
▪ Cluster highlights
  ▪ 650+ slaves (64 GB RAM, 12 × 3 TB disks each)
  ▪ 20 PB storage
  ▪ HA NameNode setup
  ▪ 9k map slots + 5.5k reduce slots
  ▪ Co-located to run HBase for offline processing
▪ HBase 0.94.15
  ▪ 5-node ZooKeeper quorum
  ▪ Monitoring with OpenTSDB
  ▪ Dual master setup
Behavioral Targeting @ Rocket Fuel
Events over a 30-minute window:
  11:23  bmw.com (site), Cars (category)
  11:26  pizzahut.com (site), Food (category)
  11:27  honda.com (site), Cars (category)
Read events of last N days → Behavioral Targeting Profile (Recency, Frequency, Others…):
  honda.com, most recent 11:27: Recent 6 hours: 5; Between 6 and 12 hours: 3; Between 12 hours and …
  Food, most recent 11:26: Recent 6 hours: 2; Between 6 and 12 hours: 7; Between 12 hours and …
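The profile step can be sketched as a recency/frequency aggregation over the events read from the last N days. A minimal sketch, assuming timestamps in hours, feature keys shaped like `site:honda.com` or `category:food`, and a final catch-all bucket for anything older than 12 hours; `FeatureStats` and `build_profile` are hypothetical names:

```python
from dataclasses import dataclass, field

# Bucket edges taken from the example profile: <6 h, 6-12 h, then everything
# older (up to the N-day lookback) in a final bucket.
BUCKET_EDGES_HOURS = (6, 12)


@dataclass
class FeatureStats:
    last_seen: float = float("-inf")  # recency: most recent event time (hours)
    counts: list = field(default_factory=lambda: [0] * (len(BUCKET_EDGES_HOURS) + 1))


def bucket_index(age_hours):
    for i, edge in enumerate(BUCKET_EDGES_HOURS):
        if age_hours < edge:
            return i
    return len(BUCKET_EDGES_HOURS)


def build_profile(events, now_hours, lookback_days):
    """Aggregate raw (feature, timestamp_hours) events into recency/frequency stats."""
    profile = {}
    for feature, ts in events:
        age = now_hours - ts
        if age < 0 or age > lookback_days * 24:
            continue  # outside the "last N days" read window
        stats = profile.setdefault(feature, FeatureStats())
        stats.last_seen = max(stats.last_seen, ts)
        stats.counts[bucket_index(age)] += 1
    return profile


events = [("site:honda.com", 99.0)] * 5 + [("site:honda.com", 92.0)] * 3 + [("category:food", 99.5)]
print(build_profile(events, now_hours=100.0, lookback_days=3))
```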
HBase Data Model
Row key: user_id (e.g. ABCD06EFG)
Single column family “u”
Column qualifier: <date><hour>:<type>:<value> (e.g. 2014060416:site:bmw.com, 2014060416:category:food)
Cell value: [Protobuf] most recent timestamp, plus event details relative to that timestamp (e.g. relative to 11:23 and to 11:26 in the example above)
• Efficient look-up for a given user
• Access a range of events by event date, hour, and type
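Because HBase sorts qualifiers lexicographically and each one begins with a fixed-width `<date><hour>` prefix, a range of hours is a contiguous slice of a row's columns. A minimal sketch of encoding, parsing, and range selection, assuming UTF-8 qualifiers and simulating one row as a sorted Python list rather than calling an HBase client; all function names are hypothetical:

```python
import bisect
from datetime import datetime, timedelta


def make_qualifier(event_time, event_type, value):
    # <date><hour>:<type>:<value>, e.g. b"2014060416:site:bmw.com"
    return f"{event_time:%Y%m%d%H}:{event_type}:{value}".encode("utf-8")


def parse_qualifier(qualifier):
    # maxsplit=2 keeps any ':' inside the value (e.g. a URL) intact
    date_hour, event_type, value = qualifier.decode("utf-8").split(":", 2)
    return datetime.strptime(date_hour, "%Y%m%d%H"), event_type, value


def column_range(sorted_qualifiers, first_hour, last_hour, event_type=None):
    """Qualifiers for hours first_hour..last_hour (inclusive), optionally one type.

    The hour range is a contiguous slice found by binary search; the type
    filter is applied inside that slice because types interleave per hour.
    """
    start = f"{first_hour:%Y%m%d%H}".encode()
    stop = f"{last_hour + timedelta(hours=1):%Y%m%d%H}".encode()
    lo = bisect.bisect_left(sorted_qualifiers, start)
    hi = bisect.bisect_left(sorted_qualifiers, stop)
    selected = sorted_qualifiers[lo:hi]
    if event_type is not None:
        selected = [q for q in selected if parse_qualifier(q)[1] == event_type]
    return selected


row = sorted([
    make_qualifier(datetime(2014, 6, 4, 15), "site", "a.com"),
    make_qualifier(datetime(2014, 6, 4, 16), "site", "bmw.com"),
    make_qualifier(datetime(2014, 6, 4, 16), "category", "food"),
    make_qualifier(datetime(2014, 6, 4, 17), "site", "b.com"),
])
hour = datetime(2014, 6, 4, 16)
print(column_range(row, hour, hour, event_type="site"))
```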
Key Challenges
▪ User Profile Freshness
▪ Scaling Issues
▪ Pipeline Failures
User Profile Freshness
▪ Strict latency requirements
▪ Recent activity is a much better predictor
Solutions:
▪ Staggered pipelines
▪ Real-time behavioral targeting
Staggered Pipelines
Diagram: five time-staggered instances of the same pipeline (Extract → Score → Filter → Upload) run over the Source Data.
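Why staggering helps can be worked out with simple arithmetic. A minimal sketch, assuming each pipeline instance takes a fixed `run_minutes`, restarts immediately after uploading, and the instances are offset evenly; the 120-minute run time is made up:

```python
def staggered_offsets(run_minutes, instances):
    """Start offsets (minutes) that spread `instances` pipeline runs evenly."""
    if instances < 1:
        raise ValueError("need at least one pipeline instance")
    step = run_minutes / instances
    return [i * step for i in range(instances)]


def worst_case_staleness(run_minutes, instances):
    """Oldest data an ad server can be serving, in minutes.

    A batch reflects data as of its start, lands `run_minutes` later, and is
    replaced when the next staggered batch lands `run_minutes / instances`
    after that.
    """
    return run_minutes + run_minutes / instances


for k in (1, 5):
    print(k, staggered_offsets(120, k), worst_case_staleness(120, k))
```

Under these assumptions, going from one instance to five cuts worst-case staleness from 240 to 144 minutes, at the cost of roughly five times the compute for the same source data.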
Real Time Behavioral Targeting
▪ Batched Profile: Logs → Offline BT Pipeline → BT Profile, refreshed every N hours
▪ Online Profile: Blackbird (an HBase instance tuned for 2 ms latencies) records events for users in real time
▪ Ad Servers merge both profiles between each Request and its Response
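The merge step has to combine an aggregated batch with raw recent events without counting anything twice. A minimal sketch, assuming the batched profile records the cutoff time it was built up to and stores (count, last_seen) per feature; the function name and data shapes are hypothetical:

```python
def merge_profiles(batched, batch_cutoff, online_events):
    """Merge a batched profile with raw online events.

    batched: {feature: (count, last_seen)} aggregated up to batch_cutoff.
    online_events: iterable of (feature, timestamp) recorded in real time.
    Events at or before the cutoff are already in the batch, so skip them.
    """
    merged = dict(batched)  # copy; the batched profile is left untouched
    for feature, ts in online_events:
        if ts <= batch_cutoff:
            continue  # already aggregated into the batched profile
        count, last_seen = merged.get(feature, (0, float("-inf")))
        merged[feature] = (count + 1, max(last_seen, ts))
    return merged


batched = {"category:cars": (3, 10.0)}
online = [("category:cars", 9.5), ("category:cars", 11.0), ("category:food", 12.0)]
print(merge_profiles(batched, 10.0, online))
```

Filtering on the cutoff is what lets the online store keep a few hours of overlap with the batch without double counting.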
Batched Updates vs. Real Time Updates
Dimension | Batched Profile | Real Time Profile
Event Granularity | Aggregated over several hours/days | Raw recorded events appended for recent N hours
Processing Load | Requires minimal CPU processing | Needs aggregation on the fly
Disk Footprint | Compact representation captures several days | Strict limits to ensure read times are acceptable
Coverage | All interactions | Only interactions at the local data center
▪ Real Time Profile updated in milliseconds
▪ Batched Profile refreshed every N hours
Scaling Issues
▪ 3X growth in events processed per year
  ▪ First-party data
  ▪ App interactions
  ▪ Geo-location data
  ▪ …
▪ Case studies
  ▪ HBase region hot-spotting
  ▪ Network bandwidth troubles
HBase Region Hot Spotting
Diagram: a high write load concentrates on one HBase region; a region split (painful!) is still problematic because the key distribution is non-uniform:
▪ Some users are more active than others
▪ No control over the user IDs generated
HBase Region Hot-spotting
▪ Uneven write-load distribution
  ▪ Non-uniform row key distribution
▪ Salt row keys to ensure uniform distribution
  ▪ Fixed-length hashed prefix
  ▪ Salted row key = Murmur hash based prefix + original user ID
▪ Uniform pre-splits
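Salting with a fixed-length hash prefix and pre-splitting the prefix space evenly can be sketched as below. It assumes a 4-byte prefix taken from 32-bit MurmurHash3 (x86_32 variant, written out because it is not in Python's standard library) in big-endian order; `salted_row_key`, `uniform_split_keys`, and `region_for` are hypothetical helpers, not HBase APIs:

```python
import bisect

PREFIX_BYTES = 4  # hypothetical: the full 32-bit hash is used as the salt


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86_32."""
    mask = 0xFFFFFFFF
    c1, c2 = 0xCC9E2D51, 0x1B873593

    def scramble(k):
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask
        return (k * c2) & mask

    h = seed & mask
    n_blocks = len(data) // 4
    for i in range(n_blocks):
        h ^= scramble(int.from_bytes(data[4 * i:4 * i + 4], "little"))
        h = ((h << 13) | (h >> 19)) & mask
        h = (h * 5 + 0xE6546B64) & mask
    tail = data[4 * n_blocks:]
    if tail:
        h ^= scramble(int.from_bytes(tail, "little"))
    # Finalization mix (fmix32).
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h


def salted_row_key(user_id: str) -> bytes:
    raw = user_id.encode("utf-8")
    return murmur3_32(raw).to_bytes(PREFIX_BYTES, "big") + raw


def uniform_split_keys(num_regions: int) -> list:
    """num_regions - 1 split points spaced evenly over the 32-bit prefix space."""
    space = 1 << 32
    return [(space * i // num_regions).to_bytes(PREFIX_BYTES, "big") for i in range(1, num_regions)]


def region_for(row_key: bytes, split_keys: list) -> int:
    """Index of the region holding row_key (region i starts at split_keys[i - 1])."""
    return bisect.bisect_right(split_keys, row_key)


splits = uniform_split_keys(4)
counts = [0] * 4
for i in range(10_000):
    counts[region_for(salted_row_key(f"user{i}"), splits)] += 1
print(counts)  # roughly even, even though the raw IDs are sequential
```

The original user ID is kept after the prefix, so a point look-up only needs to recompute the hash; the trade-off is that scans in user-ID order are no longer contiguous.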
HBase Region Hot-spotting
▪ Don’t stop at salting
  ▪ Map input splits configured for region boundaries
Diagram: 'k' input splits of unsalted user IDs (e.g. 1234557, ZZAHT654) go through the Key Partitioner, which produces 'm' splits for 'm' regions. Each output split holds the salted, sorted keys of one region (Region 1 up to \x03\x85\x1E\xB8ZZZZZZ, Region 2 up to \x07\x5C\xF5\xC2928ZZ, …, Region m up to \xFF\xAE\x14\xE1Z28ZZ).
HBase Key Partitioner
▪ As many splits as regions, to maximize parallelism
▪ Key Partitioner (MapReduce job):
  ▪ Reads the region boundaries of the HBase table
  ▪ Salts and sorts row keys accordingly
  ▪ Uses a multiple output format to optimize the reduce phase
  ▪ Each generated split file corresponds to a single region
▪ Drastically reduces read latencies
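The Key Partitioner's core step, routing each salted key to the region whose boundaries contain it and sorting within each region, can be sketched with a binary search over region start keys. This assumes the keys are already salted and the region start keys were read from the table beforehand; it groups keys in memory rather than running as a MapReduce job, and `partition_by_region` is a hypothetical name:

```python
import bisect


def partition_by_region(salted_keys, region_start_keys):
    """Group salted row keys into one sorted list per region.

    region_start_keys: sorted start keys of regions 2..m (region 1 starts
    at the empty key). List i of the result is what a single output split
    file would hold for region i + 1.
    """
    buckets = [[] for _ in range(len(region_start_keys) + 1)]
    for key in salted_keys:
        # bisect_right: a key equal to a region's start key belongs to that region.
        buckets[bisect.bisect_right(region_start_keys, key)].append(key)
    for bucket in buckets:
        bucket.sort()
    return buckets


starts = [b"\x40", b"\x80", b"\xc0"]
keys = [b"\x85abc", b"\x01zz", b"\x40", b"\xffq", b"\x05a"]
for i, bucket in enumerate(partition_by_region(keys, starts), start=1):
    print(f"region {i}: {bucket}")
```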
Network Bandwidth Troubles
Data Center Expansion
Network Bandwidth Constraints
▪ Uploads consistently overshot the bandwidth limit
▪ Delays across many systems (Redis, MySQL, Blackbird…)
▪ Bidding hampered
Solutions
▪ Intelligent storage – protobufs everywhere
▪ Throttle writes
▪ Geo-splitting
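Throttling writes is commonly done with a token bucket; the deck does not say which mechanism Rocket Fuel used, so this is a swapped-in, standard technique. A minimal sketch, assuming batch sizes in bytes and an injectable clock and sleep so it can be exercised without waiting; the class and function names are hypothetical:

```python
import time


class TokenBucket:
    """Token-bucket limiter: `rate` bytes/s on average, bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def _refill(self):
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_consume(self, n):
        self._refill()
        if n <= self.tokens:
            self.tokens -= n
            return True
        return False

    def wait_time(self, n):
        """Seconds until `n` tokens will be available."""
        self._refill()
        return max(0.0, (n - self.tokens) / self.rate)


def throttled_upload(batches, bucket, send, sleep=time.sleep):
    """Send each batch only when the bucket has room for its size in bytes."""
    for batch in batches:
        size = len(batch)
        if size > bucket.capacity:
            raise ValueError("batch larger than bucket capacity; split it first")
        while not bucket.try_consume(size):
            sleep(bucket.wait_time(size))
        send(batch)


# Demo with a simulated clock so nothing actually sleeps.
now = [0.0]
bucket = TokenBucket(rate=100.0, capacity=100.0, clock=lambda: now[0])
sent = []
throttled_upload([b"x" * 100, b"x" * 100, b"x" * 50], bucket, sent.append,
                 sleep=lambda s: now.__setitem__(0, now[0] + s))
print(len(sent), now[0])
```

The capacity bounds how far a burst can exceed the average rate, which is the knob that keeps uploads from crowding out latency-sensitive traffic sharing the same link.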
Geo Splitting
Geo-splitting
▪ Tag each user’s location history and predict future data center visits
  ▪ ⨍(dc, geo_history, bt_profile)
▪ A separate workflow periodically generates geo-split rules:
  ▪ Clusters users and analyzes migration patterns
  ▪ Ensures maximal look-up coverage of profiles
  ▪ Minimizes the total number of profiles stored
▪ Ensures efficient use of resources, with minimal impact on performance
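One hypothetical shape for ⨍(dc, geo_history, bt_profile): treat a user's past visit counts per data center as the prediction of future visits, and upload the profile only to the smallest set of data centers that covers a target share of those visits. The coverage target, the data center names, and this greedy rule are assumptions rather than the deck's actual rules, and the `bt_profile` input is ignored here:

```python
def choose_data_centers(dc_visit_counts, coverage_target=0.95, max_dcs=None):
    """Smallest set of data centers whose visit share reaches coverage_target.

    Taking DCs in descending visit order is optimal for this per-user goal:
    no set of k DCs can cover more visits than the k busiest ones.
    """
    total = sum(dc_visit_counts.values())
    if total == 0:
        return []
    chosen, covered = [], 0
    # Ties broken by name so the rule is deterministic.
    for dc, count in sorted(dc_visit_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if covered / total >= coverage_target:
            break  # enough look-up coverage already
        if max_dcs is not None and len(chosen) >= max_dcs:
            break  # cap on profiles stored per user
        chosen.append(dc)
        covered += count
    return chosen


history = {"us-east": 90, "us-west": 8, "eu": 2}  # hypothetical visit counts
print(choose_data_centers(history, coverage_target=0.95))
print(choose_data_centers(history, coverage_target=0.90))
```

Raising `coverage_target` improves look-up coverage; lowering it (or capping `max_dcs`) reduces the number of profile copies shipped across the network.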
Geo-splitting
Diagram: the Behavioral Targeting pipeline stages (Training Events; Label Data, Train Model, Back Test, Calibrate; Pixel Stream and Ad Logs; Feature Generation; BT Features (HBase); Profile Generation; Scoring with Score Profiles and the Model; Ad Serving Data Centers), plus a geo-split workflow: Cluster Users → Analyze Patterns → Generate Rules → Geo-split.
Quick Recovery From Failures
▪ Break the pipeline into short payloads
▪ Fail fast, recover fast!
▪ Actionable alerts; cut down the noise
▪ Materialize data as frequently as possible
▪ Cross-system fault tolerance
▪ Idempotency
▪ Backfill at end of day (EOD) to plug holes if needed
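Idempotency is what makes the end-of-day backfill safe: if every output is keyed by its partition, re-running a partition overwrites it rather than duplicating it. A minimal sketch, assuming hourly partitions and an in-memory stand-in for the output store; `IdempotentSink`, `missing_hours`, and `backfill` are hypothetical names:

```python
from datetime import datetime, timedelta


def missing_hours(day_start, completed):
    """Hourly partitions of the day with no materialized output yet."""
    hours = [day_start + timedelta(hours=h) for h in range(24)]
    return [h for h in hours if h not in completed]


class IdempotentSink:
    """Output store keyed by partition: rewriting a partition replaces it."""

    def __init__(self):
        self.partitions = {}

    def write(self, partition, rows):
        self.partitions[partition] = list(rows)


def backfill(day_start, sink, run_payload):
    """Re-run only the hourly payloads that never landed; return what was filled."""
    filled = missing_hours(day_start, sink.partitions)
    for hour in filled:
        sink.write(hour, run_payload(hour))
    return filled


day = datetime(2014, 6, 4)
sink = IdempotentSink()
for h in range(24):
    if h not in (3, 17):  # two hourly runs failed
        sink.write(day + timedelta(hours=h), [h])
print(backfill(day, sink, lambda hour: [hour.hour]))
print(backfill(day, sink, lambda hour: [hour.hour]))  # second run: nothing to do
```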
Shout-outs!
We Are Hiring!
Questions?
Thank You!
Sivasankaran [email protected]
Savin [email protected]
We are hiring! (as always)
http://rocketfuel.com/careers
[email protected]@rocketfuel.com