How did you know this Ad will be relevant for me?!

Date post: 09-May-2015
Upload: rocket-fuel-inc
Description:
Predicting the most relevant ad at any point in time for every individual is how Rocket Fuel optimizes ROI for an advertiser. One of the factors influencing this prediction is a consumer's online interactions and behavioral profile. With more than 45 billion interactions processed daily, this data runs into several petabytes in our Hadoop warehouse. Running machine-learning algorithms and artificial intelligence at this scale requires many practical issues to be addressed. First, behavioral patterns are short-lived, so to accurately reflect a consumer's tendencies we need to curate and refresh his or her profile as quickly as possible while avoiding multiple scans over the raw data and dealing with issues like transient system outages. Second, we must build models from behavioral profiles without overwhelming our Hadoop cluster. At this scale, frequent refreshes of several models can place an undue burden on even a thousand-node cluster. In this talk, we will dive into (a) the practical challenges involved in designing a highly scalable and efficient solution to build behavioral profiles using the Hadoop framework and (b) techniques for ensuring reliability and availability of mission-critical machine learning pipelines.
Transcript
Page 1: How did you know this Ad will be relevant for me?!

Proprietary & Confidential. Copyright © 2014.

Behavioral Targeting @ Scale - How did we know that this Ad was relevant for you?

Savin Goyal, Sivasankaran Chandrasekar

Page 2: How did you know this Ad will be relevant for me?!

ADVERTISER

ROCKET FUEL

200+ RTB advertising supply partners

50+ Mn websites

50+ Bn daily impressions

3B WW CONSUMERS

100,000+ DEVICES

Page 3: How did you know this Ad will be relevant for me?!

Exchanges

AdExchange

Rocket Fuel Platform

Auto Optimization

Real-Time Bidding

Agencies

Data Partners

Display Advertising Ecosystem

Page 4: How did you know this Ad will be relevant for me?!

Programmatic Buying

[Diagram: a numbered flow (steps 1 to 10) between the Web Browser, Publishers, Exchange Partners, Data Partners and the Rocket Fuel Platform (Smart Ad Servers, Response Prediction Models, Campaign & Audience Data). Step labels: Page Request, Ad Request, Bid Request, User Data, Qualify Campaign, Calculate Propensity Score, Bid on Ad, Rocket Fuel Winning Ad, Ad Served to User, User Engages with Ad, User Engagement Recorded, Refresh learning.]

Page 5: How did you know this Ad will be relevant for me?!

Real Time Auction

[Graphic: a stream of bid prices (for example $2.11, $1.26, $2.78) shown alongside the signals Site/Page, Geo/Weather, Time of Day, Brand Affinity and User.]

Page 6: How did you know this Ad will be relevant for me?!

Real Time Auction

Goal: Leads & sales

Goal: Coupon downloads

Goal: Brand awareness

Signals: Site/Page, Geo/Weather, Time of Day, Brand Affinity, Demo

Impression Scorecard (shown three times): Demo, Brand Affinity, Time of Day, Geo/Weather, Site/Page, Ad Position, In-Market Behavior, Response

Scores on the three scorecards:

+100 +40 -20 +20 +15 +10 +40 +35 (+9.7%)

+40 -70 -20 +10 +15 -25 -40 -18 (+0.7%)

+10 -10 -20 +20 +10 -35 -25 +10 (+1.4%)

[The scorecards also carry X and ✓ marks whose placement is not recoverable from the transcript.]

Page 7: How did you know this Ad will be relevant for me?!

Scalable Predictive Models

Features: Age/Gender, Occupation, Income, Ethnicity, Purchase Intent, Online Purchases, Offline Purchases, Browsing Behavior, Site Actions, Zip Code, City/DMA, Search Sites, Search Categories, Recency, Search Keywords, Web Site/Page, Referral URL, Site Category, Bizographics, Social, Interests, Lifestyle

[Graphic: a grid of per-feature values across many models, such as +17, -7 or X. Legend: Positive Lift, Marginal Impact, Negative Lift.]

Automated Feature Selection

▪ Infinite number of models
▪ Determine perfect model size
▪ Balance past data fit and future generalization

Learn-Test-Refine

▪ Automatically learn from each response
▪ Cross-validate - A / B testing infrastructure
▪ Training pipeline

Page 8: How did you know this Ad will be relevant for me?!

Throughput

Page 9: How did you know this Ad will be relevant for me?!

Rocket Fuel Scale

▪ 34,474 CPU Processor Cores
▪ 2655 servers
▪ 187.4 Teraflops of computing
▪ 188 Terabytes of memory
  ▪ 13X the memory of Jeopardy-winning IBM Watson
▪ 42 Petabytes of storage
  ▪ 106X the data volume of entire Library of Congress

Page 10: How did you know this Ad will be relevant for me?!

Data Warehouse Growth

[Chart: over 1 year the warehouse grew 8x, from 5 PB on 200 servers to 41 PB on 1400 servers.]

Page 11: How did you know this Ad will be relevant for me?!

Behavioral Targeting

Page 12: How did you know this Ad will be relevant for me?!

Behavioral Targeting

▪ Leverage online activities on the web to learn about a user’s
  ▪ Long Term Interests
    ▪ User is interested in luxury cars
  ▪ Short Term Interests
    ▪ User is looking for a pizza right now
▪ Expand user set beyond retargeting
  ▪ Explore vs. Exploit
  ▪ Identify relevant users even if they have never been targeted previously

Page 13: How did you know this Ad will be relevant for me?!

Behavioral Targeting @ Rocket Fuel

[Diagram of the pipeline. Labels: Events (Pixel Stream, Ad Logs); Feature Generation; BT Features (HBase); Profile Generation; Scoring (Score Profiles); Training (Label Data, Train Model, Back Test, Calibrate); Model; Ad Serving Data Centers.]

Page 14: How did you know this Ad will be relevant for me?!

Hadoop/HBase @ Rocket Fuel

▪ Cluster Highlights
  ▪ 650+ Slaves (64 GB + 12 * 3 TB)
  ▪ 20 PB Storage
  ▪ HA Name Node Set Up
  ▪ 9k Map Slots + 5.5k Reduce Slots
  ▪ Co-located to run HBase for offline processing
▪ HBase 0.94.15
  ▪ 5 Node ZooKeeper quorum
  ▪ Monitoring with OpenTSDB
  ▪ Dual Master Setup

Page 15: How did you know this Ad will be relevant for me?!

Behavioral Targeting @ Rocket Fuel

Events on the timeline (11:23, 11:26, 11:27; labelled "30 minutes"):

bmw.com 11:23 (Cars 11:23)

pizzahut.com 11:26 (Food 11:26)

honda.com 11:27 (Cars 11:27)

Read events of last N days, then derive Recency, Frequency and others into the Behavioral Targeting Profile:

honda.com: 11:27; Recent 6 hours: 5; Between 6 and 12 hours: 3; Between 12 hours and …

Food: 11:26; Recent 6 hours: 2; Between 6 and 12 hours: 7; Between 12 hours and …
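The recency and frequency entries on this slide can be sketched roughly as follows. This is a minimal illustration, not Rocket Fuel's implementation: the function name `build_profile_entry` and the dictionary layout are hypothetical, and because the slide elides the upper bound of the third bucket ("Between 12 hours and …"), the sketch simply leaves that bucket open-ended.

```python
from datetime import datetime

# Bucket edges in hours, taken from the slide: "Recent 6 hours" and
# "Between 6 and 12 hours". The last bucket is left open-ended because the
# slide truncates its upper bound.
BUCKET_EDGES_HOURS = (6, 12)


def build_profile_entry(event_times, now):
    """Summarise one key (for example a site or category) for one user.

    Returns the most recent event time (recency) and the number of events
    per age bucket (frequency), or None if there are no past events.
    """
    past = [t for t in event_times if t <= now]
    if not past:
        return None
    counts = [0] * (len(BUCKET_EDGES_HOURS) + 1)
    for t in past:
        age_hours = (now - t).total_seconds() / 3600
        # Index of the bucket = number of edges the event age has passed.
        bucket = sum(age_hours >= edge for edge in BUCKET_EDGES_HOURS)
        counts[bucket] += 1
    return {"most_recent": max(past), "counts": counts}


if __name__ == "__main__":
    now = datetime(2014, 6, 4, 12, 0)
    visits = [datetime(2014, 6, 4, 11, 27), datetime(2014, 6, 4, 5, 0)]
    print(build_profile_entry(visits, now))
```

Keeping only a timestamp and a few counters per key is what makes the batched profile compact: the raw events do not need to be stored once they are bucketed.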

Page 16: How did you know this Ad will be relevant for me?!

HBase Data Model

row_key: user_id

Single Column Family “u”

Column Qualifier: <date><hour>:<type>:<value>

Cell Value: [Protobuf] Most recent timestamp, Event details relative to timestamp

Example row ABCD06EFG:

2014060416:site:bmw.com: 11:23, Event details relative to 11:23

2014060416:category:food: 11:26, Event details relative to 11:26

• Efficient look up for a given user

• Access range of events by event date, hour and type
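The column qualifier layout above can be illustrated with a few string helpers. This is a sketch under assumptions: the helper names are hypothetical, the `<date><hour>` part is assumed to be `YYYYMMDDHH` (matching the example `2014060416`), and the event type is assumed never to contain a colon. No HBase client calls are shown.

```python
from datetime import datetime


def make_qualifier(event_time, event_type, value):
    """Build '<date><hour>:<type>:<value>', e.g. '2014060416:site:bmw.com'."""
    return f"{event_time:%Y%m%d%H}:{event_type}:{value}"


def parse_qualifier(qualifier):
    """Split a qualifier back into (hour bucket, type, value).

    maxsplit=2 keeps any ':' inside the value (such as a port number) intact.
    """
    bucket, event_type, value = qualifier.split(":", 2)
    return datetime.strptime(bucket, "%Y%m%d%H"), event_type, value


def hour_prefix(event_time, event_type=None):
    """Prefix selecting all qualifiers for one hour, optionally one type.

    Because the date and hour lead the qualifier, qualifiers sort by time,
    which is what allows range access by date, hour and type.
    """
    prefix = f"{event_time:%Y%m%d%H}:"
    if event_type is not None:
        prefix += f"{event_type}:"
    return prefix


if __name__ == "__main__":
    t = datetime(2014, 6, 4, 16, 23)
    print(make_qualifier(t, "site", "bmw.com"))
    print(hour_prefix(t, "category"))
```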

Page 17: How did you know this Ad will be relevant for me?!

Page 18: How did you know this Ad will be relevant for me?!

Key Challenges

▪ User Profile Freshness
▪ Scaling Issues
▪ Pipeline Failures

Page 19: How did you know this Ad will be relevant for me?!

User Profile Freshness

▪ Strict latency requirements
▪ Recent activity much better predictor

Solutions -

▪ Staggered Pipelines
▪ Real Time Behavioral Targeting

Page 20: How did you know this Ad will be relevant for me?!

Staggered Pipelines

[Diagram: Source Data feeds five Extract, Score, Filter, Upload pipelines drawn at staggered offsets.]

Page 21: How did you know this Ad will be relevant for me?!

Real Time Behavioral Targeting

Page 22: How did you know this Ad will be relevant for me?!

Real Time Behavioral Targeting

[Diagram labels: Logs; Offline BT Pipeline; BT Profile (Batched Profile, refreshed every N hours); Blackbird (HBase instance tuned for 2ms latencies); Online Profile (record events for users in real time); Ad Servers (Merge Profiles); Request; Response.]

Page 23: How did you know this Ad will be relevant for me?!

Batched Updates vs. Real Time Updates

Aspect | Batched Profile | Real Time Profile
Event Granularity | Aggregated over several hours/days | Raw recorded events appended for recent N hours
Processing Load | Requires minimal CPU processing | Needs aggregation on-the-fly
Disk Footprint | Compact representation captures several days | Strict limits to ensure read times are acceptable
Coverage | All interactions | Only interactions at a data center

▪ Real Time Profile updated in milliseconds

▪ Batched Profile refreshed every N hours
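One way an ad server might merge the two profiles is sketched below. The slides do not describe the merge logic, so everything here is an assumption: the function name `merge_profiles`, the profile layout (a count and a most recent timestamp per key), and in particular the use of a batch cutoff time to avoid counting an event twice.

```python
def merge_profiles(batched, batch_cutoff, online_events):
    """Combine a batched profile with raw real-time events.

    batched:       {key: {"count": int, "most_recent": timestamp}}
    batch_cutoff:  timestamp up to which the batched profile is complete
    online_events: iterable of (key, timestamp) raw events

    Raw events at or before the cutoff are assumed to be reflected in the
    batched profile already, so only newer events are aggregated on the fly.
    """
    merged = {key: dict(entry) for key, entry in batched.items()}
    for key, ts in online_events:
        if ts <= batch_cutoff:
            continue
        entry = merged.setdefault(key, {"count": 0, "most_recent": ts})
        entry["count"] += 1
        entry["most_recent"] = max(entry["most_recent"], ts)
    return merged


if __name__ == "__main__":
    batched = {"site:honda.com": {"count": 5, "most_recent": 100}}
    online = [("site:honda.com", 120), ("category:food", 130)]
    print(merge_profiles(batched, batch_cutoff=100, online_events=online))
```

The on-the-fly aggregation is the "Processing Load" cost noted for the real-time profile, which is why the number of raw events kept per user has to stay small.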

Page 24: How did you know this Ad will be relevant for me?!

Scaling Issues

▪ 3X growth in events processed/year
  ▪ First Party Data
  ▪ App Interactions
  ▪ Geo-location Data
  ▪ …
▪ Case Studies
  ▪ HBase Region Hot-spotting
  ▪ Network Bandwidth Troubles

Page 25: How did you know this Ad will be relevant for me?!

HBase Region Hot Spotting

Page 26: How did you know this Ad will be relevant for me?!

HBase Region Hot-spotting

[Diagram: a high write load lands on one of several HBase regions. Splitting the region (painful!) is still problematic.]

▪ Some users more active than others
▪ No control over the user IDs generated
▪ Non-uniform distribution!

Page 27: How did you know this Ad will be relevant for me?!

HBase Region Hot-spotting

▪ Uneven write-load distribution
  ▪ Non-Uniform Row Key Distribution
▪ Salt row keys to ensure uniform distribution
  ▪ Fixed length hashed prefix
  ▪ Row key = [Murmur hash based prefix][Original User ID]
▪ Uniform pre-splits
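The salting and pre-splitting idea can be sketched in a few lines. Two things here are assumptions rather than facts from the talk: the prefix width of 4 bytes (suggested by the example keys on the next slide), and the hash function. The slide names Murmur hash, but Python's standard library has no Murmur implementation, so this sketch substitutes `hashlib.blake2b` purely as a stand-in with the same property of spreading keys evenly.

```python
import hashlib

PREFIX_BYTES = 4  # assumed width of the fixed-length hashed prefix


def salt_row_key(user_id):
    """Return <hashed prefix><original user id> as bytes.

    The prefix is derived from the user id itself, so the salted key can be
    recomputed for point look-ups. blake2b stands in for Murmur hash here.
    """
    raw = user_id.encode("utf-8")
    prefix = hashlib.blake2b(raw, digest_size=PREFIX_BYTES).digest()
    return prefix + raw


def uniform_split_points(num_regions):
    """Split the prefix space into num_regions equal ranges.

    Returns the num_regions - 1 boundary keys at which a table could be
    pre-split so that evenly hashed keys spread evenly over regions.
    """
    space = 256 ** PREFIX_BYTES
    return [
        (i * space // num_regions).to_bytes(PREFIX_BYTES, "big")
        for i in range(1, num_regions)
    ]


if __name__ == "__main__":
    print(salt_row_key("ABCD06EFG"))
    print(uniform_split_points(4))
```

Because the prefix is a deterministic function of the user ID, reads stay cheap: a reader recomputes the prefix instead of scanning every region.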

Page 28: How did you know this Ad will be relevant for me?!

HBase Region Hot-spotting

▪ Don’t stop at salting
  ▪ Map input splits configured for region boundaries

[Diagram: a Key Partitioner turns ‘k’ input splits of unsalted user IDs into ‘m’ splits for ‘m’ regions.]

Input user IDs (‘k’ splits): 1234557, 1234568, 1234579, 1234583, 1234594, .., ZZAHT654, ZZZGT934, ZZZZNGA2, ZZZZKLO1

Region boundaries:

Region 1: \x03\x85\x1E\xB8ZZZZZZ

Region 2: \x07\x5C\xF5\xC2928ZZ

Region m: \xFF\xAE\x14\xE1Z28ZZ

Salted and sorted output (‘m’ splits):

Split 1: \x01\x85\x1E\xB811ZKL1, \x01\x86\x1E\xB8129542, .., \x03\x85\x1E\xB8ZZZKL1

Split 2: \x05\x35\x9E\x18087KL1, \x06\x86\x1E\xB8AHV24, .., \x07\x5C\xF5\xC16534Z

Split m: \xEB\x27\x92\x1508RKL1, \xFE\x86\x1E\xB8AHV24, .., \xFF\xAE\x14\x126534Z

Page 29: How did you know this Ad will be relevant for me?!

HBase Key Partitioner

▪ As many splits as regions to maximize parallelism
▪ Key Partitioner (MR) –
  ▪ Reads region boundaries of HBase table
  ▪ Salts and sorts row key accordingly
  ▪ Multiple Output Format to optimize reduce phase
  ▪ Each generated split file corresponds to a single region
▪ Drastically reduces read latencies
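The partitioning step can be illustrated in memory, without MapReduce. This is a sketch, not the actual job: `partition_by_region` and `salt_row_key` are hypothetical names, the 4-byte prefix is assumed, `hashlib.blake2b` again stands in for Murmur hash, and the region boundaries are passed in as a sorted list rather than read from an HBase table.

```python
import bisect
import hashlib
from collections import defaultdict

PREFIX_BYTES = 4  # assumed width of the hashed prefix


def salt_row_key(user_id):
    """<hashed prefix><user id>; blake2b is a stand-in for Murmur hash."""
    raw = user_id.encode("utf-8")
    return hashlib.blake2b(raw, digest_size=PREFIX_BYTES).digest() + raw


def partition_by_region(user_ids, split_points):
    """Group salted keys so that each group maps to exactly one region.

    split_points is the sorted list of region start keys for regions 2..m.
    A key equal to a split point belongs to the region that starts there,
    hence bisect_right. Each group is sorted, like a generated split file.
    """
    groups = defaultdict(list)
    for user_id in user_ids:
        key = salt_row_key(user_id)
        region = bisect.bisect_right(split_points, key)
        groups[region].append(key)
    return {region: sorted(keys) for region, keys in sorted(groups.items())}


if __name__ == "__main__":
    points = [bytes([64, 0, 0, 0]), bytes([128, 0, 0, 0]), bytes([192, 0, 0, 0])]
    ids = ["1234557", "1234568", "ZZAHT654", "ZZZGT934"]
    for region, keys in partition_by_region(ids, points).items():
        print(region, keys)
```

With one sorted group per region, each downstream task reads from a single region in key order, which is consistent with the reduced read latencies the slide reports.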

Page 30: How did you know this Ad will be relevant for me?!

Network Bandwidth Troubles

Page 31: How did you know this Ad will be relevant for me?!

Data Center Expansion

Page 32: How did you know this Ad will be relevant for me?!

Network Bandwidth Constraints

▪ Consistently overshot bandwidth limit during uploads
▪ All sorts of delays (Redis, MySQL, Blackbird…)
▪ Bidding hampered

Page 33: How did you know this Ad will be relevant for me?!

Solutions

▪ Intelligent storage – protobufs everywhere

▪ Throttle writes

▪ Geo-splitting
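The slide does not say how writes were throttled. A token bucket is one common way to cap upload bandwidth, so the sketch below uses it as an assumed technique; the class name `WriteThrottle` and its parameters are hypothetical.

```python
import time


class WriteThrottle:
    """Token bucket: allow at most `rate` bytes/second, with bursts up to `burst`."""

    def __init__(self, rate, burst, clock=time.monotonic, sleep=time.sleep):
        if rate <= 0 or burst <= 0:
            raise ValueError("rate and burst must be positive")
        self.rate = float(rate)
        self.burst = float(burst)
        self._clock = clock
        self._sleep = sleep
        self._tokens = self.burst  # start with a full bucket
        self._last = clock()

    def _refill(self):
        now = self._clock()
        earned = (now - self._last) * self.rate
        self._tokens = min(self.burst, self._tokens + earned)
        self._last = now

    def acquire(self, nbytes):
        """Block until nbytes may be written; return the seconds waited."""
        if nbytes > self.burst:
            raise ValueError("write larger than burst size")
        waited = 0.0
        while True:
            self._refill()
            if self._tokens >= nbytes:
                self._tokens -= nbytes
                return waited
            delay = (nbytes - self._tokens) / self.rate
            self._sleep(delay)
            waited += delay


if __name__ == "__main__":
    throttle = WriteThrottle(rate=1_000_000, burst=1_000_000)
    for chunk in (b"x" * 400_000, b"y" * 400_000, b"z" * 400_000):
        throttle.acquire(len(chunk))  # call before each upload write
```

The clock and sleep functions are injectable so the throttle can be tested without real waiting.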

Page 34: How did you know this Ad will be relevant for me?!

Geo Splitting

Page 35: How did you know this Ad will be relevant for me?!

Geo-splitting

▪ Tag user’s location history & predict future data center visits
  ▪ ⨍(dc, geo_history, bt_profile)
▪ A separate workflow periodically generates geo-split rules:
  ▪ Clusters users & analyzes migration patterns
  ▪ Ensures maximal look-up coverage of profiles
  ▪ Minimizes total number of profiles stored
▪ Ensures efficient use of resources, with minimal impact on perf
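The trade-off between look-up coverage and the number of profile copies stored can be illustrated with a deliberately simple rule. This is not the function ⨍ from the slide, whose form the talk does not give; `choose_data_centers`, the coverage threshold and the use of raw visit counts are all assumptions made for illustration.

```python
from collections import Counter


def choose_data_centers(dc_visits, target_coverage=0.95):
    """Pick the fewest data centers that cover a target share of past visits.

    dc_visits is the list of data centers that served a user in the past.
    Data centers are taken greedily by visit count (ties broken by name)
    until the covered share of visits reaches target_coverage.
    """
    counts = Counter(dc_visits)
    total = sum(counts.values())
    if total == 0:
        return []
    chosen, covered = [], 0
    for dc, visits in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        chosen.append(dc)
        covered += visits
        if covered / total >= target_coverage:
            break
    return chosen


if __name__ == "__main__":
    history = ["west"] * 9 + ["east"]
    print(choose_data_centers(history, target_coverage=0.9))
```

A user who almost always hits one data center gets a single copy of the profile, while a frequent traveller gets copies in several, which is the storage saving the slide is after.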

Page 36: How did you know this Ad will be relevant for me?!

Geo-splitting

[Diagram: the Behavioral Targeting pipeline (Events from Pixel Stream and Ad Logs; Feature Generation; BT Features (HBase); Profile Generation; Scoring; Training with Label Data, Train Model, Back Test, Calibrate; Model; Ad Serving Data Centers) extended with a Geo-split workflow: Cluster Users, Analyze Patterns, Generate Rules.]

Page 37: How did you know this Ad will be relevant for me?!

Page 38: How did you know this Ad will be relevant for me?!

Quick Recovery From Failures

▪ Break pipeline into short payloads
▪ Fail fast, recover fast!
▪ Actionable alerts, cut down noise

Page 39: How did you know this Ad will be relevant for me?!

Quick Recovery From Failures

▪ Materialize data as frequently as possible
▪ Cross system fault tolerance
▪ Idempotency
▪ Backfill at EOD to plug holes if needed
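Materialization, idempotency and end-of-day backfill fit together roughly as in the sketch below. It uses the local filesystem instead of HDFS, and the names `materialize` and `backfill`, the one-file-per-window layout and the JSON format are all hypothetical choices made for illustration.

```python
import json
import os
import tempfile
from pathlib import Path


def materialize(out_dir, window, compute):
    """Write the output for one time window exactly once.

    Returns False without recomputing if the window already exists, so the
    step can be re-run safely. Output goes to a temporary file first and is
    renamed into place, so a crash never leaves a partial result behind.
    """
    out_dir = Path(out_dir)
    final = out_dir / f"{window}.json"
    if final.exists():
        return False
    out_dir.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(compute(window), handle)
        os.replace(tmp, final)  # atomic rename within one directory
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
    return True


def backfill(out_dir, windows, compute):
    """Plug holes: materialize only the windows that are still missing."""
    return [w for w in windows if materialize(out_dir, w, compute)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        hours = ["2014060414", "2014060415", "2014060416"]
        materialize(root, hours[1], lambda w: {"window": w})
        print(backfill(root, hours, lambda w: {"window": w}))
```

Because a re-run of any window is a no-op once its output exists, a failed hour can simply be retried, and the end-of-day backfill can sweep all hours without special-casing the ones that succeeded.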

Page 40: How did you know this Ad will be relevant for me?!

Shout-outs!

Page 41: How did you know this Ad will be relevant for me?!

Shout-outs!

Page 42: How did you know this Ad will be relevant for me?!

Shout-outs!

Page 43: How did you know this Ad will be relevant for me?!

Shout-outs!

Page 44: How did you know this Ad will be relevant for me?!

We Are Hiring!

Page 45: How did you know this Ad will be relevant for me?!

Page 46: How did you know this Ad will be relevant for me?!

Questions?

Thank You!

Sivasankaran [email protected]

Savin [email protected]

Page 47: How did you know this Ad will be relevant for me?!

We are hiring! (as always)

http://rocketfuel.com/careers

[email protected]@rocketfuel.com

Page 48: How did you know this Ad will be relevant for me?!
