Probabilistic Machine Learning in Computational Advertising
Thore Graepel, Thomas Borchert, Ralf Herbrich and Joaquin Quiñonero Candela
Online Services and Advertising, Microsoft Research Cambridge, UK
NIPS 2009 – December 2009
Outline
• Online Advertising and Paid Search
• AdPredictor™: Predicting User Clicks on Ads
[Appendix]
• Model shrinking
• Parallel training
ONLINE ADVERTISING AND PAID SEARCH
Advertising Industry Business: Size
[Figure: annual expenditure (in billion USD, axis 0 to 600) by year, 2001 to 2006, for Outdoor, Cinema, Radio, TV and Online advertising, with GDP Denmark (2006) and Microsoft Revenue (2008) marked for comparison.]
Data: World Advertising Research Center Report 2007
Advertising Industry Business: Growth
[Figure: annual growth (axis -20% to 50%) by year, 2001 to 2006, for Outdoor, Cinema, Radio, TV and Online advertising.]
Data: World Advertising Research Center Report 2007
Display to users (expected bid)        Charge advertisers (per click)
$1.00 * 10% = $0.10                    $0.80
$2.00 * 4% = $0.08                     $1.25
$0.10 * 50% = $0.05                    $0.05
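The numbers on this slide are consistent with ranking ads by expected bid (bid times click probability) and charging each ad the next ad's expected bid divided by its own click probability: $0.08 / 10% = $0.80 and $0.05 / 4% = $1.25. A minimal sketch of that reading follows. The `Ad` class and `rank_and_price` function are hypothetical names, the pricing rule is inferred from the figures rather than stated on the slide, and the charge for the last ad is left as `None` because the slide does not show how it is determined.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    bid: float      # advertiser's bid per click, in USD
    p_click: float  # predicted probability of a click


def rank_and_price(ads):
    """Rank ads by expected bid and price each against the ad below it."""
    ranked = sorted(ads, key=lambda a: a.bid * a.p_click, reverse=True)
    prices = []
    for i, ad in enumerate(ranked):
        if i + 1 < len(ranked):
            below = ranked[i + 1]
            # Smallest per-click price that still matches the next expected bid.
            prices.append(below.bid * below.p_click / ad.p_click)
        else:
            # Rule for the last slot is not given on the slide.
            prices.append(None)
    return ranked, prices


ads = [Ad(1.00, 0.10), Ad(2.00, 0.04), Ad(0.10, 0.50)]
ranked, prices = rank_and_price(ads)
```

Under this rule an advertiser never pays more than the bid, and a better click prediction directly changes both the ranking and the price.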
The Scale of Things
• Realistic training set for proof of concept: 7,000,000,000 impressions
• 2 weeks of CPU time during training: 2 wks × 7 days × 86,400 sec/day = 1,209,600 seconds
• Learning algorithm speed requirement:
– 5,787 impression updates / sec
– 172.8 μs per impression update
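The arithmetic behind these figures can be checked in a few lines; the variable names are illustrative only.

```python
impressions = 7_000_000_000
seconds = 2 * 7 * 86_400                      # two weeks of CPU time
updates_per_sec = impressions / seconds       # required throughput
micros_per_update = 1e6 / updates_per_sec     # time budget per update

print(seconds, round(updates_per_sec), round(micros_per_update, 1))
# prints: 1209600 5787 172.8
```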
ADPREDICTOR
Bayesian Linear Probit Regression
Impression Level Predictions
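One Weight per Feature Value
[Figure: one weight for each value of each feature. Client IP: 102.34.12.201, 15.70.165.9, 221.98.2.187, 92.154.3.86. MatchType: Exact Match, Broad Match. Position: ML-1, SB-1, SB-2. The weights of the active values are added (+) to give pClick.]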
Click Potential
Linear: click potential = sum of feature click contributions
[Figure: click potential axis centred at 0, with regions labelled "no click" and "click". The contributions of PageNumber/DisplayPosition/ReturnedAds = 0/ML-1/2, ListingId = 798831 and ClientIP = 98.0.101.23 add up to the impression click potential.]
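The linear model on this slide can be sketched as a lookup and a sum. The feature names and values come from the slide; the weight values and the `click_potential` helper are made up for illustration.

```python
# Hypothetical weights, one per (feature, value) pair. The numbers are
# illustrative and do not come from the slides.
weights = {
    ("PageNumber/DisplayPosition/ReturnedAds", "0/ML-1/2"): 0.6,
    ("ListingId", "798831"): -0.9,
    ("ClientIP", "98.0.101.23"): -0.2,
}


def click_potential(impression, weights, default=0.0):
    """Sum the weight of the single active value of each feature."""
    return sum(weights.get((feature, value), default)
               for feature, value in impression.items())


impression = {
    "PageNumber/DisplayPosition/ReturnedAds": "0/ML-1/2",
    "ListingId": "798831",
    "ClientIP": "98.0.101.23",
}
potential = click_potential(impression, weights)
```

Because only one value per feature is active, a prediction touches as many weights as there are features, regardless of how many distinct values exist.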
Gaussian Noise
Probit: area under Gaussian tail as a function of click potential
[Figure: Gaussian noise centred on the impression click potential, on a click potential axis centred at 0 with regions labelled "no click" and "click".]
P(click) = P(potential > 0)
Probit
Probit: area under Gaussian tail as a function of click potential
[Figure: P(click|Impression), on a scale up to 100%, plotted against click potential. The impression click potential is the sum of the contributions of PageNumber/DisplayPosition/ReturnedAds = 0/ML-1/2, ListingId = 798831 and ClientIP = 98.0.101.23.]
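The probit link described on the last two slides can be sketched as follows. If zero-mean Gaussian noise is added to the click potential, P(potential + noise > 0) is the standard normal CDF evaluated at the potential divided by the noise standard deviation. The parameter name `beta` and its default of 1.0 are assumptions; the slides do not give a value.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def p_click(potential, beta=1.0):
    """P(potential + noise > 0) for noise ~ N(0, beta^2)."""
    return normal_cdf(potential / beta)
```

A potential of exactly 0 gives a click probability of 50%, and the probability approaches 100% as the potential grows.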
Modelling Uncertainty
[Figure: impression click potential on a click potential axis centred at 0.]
Uncertainty about the Potential
[Figure: distribution over the impression click potential on a click potential axis centred at 0.]
Probability of Click
[Figure: probability of click, on a scale up to 100%.]
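One way to make the effect of uncertainty concrete: if each active weight has an independent Gaussian belief with its own mean and variance, the click potential is Gaussian too, and its variance simply adds to the noise variance inside the probit. The sketch below assumes that factorised Gaussian form; `beta` and the example numbers are illustrative, not taken from the slides.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def predict(active, beta=1.0):
    """Click probability given (mean, variance) beliefs of the active weights."""
    total_mean = sum(mean for mean, _ in active)
    total_var = beta ** 2 + sum(var for _, var in active)
    return normal_cdf(total_mean / math.sqrt(total_var))


confident = predict([(0.5, 0.01), (0.5, 0.01)])
uncertain = predict([(0.5, 2.0), (0.5, 2.0)])
```

With the same means, the more uncertain belief yields a prediction closer to 50%.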
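Uncertainty: Bayesian Probabilities
[Figure: a belief distribution for each value of each feature. Client IP: 102.34.12.201, 15.70.165.9, 221.98.2.187, 92.154.3.86. MatchType: Exact Match, Broad Match. Position: ML-1, SB-1, SB-2. The beliefs of the active values are added (+) to give p(pClick).]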
Principled Exploration
Training Algorithm in Action
[Figure: graph with nodes w1 and w2 joined by a sum (+) to z and then c, shown in two panels: Prediction and Training/Update.]
Posterior Updates for the Click Event
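The slide's own equations did not survive extraction. As a sketch only, the standard moment-matching update for a probit likelihood with independent Gaussian weight beliefs looks like the code below. It is not necessarily the exact form shown in the talk, and `beta` and the function names are assumptions.

```python
import math


def normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def normal_cdf(t):
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def update(active, clicked, beta=1.0):
    """Return updated (mean, variance) beliefs for the active weights."""
    y = 1.0 if clicked else -1.0
    total_var = beta ** 2 + sum(var for _, var in active)
    total_std = math.sqrt(total_var)
    t = y * sum(mean for mean, _ in active) / total_std
    v = normal_pdf(t) / normal_cdf(t)   # correction to the means
    w = v * (v + t)                     # shrinkage of the variances, in (0, 1)
    return [(mean + y * (var / total_std) * v,
             var * (1.0 - (var / total_var) * w))
            for mean, var in active]


prior = [(0.0, 1.0), (0.0, 1.0)]
after_click = update(prior, clicked=True)
after_no_click = update(prior, clicked=False)
```

Surprising outcomes (small `t`) move the means more, and weights with larger variance take a larger share of each update.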
Client IP: Mean & Variance
Calibrated Predictions
Joint Updates vs. Independent Aggregation
Naive Bayes
adPredictor Wrap Up
Thank you!
APPENDIX
Dealing with Millions of Variables
• Observation 1: Large variable bags follow a power-law w.r.t. frequency of items
• Observation 2: Weight posteriors of rare items are close to their prior
• Idea:
1. Initially, the belief of each new item is compactly represented by one (and the same) prior
2. After observing an item for the first time, the posterior is allocated
3. At regular intervals, all weight posteriors with a small deviation from the prior are removed
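The three steps above can be sketched with a dictionary that falls back to a shared prior. The slide does not say how "small deviation from the prior" is measured; the sketch assumes the KL divergence between Gaussians, and `PRIOR`, `SparseModel` and the threshold are hypothetical.

```python
import math

PRIOR = (0.0, 1.0)  # hypothetical shared prior as (mean, variance)


def kl_from_prior(posterior, prior=PRIOR):
    """KL(posterior || prior) for two univariate Gaussians."""
    (m1, v1), (m0, v0) = posterior, prior
    return 0.5 * (v1 / v0 + (m1 - m0) ** 2 / v0 - 1.0 + math.log(v0 / v1))


class SparseModel:
    def __init__(self):
        self.posteriors = {}  # only items that have been observed

    def get(self, item):
        # Step 1: unseen items share one prior and cost no memory.
        return self.posteriors.get(item, PRIOR)

    def set(self, item, posterior):
        # Step 2: a posterior is allocated on first observation.
        self.posteriors[item] = posterior

    def prune(self, threshold):
        # Step 3: drop posteriors that barely differ from the prior.
        self.posteriors = {item: p for item, p in self.posteriors.items()
                           if kl_from_prior(p) >= threshold}


model = SparseModel()
model.set("frequent", (1.5, 0.1))
model.set("rare", (0.01, 0.98))
model.prune(threshold=0.01)
```

After pruning, a lookup of a removed item returns the prior again, so predictions change only slightly while memory follows the number of informative items.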
Naïve Approach – Shared Memory
• Does not scale
– Constant contention for locks
– Some features are very frequent
– Synchronization issues
[Figure: Training Node 1 processes Impression A (MSNH11, 10.0.0.1, USA, etc.) and Training Node 2 processes Impression B (MSNH11, Canada, 10.0.1.25, etc.). Both nodes send their updates to a single model file, and the updates conflict.]
Proposal: Approximate Learning
[Figure: Train Node 1 processes Impression A (MSNH11, 10.0.0.1, USA, etc.) and Train Node 2 processes Impression B (MSNH11, Canada, 10.0.1.25, etc.). Each node applies its own updates, the deltas are merged, and the result is written to the final model file.]
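The slide says only "Merge Deltas" and does not specify how. One plausible sketch, stated as an assumption: each node trains on its own copy of a base model, and the changes are added in the natural parameters of the Gaussian (precision and precision times mean), where independent evidence combines by addition. All function names and numbers below are hypothetical.

```python
def to_natural(mean, var):
    """(mean, variance) -> (precision, precision * mean)."""
    return 1.0 / var, mean / var


def from_natural(precision, precision_mean):
    """(precision, precision * mean) -> (mean, variance)."""
    return precision_mean / precision, 1.0 / precision


def merge_deltas(base, node_models, prior=(0.0, 1.0)):
    """Add each node's change relative to the base model, per feature value."""
    merged = {}
    for key in set(base).union(*node_models):
        base_prec, base_pm = to_natural(*base.get(key, prior))
        prec, pm = base_prec, base_pm
        for model in node_models:
            if key in model:
                node_prec, node_pm = to_natural(*model[key])
                prec += node_prec - base_prec
                pm += node_pm - base_pm
        merged[key] = from_natural(prec, pm)
    return merged


base = {"MSNH11": (0.0, 1.0), "USA": (0.0, 1.0), "Canada": (0.0, 1.0)}
node1 = {"MSNH11": (0.4, 0.8), "USA": (0.4, 0.8)}        # after Impression A
node2 = {"MSNH11": (-0.3, 0.8), "Canada": (-0.3, 0.8)}   # after Impression B
final = merge_deltas(base, [node1, node2])
```

Values touched by one node keep that node's belief, while the shared value MSNH11 receives both deltas without any locking during training.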