Forecasting supply in display advertising

DATA:CPH MEETUP, VLAD - Copenhagen, Denmark

TODAY

Adform

Our group and how we work

Forecasting website traffic problem in our context

Timeseries forecasting and stationarity

One real example

Silver bullet resources

ADFORM PLATFORM

+21000 Advertisers, +1000 Agencies, +400 Publishers, +600 Employees, 16 Countries

WITH GLOBAL INFRASTRUCTURE

50bn daily transactions, 6 data centers, 1m bid requests/sec

THE DATA

User visits website data collection

unique CookieId + visit timestamp
geographical data from IP
device data - type, OS, ISP
browser data - type, language

user profile data from publisher = cookie data: gender, income, work status, etc.

price the user profile sells for on RTB (= Ads Stock Exchange)
running contracts - matches campaign goals

RESEARCH SUPPLY GROUP

building data driven products for the publishers

traffic forecasting
guaranteed delivery
inventory availability
delivery forecasting
yield optimization using RTB (= Ads Stock Exchange)
pricing recommendations
audience extension

OUR LITTLE CAVE

we love and
prototype => simulate data or sample
model validation => real data (as much of it as possible)
POCs + tips to scale them = happy dev team!
Dev team: research flows, fast data structures and scaling

scalability: (Scala)

deployment:

monitoring:

MODEL TO PRODUCTION

reuse the POC code or rewrite?
run fully offline or online?
how often do we need to refresh the model?
how scalable is it, do we know its bounds?
one model or multiple models?
how can we sneak a new model in unnoticed?
what is our baseline?
is it easy to tune the model?
is the model code well separated from the rest?
time to market -

“Anything is better than nothing”
“We don’t need to think of everything upfront…it just needs to work in most cases”

… (regression through all these points!)


CONCRETE CASE: FORECASTING

how many mobile users from Copenhagen will visit my website next week?
how about next Tuesday from 10 AM to 1 PM for this specific banner placement?
how many English-speaking users who are not using mobile (so tablet, desktop, TV, etc.) from Vesterbro will I get 3 weeks from now, between 8 PM and 9 PM?

why is this useful?
if I know what users I can expect, I can better organize my inventory:
how much I can sell as guaranteed
forecast delivery of impressions
yield optimization using RTB markets

MULTI-DIMENSIONALITY AND CNF LOGIC

a user visit usually has many dimensions, and each dimension has tens or hundreds of values: geo features (country, region, zip code, etc.), device type (mobile, desktop, etc.), browser type, ISP, etc.

From DK, but not from CPH, using mobile, language English or Danish, using Firefox or Safari, not from Telia

DK /\ (not CPH) /\ mobile /\ (English \/ Danish) /\ (Firefox \/ Safari) /\ (not Telia)
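A minimal sketch of how such a CNF query could be checked against a single visit. The representation (a list of AND-ed clauses, each a list of OR-ed literals), the dimension names, and the sample ISP value "TDC" are assumptions made for illustration, not the production format.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a literal is (dimension, value, negated).
Literal = Tuple[str, str, bool]
Clause = List[Literal]   # literals inside a clause are OR-ed
Query = List[Clause]     # clauses inside a query are AND-ed


def matches(visit: Dict[str, str], query: Query) -> bool:
    """Return True if the visit satisfies every clause of the CNF query."""
    return all(
        any((visit.get(dim) == value) != negated for dim, value, negated in clause)
        for clause in query
    )


# DK /\ (not CPH) /\ mobile /\ (English \/ Danish) /\ (Firefox \/ Safari) /\ (not Telia)
query: Query = [
    [("country", "DK", False)],
    [("city", "CPH", True)],
    [("device", "mobile", False)],
    [("language", "English", False), ("language", "Danish", False)],
    [("browser", "Firefox", False), ("browser", "Safari", False)],
    [("isp", "Telia", True)],
]

# Illustrative visits (field values are made up).
visit_ok = {
    "country": "DK", "city": "Aarhus", "device": "mobile",
    "language": "Danish", "browser": "Safari", "isp": "TDC",
}
visit_cph = dict(visit_ok, city="CPH")  # same visit, but from CPH

print(matches(visit_ok, query), matches(visit_cph, query))
```

Keeping the query in CNF means each clause touches one dimension, so a clause can be evaluated with a single lookup per literal.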


SEGMENTS INDEPENDENCE ASSUMPTION

[Figure: country shares: Denmark 0.5, Germany 0.25, France 0.25; device shares: mobile 0.5, other 0.5]

Query: all users for Denmark and mobile => ratio(Denmark) * ratio(mobile)

global forecast, adjusted by 0.5 x 0.5 = 0.25
truly independent? can we do better?




[Figure: country shares with the device split inside Denmark: mobile 0.75, other 0.25]

actual share: 0.5 x 0.75 = 0.375
error = 0.375 - 0.25 = 0.125 (x 1M forecasted users) => 125,000 users
offline pre-computation of the fractions of one segment relative to another is simply infeasible: way too many combinations
(get a query -> parse it -> compute forecasts for the parsed query -> return forecasts) < 200ms
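The arithmetic above can be reproduced with a toy joint distribution. Only the Denmark and mobile figures come from the slides; the Germany and France cells are assumptions chosen so that the marginals stay at 0.5 / 0.25 / 0.25 and 0.5 / 0.5.

```python
# Toy joint shares of (country, device). Denmark cells follow the slides
# (0.5 x 0.75 and 0.5 x 0.25); the other cells are illustrative assumptions.
joint = {
    ("Denmark", "mobile"): 0.375, ("Denmark", "other"): 0.125,
    ("Germany", "mobile"): 0.0625, ("Germany", "other"): 0.1875,
    ("France", "mobile"): 0.0625, ("France", "other"): 0.1875,
}


def marginal(shares, index, value):
    """Sum the joint shares over every cell whose key matches `value` at `index`."""
    return sum(p for key, p in shares.items() if key[index] == value)


global_forecast = 1_000_000  # forecasted users across all segments

# Independence assumption: multiply the two marginal ratios.
independent = marginal(joint, 0, "Denmark") * marginal(joint, 1, "mobile")

# What the joint data actually says.
actual = joint[("Denmark", "mobile")]

error_users = (actual - independent) * global_forecast
print(independent, actual, error_users)
```

The marginals look perfectly balanced, yet the product underestimates the segment by 125,000 users, because device usage is not independent of country in this toy data.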



TIMESERIES FORECASTING GIST

Timeseries = observations collected at constant time intervals

time dependent => the independence assumption of the observations does not hold
seasonality, trends => variations within specific time windows

a stationary time series is one whose properties do not depend on the time at which the series is observed; it is easier to predict, since its statistical properties will be the same in the future as they are now.
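To see the time dependence in numbers, one option is the sample autocorrelation at a few lags. The sketch below uses a synthetic hourly series with a pure daily cycle (an assumption, not real traffic): a strong positive value at lag 24 and a strong negative one at lag 12 reveal the daily seasonality.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return cov / var


# Synthetic hourly visits over 14 days with a daily cycle (illustrative only).
hourly = [1000 + 500 * math.sin(2 * math.pi * (t % 24) / 24) for t in range(24 * 14)]

acf_24 = autocorrelation(hourly, 24)  # one full day apart: observations move together
acf_12 = autocorrelation(hourly, 12)  # half a day apart: observations move in opposition
print(round(acf_24, 3), round(acf_12, 3))
```

If the observations were independent, both values would be close to zero.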

TIMESERIES FORECASTING GIST

series mean should be constant

variance should be constant

covariance of the i-th term and the (i + k)-th term should not depend on time

(*)

* analyticsvidhya.com/blog/2015/12/complete-tutorial-time-series-modeling
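A rough way to eyeball the first two conditions is to compare the mean and variance of the first and second half of the series. This is a crude check under stated assumptions (synthetic data, arbitrary split), not a formal statistical test. Differencing is shown as one common way to remove a trend.

```python
from statistics import mean, pvariance


def half_stats(series):
    """Return (mean, variance) for the first and the second half of the series."""
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return (mean(first), pvariance(first)), (mean(second), pvariance(second))


def difference(series, lag=1):
    """Replace each observation by its change relative to `lag` steps earlier."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Synthetic series with a linear trend: the mean is clearly not constant.
trend = [100 + 5 * t for t in range(40)]
(m1, v1), (m2, v2) = half_stats(trend)

# After first-order differencing the trend is gone: constant mean, zero variance.
diffed = difference(trend)
(dm1, dv1), (dm2, dv2) = half_stats(diffed)

print((m1, m2), (dm1, dm2), (dv1, dv2))
```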


LOOK AT THE DATA

[Figure: obsVisits over time, Feb 04 to Feb 16, y-axis from 0e+00 to 6e+05; actual vs. forecasted]

GETTING TO YOUR MODEL

which model should you choose? ETS, STL, ARIMA?

publishers have different time series patterns
one case: we observed strong daily and weekly patterns, so we focused on a model which supports multiple seasonalities (TBATS)
exploratory analysis of the time series correlations will (hopefully!) point you to the right model
it is still hard to pinpoint which model works best, so you have to experiment with different types of models and see which gives the smallest error
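The "experiment and pick the smallest error" loop can be sketched as a holdout comparison. The candidates below are simple naive baselines standing in for real models (they are not ETS, STL, ARIMA or TBATS), and the hourly series with a daily cycle and lower weekend traffic is synthetic; both are assumptions for illustration.

```python
import math


def seasonal_naive(train, horizon, season):
    """Forecast by repeating the last full season of the training data."""
    last = train[-season:]
    return [last[h % season] for h in range(horizon)]


def mae(actual, forecast):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def visits(t):
    """Synthetic hourly visits: daily cycle plus a weekend dip (made-up numbers)."""
    daily = 500 * math.sin(2 * math.pi * (t % 24) / 24)
    weekend = -300 if (t // 24) % 7 in (5, 6) else 0
    return 1000 + daily + weekend


series = [visits(t) for t in range(24 * 28)]           # four weeks of hourly data
train, test = series[:-24 * 7], series[-24 * 7:]       # hold out the last week

candidates = {
    "naive (last value)": seasonal_naive(train, len(test), 1),
    "daily seasonal naive": seasonal_naive(train, len(test), 24),
    "weekly seasonal naive": seasonal_naive(train, len(test), 24 * 7),
}
errors = {name: mae(test, forecast) for name, forecast in candidates.items()}
best = min(errors, key=errors.get)
print(best, errors)
```

On this series the candidate that respects both the daily and the weekly pattern wins, which mirrors why a multi-seasonal model was the natural choice for the publisher described above.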

RESOURCES

Forecasting Yoda = Hyndman and his silver bullet is https://www.otexts.org/fpp

fast, super basic, easy to read intro: analyticsvidhya.com/blog/2015/12/complete-tutorial-time-series-modeling




@adforminsider

Date post:	16-Mar-2018
Category:	Science
Upload:	vlad-sandulescu