Efficient Approximate Thompson Sampling
for Search Query Recommendation
Chu-Cheng Hsieh
[Figure: timeline Nov. 13 – Feb. 13 — Training Date vs. Training Data]
Resources are limited
Before vs. After Xmas
Multi-armed bandit problem (MAB)
Thompson Sampling
Query recommendation
Experiments
The k-armed Bandit Problem
[Figure: three slot machines A, B, C]
Play the right slot machine
that maximizes your profit
#Goal#
1. 30% of the time: explore (play each arm evenly)
2. 70% of the time: play the best arm so far
#Simple Strategy#
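This 30/70 split is the classic epsilon-greedy strategy. A minimal sketch in Python (the function name and the wins/plays bookkeeping are illustrative, not from the slides):

```python
import random

def simple_strategy(wins, plays, epsilon=0.3):
    """Epsilon-greedy: with probability epsilon explore evenly,
    otherwise play the arm with the best observed win rate."""
    if random.random() < epsilon:
        return random.randrange(len(plays))  # explore: any arm, uniformly
    rates = [w / p if p else 0.0 for w, p in zip(wins, plays)]
    return max(range(len(rates)), key=rates.__getitem__)  # exploit
```

With `epsilon=0.3` this matches the 30%/70% split above; the drawback (motivating Thompson Sampling later) is that exploration ignores how uncertain each arm's estimate is.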
Related Search => MAB problem (M=1)
[Figure: candidate queries A, B, C, D, E as arms]
Multi-armed bandit problem
Thompson Sampling (TS)
Query recommendation
Experiments
Exploration (learn) vs. Exploitation (earn)
import random

def flip(y=0.49):
    x = random.random()  # 0 <= x < 1
    return x < y
#Learn#
(Observed) I played 10 times -- won 5, lost 5
(Observed) I played 100 times -- won 45, lost 55
(Estimate) Chance of having win rate μ: the ``prior''
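The two observations above support the same point estimate (μ ≈ 0.5) but very different certainty. A sketch of how the Beta posterior over μ tightens with more data, using the standard Beta mean and standard deviation (the helper name is mine):

```python
import math

def beta_posterior(wins, losses, a=1, b=1):
    """Posterior belief about win rate mu: Beta(wins + a, losses + b).
    Returns its mean and standard deviation."""
    A, B = wins + a, losses + b
    mean = A / (A + B)
    std = math.sqrt(A * B / ((A + B) ** 2 * (A + B + 1)))
    return mean, std

m10, s10 = beta_posterior(5, 5)      # 10 plays: mean 0.5, wide belief
m100, s100 = beta_posterior(45, 55)  # 100 plays: mean ~0.45, much narrower
```

More plays shrink the standard deviation, which is exactly the information the uniform-exploration strategy above throws away.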
Data Observed:
Action (machine): refinements being displayed
Reward (win/loss): CTR, BBOWA, ...
#Question#
Learn or Earn?
Red: 19 wins, 9 losses
Blue: 59 wins, 39 losses
Thompson Sampling
Select a candidate by sampling: assuming a Beta prior over each arm's win rate μ, draw θx ~ Beta(Sx + a, Fx + b) for each arm x and play the arm with the highest draw.
[Plot: PDF of Beta(0+1, 0+1) -- the uniform prior]
[Plot: PDF of Beta(5+1, 5+1)]
[Plot: PDF of Beta(45+1, 55+1)]
#Example#
Learn or Earn?
Red: 19 wins, 9 losses -- sampled highest ~75% of the time
Blue: 59 wins, 39 losses -- sampled highest ~25% of the time
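The 75%/25% split can be checked by simulation: draw from each arm's posterior and count how often Red's draw is the highest (a sketch; the slide's exact numbers may come from a closed-form computation):

```python
import random

random.seed(0)
N = 100_000
red_highest = sum(
    random.betavariate(19 + 1, 9 + 1) > random.betavariate(59 + 1, 39 + 1)
    for _ in range(N)
)
p_red = red_highest / N  # ~0.75: Red wins the draw about 3 rounds in 4
```

Red's posterior has a higher mean and is wider (fewer plays), so Thompson Sampling still gives Blue a meaningful share of plays rather than abandoning it.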
The motivation of Thompson Sampling (2)
[Plots: PDFs of Beta(20,10) and Beta(60,40)]
See a good one; ``learn more''
Intuition (underdog, but worth learning more about)
[Plots: PDFs of Beta(4,6) and Beta(60,40)]
The motivation of Thompson Sampling (1)
[Plots: PDFs of Beta(10,15) and Beta(60,40)]
Avoid exploring a ``low potential'' arm early on.
Intuition (Equal exploration)
[Plots: PDFs of Beta(40,60) and Beta(60,40)]
Algorithm
Init: a = 1, b = 1; Sx = Fx = 0 for all x
(each arm x corresponds to a Beta(Sx + a, Fx + b) prior)
1. Draw a random number from each arm's Beta(Sx + a, Fx + b)
2. Play the arm x' with the highest number
3. If a reward is seen: Sx' += 1; else: Fx' += 1
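The algorithm above translates almost line for line into Python; `random.betavariate` draws from a Beta distribution, and the function names are mine:

```python
import random

def thompson_step(S, F, a=1, b=1):
    """Steps 1-2: draw theta_x ~ Beta(Sx + a, Fx + b) per arm, play the argmax."""
    draws = [random.betavariate(S[x] + a, F[x] + b) for x in range(len(S))]
    return max(range(len(S)), key=draws.__getitem__)

def update(S, F, x, reward):
    """Step 3: credit a success or a failure to the arm just played."""
    if reward:
        S[x] += 1
    else:
        F[x] += 1

# toy run: three arms with hidden win rates; TS concentrates on the best one
random.seed(1)
mu = [0.2, 0.5, 0.4]
S, F = [0, 0, 0], [0, 0, 0]
for _ in range(2000):
    x = thompson_step(S, F)
    update(S, F, x, random.random() < mu[x])
```

Note there is no explicit explore/exploit switch: the randomness of the posterior draws does the exploration, and it fades as the Beta distributions tighten.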
Multi-armed bandit problem
Thompson Sampling
Query recommendation
Experiments
Ugly truth #1: M > 1 (M = 5 here)
Ugly truth #2: no response => failure?
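For M > 1 recommendation slots, one natural extension (an assumption on my part -- the paper's exact variant may differ) is to take the M highest posterior draws each round:

```python
import random

def thompson_top_m(S, F, M, a=1, b=1):
    """Draw theta_x ~ Beta(Sx + a, Fx + b) per arm and return the M arms
    with the highest draws (M distinct recommendations per round)."""
    draws = [(random.betavariate(S[x] + a, F[x] + b), x) for x in range(len(S))]
    return [x for _, x in sorted(draws, reverse=True)[:M]]
```

The second ugly truth (no click may mean "not seen" rather than "rejected") is about how the reward in the update step is attributed, not about the selection rule itself.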
Multi-armed bandit problem
Thompson Sampling
Query Recommendation
Experiments
#Experiments#
Target: top 100 popular queries
Date: 2 weeks (Nov. 2013)
Goal: identify the top M
Measurement: regret = (reward of the best M arms) - (reward of the picked M arms)
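The regret measurement compares the best possible picks against the actual picks. A sketch under the assumption that per-round regret is the gap in expected reward (the paper's exact definition may differ):

```python
def regret(mu, picked, M):
    """Per-round regret: expected reward of the best M arms
    minus that of the M arms actually picked."""
    best = sorted(mu, reverse=True)[:M]
    return sum(best) - sum(mu[x] for x in picked)
```

Regret is zero only when the top M arms are picked, so cumulative regret flattening out indicates that the top M have been identified.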
[Plots: results for M=1 and M=2]
#Goal#
Identify top M quickly.
[Plot: M = 1, 2, 3 and γ = 0.02]
Hsieh, C., Neufeld, J., Holloway King, T., and Cho, J. Efficient Approximate Thompson Sampling for Search Query Recommendation. In Proceedings of the 30th ACM/SIGAPP Symposium on Applied Computing (SAC 2015).
About the author & download:
http://oak.cs.ucla.edu/~chucheng/