Date posted: 11-Apr-2017
Category: Technology
Uploaded by: mlconf
Bayesian Bandits
Byron Galbraith, PhD
Cofounder / Chief Data Scientist, Talla
2017.03.24
Bayesian Bandits for the Impatient
1. Online adaptive learning: "Earn while you Learn"
2. Powerful alternative to A/B testing optimization
3. Can be efficient and easy to implement
Dining Ware VR Experiences on Demand
Iterated Decision Problems
What product recommendations should we present to subscribers to keep them engaged?
A/B Testing
Exploit vs Explore - What should we do?
Choose what seems best so far
• Feel good about our decision
• But there still may be something better
Try something new
• Discover a superior approach
• Or regret our choice
A/B/n Testing
Regret - What did that experiment cost us?
The Multi-Armed Bandit Problem
http://blog.yhat.com/posts/the-beer-bandit.html
Bandit Solutions
Regret: ρ = Σ_{t=1..T} [ E(r_t(a*)) − E(r_t(a_t)) ]
k-MAB strategies, e.g. UCB1:
a_t = argmax_a [ x̄_a + √(2 log t / n_a) ]
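The UCB1 selection rule above can be sketched in a few lines; the function name and the exploration constant `c=2.0` are illustrative choices, not from the talk:

```python
import math

def ucb1(means, counts, t, c=2.0):
    """Pick the arm maximizing empirical mean plus an exploration
    bonus sqrt(c * log t / n_a); rarely-pulled arms get a larger bonus."""
    scores = [m + math.sqrt(c * math.log(t) / n)
              for m, n in zip(means, counts)]
    return scores.index(max(scores))
```

With equal empirical means, the arm with fewer pulls wins the tie via its larger bonus.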
Softmax: P(A_t = a) = e^{H_t(a)} / Σ_{b=1..k} e^{H_t(b)} = π_t(a)
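The softmax policy above can be sketched with the preferences H_t(a) as plain floats (the function name is mine):

```python
import math

def softmax_policy(preferences):
    """pi_t(a) = exp(H_t(a)) / sum_b exp(H_t(b))"""
    exps = [math.exp(h) for h in preferences]
    total = sum(exps)
    return [e / total for e in exps]
```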
Beta prior: P(θ = x) = x^(α−1) (1−x)^(β−1) / B(α, β)
Binomial likelihood: P(X = x) = (n choose x) p^x (1−p)^(n−x)
Posterior: Beta(α + Σ_i x_i, β + n − Σ_i x_i)
Bayes' rule: P(θ | X, n) = P(X | θ, n) P(θ | n) / P(X | n)
Thompson Sampling
P(θ | X, n) ∝ P(X | θ, n) P(θ | n)
(Posterior ∝ Likelihood × Prior)
Bayesian Bandits - The Model
Model whether a recommendation will result in user engagement
• Bernoulli distribution: θ - the likelihood of the event occurring
How do we find θ?
• Conjugate prior
• Beta distribution: α - number of hits, β - number of misses
Only need to keep track of two numbers per option
• # of hits, # of misses
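The two-number bookkeeping described above can be sketched as a small class; the name `BetaBelief` is mine, and it assumes a uniform Beta(1, 1) starting prior:

```python
class BetaBelief:
    """Beta(alpha, beta) belief over an option's engagement rate.
    Starting from Beta(1, 1), alpha - 1 counts hits and beta - 1 counts misses."""

    def __init__(self):
        self.alpha = 1
        self.beta = 1

    def update(self, hit):
        if hit:
            self.alpha += 1
        else:
            self.beta += 1

    def mean(self):
        # Posterior mean of a Beta(alpha, beta) distribution.
        return self.alpha / (self.alpha + self.beta)
```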
Bayesian Bandits - The Algorithm
1. Initialize α = 1, β = 1 for every option (uniform prior)
2. For each user request for recommendations at time t:
   1. Sample θ_a ~ Beta(α_a, β_a) for each option a
   2. Choose the action corresponding to the largest θ_a
   3. Observe the reward
   4. Update the chosen option's α (hit) or β (miss)
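The four steps above can be sketched as a minimal Beta-Bernoulli Thompson sampler; the `simulate` harness and its reward rates are illustrative, not from the talk:

```python
import random

def thompson_step(alphas, betas, rng=random):
    """One round: sample theta_a ~ Beta(alpha_a, beta_a) for each arm,
    then act greedily on the sampled values."""
    samples = [rng.betavariate(a, b) for a, b in zip(alphas, betas)]
    return samples.index(max(samples))

def simulate(true_rates, rounds=3000, seed=0):
    """Run the bandit against hypothetical Bernoulli reward rates."""
    rng = random.Random(seed)
    k = len(true_rates)
    alphas, betas = [1] * k, [1] * k   # uniform Beta(1, 1) priors
    for _ in range(rounds):
        a = thompson_step(alphas, betas, rng)
        reward = rng.random() < true_rates[a]
        if reward:
            alphas[a] += 1   # hit
        else:
            betas[a] += 1    # miss
    return alphas, betas
```

Because sampling from each posterior naturally favors arms that are either promising or still uncertain, the loop shifts pulls toward the best arm as evidence accumulates.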
Belief Adaptation
Bandit Regret
But behavior is dependent on context
• Categorical contexts
  • One bandit model per category
  • One-hot context vector
• Real-valued contexts
  • Can capture interrelatedness of context dimensions
  • More difficult to incorporate effectively
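The one-bandit-per-category idea for categorical contexts might be sketched as follows; the class and category names are illustrative assumptions:

```python
import random
from collections import defaultdict

class PerCategoryBandit:
    """Keeps an independent Beta-Bernoulli Thompson-sampling bandit
    for each categorical context (e.g. "mobile" vs "desktop")."""

    def __init__(self, n_arms, seed=0):
        self.n_arms = n_arms
        self.rng = random.Random(seed)
        # Each category lazily gets its own [alpha, beta] pair per arm.
        self.params = defaultdict(lambda: [[1, 1] for _ in range(n_arms)])

    def choose(self, category):
        samples = [self.rng.betavariate(a, b)
                   for a, b in self.params[category]]
        return samples.index(max(samples))

    def update(self, category, arm, reward):
        self.params[category][arm][0 if reward else 1] += 1
```

Since the categories never share parameters, this is equivalent to running one plain bandit per one-hot context value.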
So why would I ever A/B test again?
• Test intent: optimization vs understanding
• Difficulty with non-stationarity (e.g. Monday vs Friday behavior)
• Deployment: few turnkey options, specialized skill set
https://vwo.com/blog/multi-armed-bandit-algorithm/
Bayesian Bandits for the Patient
1. Thompson Sampling balances exploitation & exploration while minimizing decision regret
2. No need to pre-specify decision splits or a time horizon for experiments
3. Can model a variety of problems and complex interactions
Resources
https://github.com/bgalbraith/bandits