
Byron Galbraith, Chief Data Scientist, Talla, at MLconf NYC 2017

Date post: 11-Apr-2017
Bayesian Bandits Byron Galbraith, PhD Cofounder / Chief Data Scientist, Talla 2017.03.24
Transcript
Page 1: Byron Galbraith, Chief Data Scientist, Talla, at MLconf NYC 2017

Bayesian Bandits

Byron Galbraith, PhD
Cofounder / Chief Data Scientist, Talla
2017.03.24

Page 2

Bayesian Bandits for the Impatient

1. Online adaptive learning: "Earn while you learn"

2. Powerful alternative to A/B testing optimization

3. Can be efficient and easy to implement

Page 3

Dining Ware VR Experiences on Demand

Page 4

Dining Ware VR Experiences on Demand

Page 5

Iterated Decision Problems

What product recommendations should we present to subscribers to keep them engaged?

Page 6

A/B Testing

Page 7

Exploit vs. Explore: What should we do?

Choose what seems best so far
🙂 Feel good about our decision
🤔 There still may be something better

Try something new
😄 Discover a superior approach
😧 Regret our choice
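The exploit/explore trade-off above is often illustrated with an epsilon-greedy rule, a simple baseline (not a method named on this slide): exploit the best-looking arm most of the time, and explore a random arm with probability epsilon. A minimal sketch in Python, with the function name and structure my own:

```python
import random

def epsilon_greedy(estimates, epsilon=0.1, rng=random):
    """Pick an arm index: explore at random with probability epsilon,
    otherwise exploit the arm with the highest estimated reward."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))  # explore: try something new
    # exploit: choose what seems best so far
    return max(range(len(estimates)), key=estimates.__getitem__)

# With epsilon = 0 this always exploits arm 1 (estimate 0.5).
print(epsilon_greedy([0.2, 0.5, 0.3], epsilon=0.0))  # → 1
```

The catch, as the slide notes, is that pure exploitation may lock onto a merely good arm while a superior one stays undiscovered.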

Page 8

A/B/n Testing

Page 9

Regret: What did that experiment cost us?

Page 10

The Multi-Armed Bandit Problem

http://blog.yhat.com/posts/the-beer-bandit.html

Page 11

Bandit Solutions

Regret: $R_T = \sum_{t=1}^{T} \left[ r(Y_t(a^*)) - r(Y_t(a_t)) \right]$

k-armed MAB (k-MAB) approaches:

Upper Confidence Bound (UCB): $a_t = \operatorname{argmax}_i \left[ \bar{r}_{i,t} + c \sqrt{\log t / n_i} \right]$

Softmax: $P(A_t = a) = \frac{e^{h_a}}{\sum_{b=1}^{k} e^{h_b}} = \pi_t(a)$

Beta distribution: $P(X = x) = \frac{x^{\alpha-1} (1-x)^{\beta-1}}{B(\alpha, \beta)}$

Binomial distribution: $P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$

Posterior update: $\mathrm{Beta}_a(\alpha + r_a,\; \beta + N - r_a)$

Bayes' rule: $P(X \mid Y, Z) = \frac{P(Y \mid X, Z)\, P(X \mid Z)}{P(Y \mid Z)}$

Page 12

Thompson Sampling

๐‘ท (๐œฝ|๐’“ ,๐’‚ )โˆ๐‘ท (๐’“|๐œฝ ,๐’‚) ๐‘ท (๐œฝโˆจ๐’‚ )PriorLikeliho

odPosterior
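With a Beta prior and a Bernoulli likelihood, the posterior proportionality above collapses to simple counting: Beta(α, β) updated with observed outcomes becomes Beta(α + hits, β + misses). A small sketch of that update (the helper name is mine):

```python
def beta_posterior(alpha, beta, rewards):
    """Fold a batch of 0/1 Bernoulli outcomes into a Beta(alpha, beta) prior."""
    hits = sum(rewards)
    misses = len(rewards) - hits
    return alpha + hits, beta + misses

# Uniform prior Beta(1, 1); three hits and one miss give Beta(4, 2).
print(beta_posterior(1, 1, [1, 0, 1, 1]))  # → (4, 2)
```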

Page 13

Bayesian Bandits – The Model

Model whether a recommendation will result in user engagement.

• Bernoulli distribution: parameter $\theta$ gives the likelihood of the event occurring
• How do we find $\theta$? Use a conjugate prior, the Beta distribution: $\alpha$ counts hits, $\beta$ counts misses
• Only need to keep track of two numbers per option: # of hits and # of misses

Page 14

Bayesian Bandits – The Algorithm

1. Initialize $\alpha = \beta = 1$ for every option (uniform prior)
2. For each user request for recommendations at time t:
   1. Sample $\hat{\theta}_a \sim \mathrm{Beta}(\alpha_a, \beta_a)$ for each option
   2. Choose the action corresponding to the largest $\hat{\theta}_a$
   3. Observe the reward $r$
   4. Update the chosen option's posterior: $\alpha_a \leftarrow \alpha_a + r$, $\beta_a \leftarrow \beta_a + 1 - r$
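The loop above can be sketched end to end. This is an illustrative Beta-Bernoulli Thompson sampling simulation, not code from the talk; the function name and the simulated reward model are assumptions:

```python
import numpy as np

def thompson_bandit(true_rates, steps, seed=0):
    """Run Beta-Bernoulli Thompson sampling against simulated arms.

    Each step: sample a rate from every arm's Beta posterior, play the
    arm with the largest draw, observe a Bernoulli reward, update counts.
    """
    rng = np.random.default_rng(seed)
    k = len(true_rates)
    alpha = np.ones(k)  # 1 + hits per arm (uniform prior)
    beta = np.ones(k)   # 1 + misses per arm
    for _ in range(steps):
        theta = rng.beta(alpha, beta)            # sample from each posterior
        a = int(np.argmax(theta))                # choose the most promising arm
        reward = rng.random() < true_rates[a]    # simulated Bernoulli reward
        alpha[a] += reward
        beta[a] += 1 - reward
    return alpha, beta

alpha, beta = thompson_bandit([0.1, 0.5], steps=2000)
# The better arm (rate 0.5) accumulates far more pulls than the worse one.
```

Note that, as the model slide says, the entire learned state is just the two count vectors.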

Page 15

Belief Adaptation

Page 16

Belief Adaptation

Page 17

Belief Adaptation

Page 18

Belief Adaptation

Page 19

Belief Adaptation

Page 20

Bandit Regret

Page 21

But behavior is dependent on context

• Categorical contexts
  • One bandit model per category
  • One-hot context vector

• Real-valued contexts
  • Can capture interrelatedness of context dimensions
  • More difficult to incorporate effectively
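The "one bandit model per category" idea above can be sketched as a dictionary of independent Beta-Bernoulli bandits keyed by the categorical context. The class name and structure are my own, a sketch rather than the talk's implementation:

```python
import random
from collections import defaultdict

class ContextualBetaBandits:
    """One independent Beta-Bernoulli Thompson-sampling bandit per category."""

    def __init__(self, n_arms, rng=None):
        self.n_arms = n_arms
        self.rng = rng or random.Random()
        # context -> per-arm [hits, misses] counts
        self.counts = defaultdict(lambda: [[0, 0] for _ in range(n_arms)])

    def select(self, context):
        """Thompson sampling within the bandit owned by this context."""
        draws = [self.rng.betavariate(1 + hits, 1 + misses)
                 for hits, misses in self.counts[context]]
        return draws.index(max(draws))

    def update(self, context, arm, reward):
        """Record a 0/1 outcome for the arm played under this context."""
        self.counts[context][arm][0 if reward else 1] += 1

bandits = ContextualBetaBandits(n_arms=2, rng=random.Random(1))
arm = bandits.select("monday")
bandits.update("monday", arm, reward=1)
```

Real-valued contexts need a shared parametric model (e.g. a linear reward model) instead of disjoint count tables, which is why the slide calls them harder to incorporate.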

Page 22

So why would I ever A/B test again?

• Test intent: optimization vs. understanding

• Difficulty with non-stationarity: Monday vs. Friday behavior

• Deployment: few turnkey options, specialized skill set

https://vwo.com/blog/multi-armed-bandit-algorithm/

Page 23

Bayesian Bandits for the Patient

1. Thompson Sampling balances exploitation and exploration while minimizing decision regret

2. No need to pre-specify decision splits or a time horizon for experiments

3. Can model a variety of problems and complex interactions

Page 24

Resources

https://github.com/bgalbraith/bandits

