Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays

Date post: 11-Apr-2017
Upload: jkomiyama
Transcript
Page 1: Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays
Page 2: Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays
Page 3: Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays


Page 4: Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays

Number of arms: $K$

Rounds: $t = 1, 2, \dots, T$

Arm selected in round $t$: $I(t) \in \{1, \dots, K\}$

Reward received: $X_{I(t)}(t)$

Total reward: $\sum_{t=1}^{T} X_{I(t)}(t)$

[Figure: arm (image from http://www.directgamesroom.com)]

Page 5: Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays
Page 6: Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays

(Bernoulli: 1 = ..., 0 = ...)

Page 7: Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays

Arm $i$ has reward distribution $\nu_i$

$X_{I(t)}(t) \sim \nu_{I(t)}$

$\nu_i = \mathrm{Bernoulli}(\mu_i)$, with parameters $\{\mu_i\}$

Page 8: Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays

Means $\mu_i$, ordered as $\mu_1 > \mu_2 > \mu_3 > \cdots > \mu_K$

$\{\mu_i\}_{i \in [K]}$

Selecting the arm with mean $\mu_1$ in all $T$ rounds gives expected total reward $\mu_1 T$

Page 9: Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays

$\mu_1, \dots, \mu_K$

$\mu_i$, $\arg\max_i \mu_i$

$\arg\max_i \mu_i = \arg\max_i \mu_i =: \mu_1$

Page 10: Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays

$\mu_1$

$\mathrm{Regret}(T) = \mu_1 T - \sum_{i=1}^{K} \mu_i N_T(i)$

$N_T(i)$: number of times arm $i$ is selected in $T$ rounds

Each selection of arm $i$ costs $\mu_1 - \mu_i$, so $E[\mathrm{Regret}(T)]$ is determined by $E[N_T(i)]$
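
As a minimal sketch of the regret definition on this slide, the Python snippet below evaluates $\mu_1 T - \sum_i \mu_i N_T(i)$ and the equivalent gap form $\sum_i (\mu_1 - \mu_i) N_T(i)$. The function names, the means, and the selection counts are illustrative assumptions and do not come from the slides.

```python
def regret_from_counts(mu, counts):
    """Regret(T) = mu_1 * T - sum_i mu_i * N_T(i), where mu_1 = max(mu)."""
    total_rounds = sum(counts)  # T is the total number of selections
    best_mean = max(mu)
    return best_mean * total_rounds - sum(m * n for m, n in zip(mu, counts))


def regret_from_gaps(mu, counts):
    """Same quantity written as sum_i (mu_1 - mu_i) * N_T(i)."""
    best_mean = max(mu)
    return sum((best_mean - m) * n for m, n in zip(mu, counts))


# Hypothetical example values (not taken from the slides).
mu = [0.5, 0.4, 0.3]
counts = [80, 15, 5]
print(regret_from_counts(mu, counts), regret_from_gaps(mu, counts))
```

Both forms agree because $\sum_i N_T(i) = T$, which is why bounding the number of selections of each suboptimal arm is enough to bound the regret.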

Page 11: Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays


Page 12: Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays
Page 13: Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays


Page 14: Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays
Page 15: Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays

Number of arms: $K$; arms selected per round: $L$ ($< K$); number of rounds: $T$

In each round $t$, select a set $I(t)$ of $L$ arms and receive the rewards $\{X_i(t)\}$ ($i \in I(t)$).

$X_i(t) \sim \mathrm{Bernoulli}(\mu_i)$

$\mathrm{Regret}(T) = \sum_{t=1}^{T} \left( \sum_{i \in [L]} \mu_i - \sum_{i \in I(t)} \mu_i \right)$

Suboptimal arms: $\{L+1, L+2, \dots, K\}$; optimal selection: $I(t) = \{1, \dots, L\}$
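
The multiple-play regret can be computed directly from a log of selected arm sets. The sketch below assumes zero-based arm indices and a hypothetical list `selections` holding the set $I(t)$ for each round; these names and values are illustrative only.

```python
def multi_play_regret(mu, selections, L):
    """Sum over rounds of (total mean of the L best arms) - (total mean of I(t))."""
    best_total = sum(sorted(mu, reverse=True)[:L])
    return sum(best_total - sum(mu[i] for i in chosen) for chosen in selections)


# Hypothetical example: K = 4 arms, L = 2 plays per round, T = 3 rounds.
mu = [0.5, 0.4, 0.3, 0.2]
selections = [(0, 1), (0, 2), (2, 3)]  # I(t) for t = 1, 2, 3 (zero-based arms)
print(multi_play_regret(mu, selections, L=2))
```

Only rounds in which $I(t)$ contains an arm outside the top $L$ contribute to the sum.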

Page 16: Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays

Optimal for single play / Optimal for multiple plays

Page 17: Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays

Optimal for single play / Optimal for multiple plays

This work

Page 18: Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays

$\mathrm{Regret}(T) \ge \sum_{i \in \{L+1, \dots, K\}} \frac{(\mu_L - \mu_i) \log T}{D_{\mathrm{KL}}(\mu_i, \mu_L)} - o(\log T)$

[Figure: arms 2, 3, ..., L-2, L-1, L and suboptimal arms i > L, j > L, with the selected set $I(t)$]
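
To make the leading term of the lower bound concrete, the sketch below evaluates $\sum_{i > L} (\mu_L - \mu_i) \log T / D_{\mathrm{KL}}(\mu_i, \mu_L)$ for Bernoulli arms. It assumes all means lie strictly between 0 and 1 and are distinct; the function names and the example means are hypothetical.

```python
import math


def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), for 0 < p, q < 1."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def lower_bound_leading_term(mu, L, T):
    """Leading term of the bound; the o(log T) correction is ignored here."""
    mu_sorted = sorted(mu, reverse=True)
    mu_L = mu_sorted[L - 1]  # mean of the L-th best arm
    return sum(
        (mu_L - m) * math.log(T) / bernoulli_kl(m, mu_L)
        for m in mu_sorted[L:]  # suboptimal arms L+1, ..., K
    )


# Hypothetical example values (not taken from the slides).
mu = [0.7, 0.6, 0.5, 0.4, 0.3]
print(lower_bound_leading_term(mu, L=2, T=10_000))
```

Each suboptimal arm is compared against $\mu_L$, the weakest of the optimal arms, rather than against $\mu_1$.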

Page 19: Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays
Page 20: Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays


Page 21: Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays
Page 22: Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays

For each arm $i$: $\alpha_i(1) = 1$, $\beta_i(1) = 1$

Sample $\theta_i(t) \sim \mathrm{Beta}(\alpha_i(t), \beta_i(t))$ and select $I(t) = \arg\max_i \theta_i(t)$

Observe $X_{I(t)}(t)$: if it is 1, increment $\alpha_{I(t)}(t)$; if it is 0, increment $\beta_{I(t)}(t)$
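
The single-play procedure on this slide can be sketched in Python as follows. This is a minimal simulation under stated assumptions, not the authors' code: the true means `mu`, the horizon `T`, and the seed are hypothetical, and rewards are simulated with the standard library's `random` module.

```python
import random


def thompson_sampling(mu, T, seed=0):
    """Bernoulli Thompson sampling with a Beta(1, 1) prior on every arm."""
    rng = random.Random(seed)
    K = len(mu)
    alpha = [1] * K   # alpha_i(1) = 1
    beta = [1] * K    # beta_i(1) = 1
    counts = [0] * K  # number of times each arm was selected
    for _ in range(T):
        # Draw theta_i(t) ~ Beta(alpha_i(t), beta_i(t)) for every arm.
        theta = [rng.betavariate(alpha[i], beta[i]) for i in range(K)]
        arm = max(range(K), key=lambda i: theta[i])  # I(t) = argmax_i theta_i(t)
        reward = 1 if rng.random() < mu[arm] else 0  # simulated Bernoulli reward
        alpha[arm] += reward      # reward 1 increments alpha
        beta[arm] += 1 - reward   # reward 0 increments beta
        counts[arm] += 1
    return counts, alpha, beta


# Hypothetical example values (not taken from the slides).
counts, alpha, beta = thompson_sampling([0.7, 0.5, 0.3], T=1000)
print(counts)
```

After the loop, `alpha[i] - 1` and `beta[i] - 1` are the observed numbers of 1-rewards and 0-rewards for arm `i`.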

Page 23: Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays

Sample $\theta_i(t) \sim \mathrm{Beta}(\alpha_i(t), \beta_i(t))$ and select as $I(t)$ the $L$ arms with the largest $\theta_i(t)$

For each $i \in I(t)$:

Observe $X_i(t)$: if it is 1, increment $\alpha_i(t)$; if it is 0, increment $\beta_i(t)$
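
The multiple-play variant differs from the single-play procedure only in selecting the $L$ arms with the largest samples and updating all of them. The sketch below is an illustration under assumptions: `mu`, `L`, `T`, and the seed are hypothetical, and rewards are simulated rather than observed.

```python
import random


def mp_thompson_sampling(mu, L, T, seed=0):
    """Multiple-play Bernoulli Thompson sampling: select the top-L samples each round."""
    rng = random.Random(seed)
    K = len(mu)
    alpha = [1] * K
    beta = [1] * K
    counts = [0] * K
    for _ in range(T):
        theta = [rng.betavariate(alpha[i], beta[i]) for i in range(K)]
        # I(t): indices of the L largest theta_i(t).
        selected = sorted(range(K), key=lambda i: theta[i], reverse=True)[:L]
        for i in selected:
            reward = 1 if rng.random() < mu[i] else 0  # simulated X_i(t)
            alpha[i] += reward
            beta[i] += 1 - reward
            counts[i] += 1
    return counts, alpha, beta


# Hypothetical example: K = 5 arms, L = 2 plays per round.
counts, alpha, beta = mp_thompson_sampling([0.7, 0.6, 0.5, 0.4, 0.3], L=2, T=500)
print(counts)
```

Every arm in $I(t)$ yields its own observation, so each round produces $L$ posterior updates instead of one.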

Page 24: Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays

$O\left(\frac{\log t}{t}\right)$

$O\left(\frac{\log t}{t^2}\right)$

$t = 1, \dots, T$: $O(1)$
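
One possible reading of these rates, offered as an assumption since the slide's explanatory text did not survive, is that a per-round term of order $\log t / t^2$ sums to a constant over $t = 1, \dots, T$, whereas a term of order $\log t / t$ does not. The sketch below compares the two partial sums numerically; the function names are hypothetical.

```python
import math


def partial_sum(term, T):
    """Sum term(t) over t = 1, ..., T."""
    return sum(term(t) for t in range(1, T + 1))


def log_over_t(t):
    return math.log(t) / t


def log_over_t_squared(t):
    return math.log(t) / (t * t)


for T in (10**2, 10**3, 10**4, 10**5):
    slow = partial_sum(log_over_t, T)          # keeps growing with T
    fast = partial_sum(log_over_t_squared, T)  # stays bounded
    print(T, round(slow, 3), round(fast, 3))
```

The first column of sums grows roughly like $(\log T)^2 / 2$, while the second stays below 1 for every $T$.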

Page 25: Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays
