
Learning to Rank with Click Models: From Online Algorithms to Offline Evaluations

Shuai LI

The Chinese University of Hong Kong


Outline

1 Motivation

2 Background

3 Problem Definition – Online

4 Click Models
    Cascade Model (CM)
        ICML’2016
        AAAI’2018
        IJCAI’2019
    Dependent Click Model – A co-authored work
    Position-Based Model
    General Click Models – A co-authored work, ICML’2019

5 Offline Evaluations – KDD’2018

6 Conclusions


Motivation – Learning to Rank

Amazon, YouTube, Facebook, Netflix, Taobao


Background – Multi-armed Bandit Problem

A special case of reinforcement learning

There are L arms

Each arm $a$ has an unknown reward distribution with unknown mean $\alpha_a$

The best arm is $a^* = \arg\max_a \alpha_a$


Background – Multi-armed Bandit Setting

At each time t

The learning agent selects one arm $a_t$
Observe the reward $X_{a_t,t}$

The objective is to minimize the regret in T rounds

$$R(T) = T\alpha^* - \mathbb{E}\left[\sum_{t=1}^{T} \alpha_{a_t}\right]$$

Balance the trade-off between exploitation and exploration

Exploitation: select arms that yield good results so far
Exploration: select arms that have not been tried much before


Background – Upper Confidence Bound

UCB (Upper Confidence Bound) [ACF’02]

UCB policy: select

$$a_t = \arg\max_a \; \hat{\alpha}_{a,t} + \sqrt{\frac{3\ln(t)}{2T_a(t)}}$$

where
$\hat{\alpha}_{a,t}$ is the empirical mean of arm $a$ at time $t$ — Exploitation
$T_a(t)$ is the number of times arm $a$ has been played — Exploration

Gap-dependent bound $O(\frac{L}{\Delta}\log(T))$ where $\Delta = \min_{\alpha_a < \alpha^*}(\alpha^* - \alpha_a)$, matching the lower bound
Gap-free bound $O(\sqrt{LT\log(T)})$, tight up to a factor of $\sqrt{\log(T)}$
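To make the policy concrete, here is a minimal sketch of UCB on Bernoulli arms; the environment, horizon, and arm means are illustrative and not from the slides:

```python
import math
import random

def run_ucb(arm_means, T):
    """Run UCB with bonus sqrt(3 ln t / (2 T_a(t))) on Bernoulli arms and
    return the pseudo-regret R(T) = T * alpha^* - sum_t alpha_{a_t}."""
    L = len(arm_means)
    pulls = [0] * L       # T_a(t): number of times each arm has been played
    means = [0.0] * L     # empirical mean of each arm
    picked_mean_sum = 0.0
    for t in range(1, T + 1):
        if t <= L:
            a = t - 1     # play every arm once to initialize
        else:
            a = max(range(L), key=lambda i: means[i]
                    + math.sqrt(3 * math.log(t) / (2 * pulls[i])))
        reward = 1.0 if random.random() < arm_means[a] else 0.0   # Bernoulli feedback
        pulls[a] += 1
        means[a] += (reward - means[a]) / pulls[a]                # running average
        picked_mean_sum += arm_means[a]
    return T * max(arm_means) - picked_mean_sum

# Example: three Bernoulli arms, 10,000 rounds.
print(run_ucb([0.3, 0.5, 0.7], T=10_000))
```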


Online Learning to Rank

There are L items

Each item $a$ has an unknown attractiveness $\alpha(a)$

There are K positions

At time t

The learning agent selects a list of items $A_t = (a_1^t, \ldots, a_K^t)$

Receive the click feedback $C_t \in \{0,1\}^K$

The objective is to minimize the regret over T rounds

$$R(T) = T\,r(A^*) - \mathbb{E}\left[\sum_{t=1}^{T} r(A_t)\right]$$

where

$r(A)$ is the reward of list $A$
$A^* = (1, 2, \ldots, K)$, assuming items are ordered so that $\alpha(1) \ge \alpha(2) \ge \cdots \ge \alpha(L)$


Click Models

Click models describe how users interact with a list of items

Cascade Model (CM)

Assumes the user checks the list from position 1 to position $K$, clicks on the first satisfying item, and stops
At most 1 click
$r(A) = 1 - \prod_{k=1}^{K}(1 - \alpha(a_k)) = \mathrm{OR}(\alpha(a_1), \ldots, \alpha(a_K))$

The meaning of received feedback (0, 0, 1, 0, 0)

✗  ✗  ✓  ?  ?  (positions 1–2: examined but not clicked; position 3: clicked; positions 4–5: not examined)

                     Click Model   Regret
[KSWA, 2015]         CM            $O(\frac{L}{\Delta}\log(T))$
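To make the cascade assumption concrete, here is a minimal sketch of a cascade-model user and of the list reward $r(A)$ above (the attraction probabilities and the list are illustrative):

```python
import random

def cascade_click(alphas):
    """Simulate one cascade-model user on a list with attraction probs `alphas`.
    Returns a 0/1 click vector: the user scans top-down, clicks the first
    satisfying item, and stops (at most one click)."""
    clicks = [0] * len(alphas)
    for k, alpha in enumerate(alphas):
        if random.random() < alpha:
            clicks[k] = 1
            break                     # positions after the click are never examined
    return clicks

def cm_reward(alphas):
    """r(A) = 1 - prod_k (1 - alpha(a_k)): probability of at least one click."""
    p_no_click = 1.0
    for alpha in alphas:
        p_no_click *= 1.0 - alpha
    return 1.0 - p_no_click

# Example: a 5-position list; feedback like (0, 0, 1, 0, 0) means the user
# examined positions 1-2 without clicking, clicked position 3, and never saw 4-5.
alphas = [0.1, 0.2, 0.6, 0.3, 0.4]
print(cascade_click(alphas), cm_reward(alphas))
```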


Contextual Bandit Setting

Contexts

User profiles, search keywords
Important for search and recommendations

Assume each item $a$ is represented by $x_{t,a} \in \mathbb{R}^d$

Assume the attractiveness for item $a$ is

$\alpha_t(a) = \theta^\top x_{t,a}$

for a fixed but unknown weight vector $\theta$

When the $x_{t,a}$'s are one-hot representations and $\theta = (\alpha(1), \ldots, \alpha(L))$, this reduces to the multi-armed bandit setting.
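A small illustrative sketch of this reduction (not from the slides): with one-hot features, $\theta^\top x_{t,a}$ is just the per-item attractiveness, so the contextual setting collapses to the multi-armed bandit one.

```python
import numpy as np

def attractiveness(theta, x):
    """Linear attractiveness alpha_t(a) = theta^T x_{t,a}."""
    return float(theta @ x)

L = 4
alphas = np.array([0.1, 0.4, 0.2, 0.7])   # per-item attractiveness (illustrative)
theta = alphas                             # theta = (alpha(1), ..., alpha(L))
one_hot = np.eye(L)                        # x_{t,a} = e_a

# With one-hot features, theta^T e_a = alpha(a), i.e. the plain bandit means.
print([attractiveness(theta, one_hot[a]) for a in range(L)])
```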


Contextual Combinatorial Cascading Bandits [LWZC, ICML’2016] – Algorithm

$C^3$-UCB Algorithm

Initialization: $\hat\theta = 0 \in \mathbb{R}^{d\times 1}$, $V = \lambda I \in \mathbb{R}^{d\times d}$, $b = 0 \in \mathbb{R}^{d\times 1}$

For time $t = 1, 2, \ldots$
Obtain items $\{x_{t,a}\}_{a\in E} \subset \mathbb{R}^{d\times 1}$
With high probability
$$\|\hat\theta - \theta\|_V \le \beta_t$$
thus with high probability
$$\alpha_t(a) \in \hat\theta^\top x_{t,a} \pm \beta_t\,\|x_{t,a}\|_{V^{-1}}$$
Select the list $A_t$ by the UCBs of the arms, $U_t(a) = \hat\theta^\top x_{t,a} + \beta_t\,\|x_{t,a}\|_{V^{-1}}$
Receive feedback $C_t \in \{0,1\}^K$
Compute the stopping position $K_t = \min(\{k : C_t(k) = 1\} \cup \{K\})$ and update
$$V \leftarrow V + \sum_{k=1}^{K_t} x_{t,a_k^t}\, x_{t,a_k^t}^\top, \qquad b \leftarrow b + \sum_{k=1}^{K_t} x_{t,a_k^t}\, C_t(k), \qquad \hat\theta = V^{-1} b$$
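A minimal sketch of the $C^3$-UCB updates above in NumPy; the greedy top-$K$ list selection and the constant confidence radius $\beta$ are simplifications for illustration:

```python
import numpy as np

class C3UCB:
    """Sketch of the C3-UCB estimator update; list selection is greedy in the UCBs."""
    def __init__(self, d, lam=1.0, beta=1.0):
        self.V = lam * np.eye(d)   # V = lambda * I
        self.b = np.zeros(d)
        self.beta = beta           # confidence radius beta_t (kept constant here)

    def select(self, X, K):
        """X: (n_items, d) feature matrix; return indices of the top-K items by UCB."""
        theta = np.linalg.solve(self.V, self.b)          # theta_hat = V^{-1} b
        Vinv = np.linalg.inv(self.V)
        widths = np.sqrt(np.einsum("ij,jk,ik->i", X, Vinv, X))   # ||x||_{V^{-1}}
        ucb = X @ theta + self.beta * widths             # U_t(a)
        return list(np.argsort(-ucb)[:K])

    def update(self, X, chosen, clicks):
        """Use feedback up to the stopping position K_t (first click, else K)."""
        ones = [k for k, c in enumerate(clicks) if c == 1]
        K_t = (ones[0] + 1) if ones else len(chosen)
        for k in range(K_t):
            x = X[chosen[k]]
            self.V += np.outer(x, x)
            self.b += x * clicks[k]
```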


Contextual Combinatorial Cascading Bandits [LWZC, ICML’2016] – Results

We prove a regret bound

$$R(T) = O\left(\frac{d}{p^*}\sqrt{TK\ln(T)}\right)$$

Experimental results (ours: C3-UCB; baseline: CombCascade)
[Figure: cumulative regret over time $t$ on Synthetic Data and on Network 1221; C3-UCB vs. CombCascade]


Summary on Bandits with Click Models

                     Context   Click Model   Regret
[KSWA, 2015]         -         CM            $O(\frac{L}{\Delta}\log(T))$
[LWZC, ICML’2016]    Linear    CM            $O(\frac{d}{p^*}\sqrt{TK\log(T)})$


Online Clustering of Contextual Cascading Bandits [LZ, AAAI’2018]

Finds a clustering over users while recommending

The attractiveness function is generalized linear (GL)

Improves the regret results

Experiments (ours: CLUB-cascade; baseline: C3-UCB/CascadeLinUCB)
[Figure: cumulative regret over time $t$ on two datasets; CLUB-cascade vs. C3-UCB/CascadeLinUCB]

                     Context   Click Model   Regret
[KSWA, 2015]         -         CM            $O(\frac{L}{\Delta}\log(T))$
[LWZC, ICML’2016]    Linear    CM            $O(\frac{d}{p^*}\sqrt{TK\log(T)})$
[LZ, AAAI’2018]      GL        CM            $O(d\sqrt{TK\log(T)})$


Improved Algorithm on Clustering Bandits [LCLL, IJCAI’2019]

Allows an arbitrary frequency distribution over users (compared to the uniform distribution)

Proves a regret bound that is free of the minimal frequency over users

$$R(T) = O\left(d\sqrt{mT\ln(T)} + \left(\frac{1}{\gamma^2 p} + \frac{n_u}{\gamma^2 \lambda_x^3}\right)\ln(T)\right)$$

compared to

$$R(T) = O\left(d\sqrt{mT\ln(T)} + \frac{1}{p_{\min}\,\gamma^2 \lambda_x^3}\,\ln(T)\right)$$

where $n_u$ is the number of users and $m$ is the number of clusters

Experiments (ours vs. CLUB, LinUCB-One, LinUCB-Ind)
[Figure: regret over time $t$ on Synthetic, MovieLens, and Yelp datasets; ours vs. CLUB, LinUCB-One, LinUCB-Ind]


Dependent Click Model (DCM)

Allows multiple clicks

Assumes there is a probability of satisfaction after each click

$r(A) = 1 - \prod_{k=1}^{K}(1 - \alpha(a_k)\gamma_k)$

$\gamma_k$: satisfaction probability after a click on position $k$

The meaning of received feedback (0, 1, 0, 1, 0):
position 1: ✗ no click; position 2: ✓ click, not satisfied; position 3: ✗ no click; position 4: ✓ click, satisfied?; position 5: ?
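A minimal sketch of a dependent-click-model user and of the reward $r(A)$ above (attraction and satisfaction probabilities are illustrative):

```python
import random

def dcm_clicks(alphas, gammas):
    """Simulate a DCM user: scan top-down, click attractive items, and stop
    as soon as a clicked item also satisfies (probability gamma_k at position k)."""
    clicks = [0] * len(alphas)
    for k, (alpha, gamma) in enumerate(zip(alphas, gammas)):
        if random.random() < alpha:           # attracted -> click
            clicks[k] = 1
            if random.random() < gamma:       # satisfied -> stop examining
                break
    return clicks

def dcm_reward(alphas, gammas):
    """r(A) = 1 - prod_k (1 - alpha(a_k) * gamma_k): probability the user is satisfied."""
    p_not_satisfied = 1.0
    for alpha, gamma in zip(alphas, gammas):
        p_not_satisfied *= 1.0 - alpha * gamma
    return 1.0 - p_not_satisfied

alphas = [0.3, 0.5, 0.2, 0.6, 0.1]
gammas = [0.9, 0.7, 0.5, 0.3, 0.1]
print(dcm_clicks(alphas, gammas), dcm_reward(alphas, gammas))
```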

                     Context   Click Model   Regret
[KSWA, 2015]         -         CM            $O(\frac{L}{\Delta}\log(T))$
[LWZC, ICML’2016]    Linear    CM            $O(\frac{d}{p^*}\sqrt{TK\log(T)})$
[LZ, AAAI’2018]      GL        CM            $O(d\sqrt{TK\log(T)})$
[KKSW, 2016]         -         DCM           $O(\frac{L}{\Delta}\log(T))$
[LLZ, COCOON’2018]   GL        DCM           $O(dK\sqrt{TK\log(T)})$


Position-Based Model (PBM)

Most popular model in industry

Assumes the user click probability on an item $a$ at position $k$ can be factored into $\beta_k \cdot \alpha(a)$

$\beta_k$ is the position bias; usually $\beta_1 \ge \beta_2 \ge \cdots \ge \beta_K$

$r(A) = \sum_{k=1}^{K} \beta_k\, \alpha(a_k)$

The meaning of received feedback (0, 1, 0, 1, 0)
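A minimal sketch of a position-based-model user and of the reward $r(A)$ above (position biases and attractions are illustrative); under PBM, feedback such as (0, 1, 0, 1, 0) simply records clicks at positions 2 and 4:

```python
import random

def pbm_clicks(alphas, betas):
    """Simulate a PBM user: a click at position k occurs with probability
    beta_k * alpha(a_k) (examination bias times attraction)."""
    return [1 if random.random() < beta * alpha else 0
            for alpha, beta in zip(alphas, betas)]

def pbm_reward(alphas, betas):
    """r(A) = sum_k beta_k * alpha(a_k): expected number of clicks on the list."""
    return sum(beta * alpha for alpha, beta in zip(alphas, betas))

betas = [1.0, 0.8, 0.6, 0.4, 0.2]          # position bias, decreasing with rank
alphas = [0.3, 0.5, 0.2, 0.6, 0.1]
print(pbm_clicks(alphas, betas), pbm_reward(alphas, betas))
```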


Summary on Bandits with Click Models

                     Context   Click Model   Regret
[KSWA, 2015]         -         CM            $O(\frac{L}{\Delta}\log(T))$
[LWZC, ICML’2016]    Linear    CM            $O(\frac{d}{p^*}\sqrt{TK\log(T)})$
[LZ, AAAI’2018]      GL        CM            $O(d\sqrt{TK\log(T)})$
[KKSW, 2016]         -         DCM           $O(\frac{L}{\Delta}\log(T))$
[LLZ, COCOON’2018]   GL        DCM           $O(dK\sqrt{TK\log(T)})$
[LVC, 2016]          -         PBM with β    $O(\frac{L}{\Delta}\log(T))$


General Click Models

Common observations for click models

The click-through rate (CTR) of list $A$ at position $k$ can be factored into

$$\mathrm{CTR}(A, k) = \chi(A, k)\,\alpha(a_k)$$

$\chi(A, k)$ is the examination probability of list $A$ at position $k$

E.g. $\chi(A, k) = \prod_{i=1}^{k-1}(1 - \alpha(a_i))$ in the Cascade Model and $\chi(A, k) = \beta_k$ in the Position-Based Model

Difficulties with general click models

$\chi$ depends on both the click model and the list
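A small sketch of the two example examination probabilities just given (illustrative code):

```python
def chi_cascade(alphas, k):
    """Examination probability of position k (1-based) under the Cascade Model:
    prod_{i<k} (1 - alpha(a_i))."""
    p = 1.0
    for alpha in alphas[:k - 1]:
        p *= 1.0 - alpha
    return p

def chi_pbm(betas, k):
    """Examination probability of position k (1-based) under the Position-Based Model."""
    return betas[k - 1]

alphas = [0.3, 0.5, 0.2]
print([chi_cascade(alphas, k) for k in (1, 2, 3)])     # [1.0, 0.7, 0.35]
print([chi_pbm([1.0, 0.8, 0.6], k) for k in (1, 2, 3)])
```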


Summary on Bandits with Click Models

                     Context   Click Model   Regret
[KSWA, 2015]         -         CM            $O(\frac{L}{\Delta}\log(T))$
[LWZC, ICML’2016]    Linear    CM            $O(\frac{d}{p^*}\sqrt{TK\log(T)})$
[LZ, AAAI’2018]      GL        CM            $O(d\sqrt{TK\log(T)})$
[KKSW, 2016]         -         DCM           $O(\frac{L}{\Delta}\log(T))$
[LLZ, COCOON’2018]   GL        DCM           $O(dK\sqrt{TK\log(T)})$
[LVC, 2016]          -         PBM with β    $O(\frac{L}{\Delta}\log(T))$
[ZTGKSW, 2017]       -         General       $O(\frac{K^3 L}{\Delta}\log(T))$
[LKLS, NIPS’2018]    -         General       $O(\frac{KL}{\Delta}\log(T))$, $O(\sqrt{K^3 L T \log(T)})$, $\Omega(\sqrt{KLT})$


Online Learning to Rank with Features [LLS, ICML’2019] – Preparation

Recall

Each item $a$ is represented by a feature vector $x_a \in \mathbb{R}^d$

The attractiveness of item $a$ is $\alpha(a) = \theta^\top x_a$

We propose an algorithm called RecurRank (Recursive Ranking)

G-optimal design

Minimize the covariance of the least-squares estimator
$X = \{x_1, \ldots, x_n\} \subset \mathbb{R}^d$

For any distribution $\pi : X \to [0, 1]$, let $Q(\pi) = \sum_{x\in X} \pi(x)\, x x^\top$

By the Kiefer–Wolfowitz theorem there exists a $\pi$, called the G-optimal design, that maximizes $\det(Q(\pi))$ or, equivalently, satisfies $\max_{x\in X} \|x\|^2_{Q(\pi)^\dagger} \le d$

John’s theorem implies that $\pi$ may be chosen so that $|\{x : \pi(x) > 0\}| \le d(d + 3)/2$
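For illustration, here is a minimal sketch of computing an approximate G-optimal design with the standard Frank–Wolfe (Fedorov–Wynn) iteration; this is one common way to compute such a design, not necessarily the routine used in the paper:

```python
import numpy as np

def g_optimal_design(X, n_iter=1000, tol=1e-6):
    """Frank-Wolfe (Fedorov-Wynn) iteration for an approximate G-optimal design.

    X: (n, d) array of item feature vectors.
    Returns a distribution pi over the rows of X whose worst-case
    ||x||^2_{Q(pi)^+} is close to d (the Kiefer-Wolfowitz condition)."""
    n, d = X.shape
    pi = np.full(n, 1.0 / n)              # start from the uniform design
    for _ in range(n_iter):
        Q = X.T @ (X * pi[:, None])       # Q(pi) = sum_x pi(x) x x^T
        Qinv = np.linalg.pinv(Q)
        g = np.einsum("ij,jk,ik->i", X, Qinv, X)   # ||x||^2_{Q^+} per row
        i = int(np.argmax(g))
        if g[i] <= d + tol:               # Kiefer-Wolfowitz optimality check
            break
        gamma = (g[i] / d - 1.0) / (g[i] - 1.0)    # optimal line-search step
        pi *= 1.0 - gamma
        pi[i] += gamma
    return pi

# Example: design over 20 random 5-dimensional feature vectors.
print(g_optimal_design(np.random.randn(20, 5)).round(3))
```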


Online Learning to Rank with Features [LLS, ICML’2019] – Algorithm

RecurRank Algorithm

Each instantiation is called with three arguments:
1 A phase number $\ell \in \{1, 2, \ldots\}$;
2 An ordered tuple of items $A = (a_1, a_2, \ldots, a_n)$;
3 A tuple of positions $K = (k, \ldots, k + m - 1)$ with $m \le n$.

The algorithm is first called with $\ell = 1$, a random order over all items $\{1, \ldots, L\}$, and $K = (1, \ldots, K)$

Find a G-optimal design $\pi = \mathrm{Gopt}(A)$. Then compute

$$T(a) = \left\lceil \frac{d\,\pi(a)}{2\Delta_\ell^2} \log\!\left(\frac{|A|}{\delta_\ell}\right) \right\rceil, \qquad \Delta_\ell = 2^{-\ell}$$

Hope to satisfy $|\hat\alpha(a) - \alpha(a)| \le \Delta_\ell$ for any $a \in A$ by the end of this instantiation

This instantiation runs for $\sum_{a\in A} T(a)$ rounds
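A minimal sketch of the exploration budget $T(a)$ and the phase accuracy $\Delta_\ell$ defined above (the design weights and parameters are illustrative; `confidence` plays the role of $\delta_\ell$):

```python
import math

def phase_accuracy(ell):
    """Target accuracy Delta_ell = 2^{-ell} for phase ell."""
    return 2.0 ** (-ell)

def exploration_budget(pi, d, ell, confidence):
    """T(a) = ceil(d * pi(a) / (2 * Delta_ell^2) * log(|A| / delta_ell)) per item.
    `pi` maps each item in the current partition A to its G-optimal design weight."""
    n = len(pi)
    delta = phase_accuracy(ell)
    return {a: math.ceil(d * w / (2 * delta ** 2) * math.log(n / confidence))
            for a, w in pi.items()}

# Example: 4 items in phase ell = 2 with a uniform design and delta_ell = 0.05;
# the instantiation then runs for the sum of these counts.
pi = {f"a{i}": 0.25 for i in range(1, 5)}
print(exploration_budget(pi, d=5, ell=2, confidence=0.05))
```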


Online Learning to Rank with Features [LLS, ICML’2019] – Algorithm (Continued)

RecurRank Algorithm (Continued)

Select each item $a \in A$ exactly $T(a)$ times at position $k$ and put the first $m - 1$ items in $A \setminus \{a\}$ at the remaining positions $k + 1, \ldots, k + m - 1$
first position — exploration
remaining positions — exploitation
only the first position has the same examination probability $\chi$ for all lists

E.g. suppose we have computed $T(a_3) = 100$; then the algorithm puts $(a_3, a_1, a_2, a_4, \ldots, a_m)$ on positions $(k, \ldots, k + m - 1)$ for 100 rounds

Compute $\hat\theta$ using only the feedback from the first position $k$ and rank items in decreasing order of the estimated attractiveness

$$\hat\alpha(a_1) \ge \hat\alpha(a_2) \ge \hat\alpha(a_3) \ge \cdots \ge \hat\alpha(a_n)$$


Eliminate bad arms $a_{n'+1}, \ldots, a_n$ if

$$\hat\alpha(a_1) \ge \cdots \ge \hat\alpha(a_m) \ge \cdots \ge \hat\alpha(a_{n'}) \ge \hat\alpha(a_{n'+1}) \ge \cdots \ge \hat\alpha(a_n), \qquad \hat\alpha(a_{n'}) - \hat\alpha(a_{n'+1}) \ge 2\Delta_\ell$$

Split the partition for each consecutive gap larger than $2\Delta_\ell$

$$\hat\alpha(a_1) \ge \cdots \ge \hat\alpha(a_{k_1}) \;\Big|\; \hat\alpha(a_{k_1+1}) \ge \cdots \ge \hat\alpha(a_{k_2}) \;\Big|\; \hat\alpha(a_{k_2+1}) \ge \cdots \ge \hat\alpha(a_{n'})$$

with a gap of at least $2\Delta_\ell$ at each split, and the positions partitioned accordingly:

$$(k, \ldots, k + k_1 - 1) \;\Big|\; (k + k_1, \ldots, k + k_2 - 1) \;\Big|\; (k + k_2, \ldots, k + m - 1)$$

Call the refined partitions with phase $\ell + 1$
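A minimal sketch of the elimination-and-split step on the estimated attractiveness values (illustrative; items are assumed to be already sorted by $\hat\alpha$, and the paper's exact elimination rule is simplified):

```python
def split_partition(items, alpha_hat, positions, delta):
    """Split a sorted item block wherever consecutive estimates differ by at
    least 2*delta, then hand out consecutive position blocks to the groups;
    items left without positions (after the last split) are eliminated."""
    groups, current = [], [items[0]]
    for prev, item in zip(items, items[1:]):
        if alpha_hat[prev] - alpha_hat[item] >= 2 * delta:
            groups.append(current)
            current = []
        current.append(item)
    groups.append(current)
    out, start = [], 0
    for group in groups:
        block = positions[start:start + len(group)]
        if not block:
            break                     # no positions left: these items are eliminated
        out.append((group[:len(block)], block))
        start += len(block)
    return out

alpha_hat = {"a1": 0.9, "a2": 0.85, "a3": 0.5, "a4": 0.45, "a5": 0.1}
# Four positions, gap threshold 2*delta = 0.2: yields ([a1, a2], [1, 2]) and
# ([a3, a4], [3, 4]); a5 sits below a large gap and is eliminated.
print(split_partition(list(alpha_hat), alpha_hat, positions=[1, 2, 3, 4], delta=0.1))
```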


Online Learning to Rank with Features [LLS, ICML’2019] – Results

Regret bound

$$R(T) = O\left(K\sqrt{dT\log(LT)}\right)$$

Experiments (RecurRank (ours) vs. C3-UCB and TopRank)
[Figure: regret over time $t$ under (a) CM and (b) PBM; RecurRank vs. C3-UCB and TopRank]


Summary on Bandits with Click Models

                     Context   Click Model   Regret
[KSWA, 2015]         -         CM            $O(\frac{L}{\Delta}\log(T))$
[LWZC, ICML’2016]    Linear    CM            $O(\frac{d}{p^*}\sqrt{TK\log(T)})$
[LZ, AAAI’2018]      GL        CM            $O(d\sqrt{TK\log(T)})$
[KKSW, 2016]         -         DCM           $O(\frac{L}{\Delta}\log(T))$
[LLZ, COCOON’2018]   GL        DCM           $O(dK\sqrt{TK\log(T)})$
[LVC, 2016]          -         PBM with β    $O(\frac{L}{\Delta}\log(T))$
[ZTGKSW, 2017]       -         General       $O(\frac{K^3 L}{\Delta}\log(T))$
[LKLS, NIPS’2018]    -         General       $O(\frac{KL}{\Delta}\log(T))$, $O(\sqrt{K^3 L T \log(T)})$, $\Omega(\sqrt{KLT})$
[LLS, ICML’2019]     Linear    General       $O(K\sqrt{dT\log(LT)})$


Offline Evaluations

Motivation

Can we estimate the expected number of clicks of a new policy without directly deploying it?

Offline Evaluation!

Objective:

To design statistically efficient estimators, based on a logged dataset, for any ranking policy

Challenge:

The number of different lists is exponential in K

Shuai LI (CUHK) Learning to Rank 38 / 53



Offline Evaluation of Ranking Policies with Click Models [LAKMVW, KDD'2018] – Results

We design estimators for different click models

Item-Position, Random, Rank-Based, Position-Based, Document-Based

We prove that our estimators

are unbiased in a larger class of policies
have lower bias
the best policy has better theoretical guarantees

than the existing unstructured estimators under the corresponding click-model assumptions
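The exact estimators are given in the paper; as a rough sketch of the idea only (the function name, the arguments, and the propensity interface below are assumptions, not the paper's definitions), an item–position style estimator reweights each logged click by the probability that the logging policy placed that item at that position, rather than by the probability of the whole displayed list:

# Rough sketch of an item-position style offline estimator (illustrative only,
# not the exact estimator from [LAKMVW, KDD'2018]).
def item_position_estimate(logs, target_list, propensity):
    # logs: list of (displayed_list, clicks) pairs for one query, collected by the
    # logging policy; clicks[pos] is 1 if the item at position pos was clicked.
    # propensity(item, pos): probability that the logging policy shows item at pos.
    total = 0.0
    for shown, clicks in logs:
        for pos, item in enumerate(shown):
            if pos < len(target_list) and target_list[pos] == item:
                total += clicks[pos] / propensity(item, pos)
    return total / len(logs)

Because the reweighting is done at the level of (item, position) pairs rather than whole lists, such an estimator sidesteps the exponentially many lists mentioned on the previous slide.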

Shuai LI (CUHK) Learning to Rank 39 / 53



Offline Evaluation of Ranking Policies with Click Models [LAKMVW, KDD'2018] – Experiments

Experiments – 100 most frequent queries in the Yandex dataset

[Figure: RMSE against the number of logged impressions M (10^0–10^5) for the estimators RCTR, Item, IP, PBM, and List; panels (a) 100 Queries: K = 2, (b) 100 Queries: K = 3, and (c) 100 Queries: K = 10.]

Shuai LI (CUHK) Learning to Rank 40 / 53


Outline

1 Motivation

2 Background

3 Problem Definition – Online

4 Click Models
     Cascade Model (CM)
          ICML'2016
          AAAI'2018
          IJCAI'2019
     Dependent Click Model – A co-authored work
     Position-Based Model
     General Click Models – A co-authored work, ICML'2019

5 Offline Evaluations – KDD’2018

6 Conclusions

Shuai LI (CUHK) Learning to Rank 41 / 53


Conclusions

Context + Cascade model (CM) / Dependent click model (DCM)

Online clustering of bandits + Cascade model (CM)

Improved algorithm on clustering of bandits

Context + General click model

Offline evaluation of ranking policies with click models

Shuai LI (CUHK) Learning to Rank 42 / 53


Publications

First-author papers in the thesis – in the order they appear in the thesis

1 Shuai Li, Baoxiang Wang, Shengyu Zhang, Wei Chen, Contextual Combinatorial Cascading Bandits, ICML, 2016

2 Shuai Li, Shengyu Zhang, Online Clustering of Contextual Cascading Bandits, AAAI, 2018

3 Shuai Li, Wei Chen, S. Li, Kwong-Sak Leung, Improved Algorithm on Online Clustering of Bandits, IJCAI, 2019

4 Shuai Li, Tor Lattimore, Csaba Szepesvari, Online Learning to Rank with Features, ICML, 2019

5 Shuai Li, Yasin Abbasi-Yadkori, Branislav Kveton, S. Muthukrishnan, Vishwa Vinay and Zheng Wen, Offline Evaluation of Ranking Policies with Click Models, KDD, 2018

Shuai LI (CUHK) Learning to Rank 43 / 53


Publications

Mentioned co-authored papers

6 Weiwen Liu, Shuai Li, Shengyu Zhang, Contextual Dependent Click Bandit Algorithm for Web Recommendation, COCOON, 2018

7 Tor Lattimore, Branislav Kveton, Shuai Li, Csaba Szepesvari, TopRank: A Practical Algorithm for Online Stochastic Ranking, NeurIPS, 2018

Other co-authored papers

8 Pengfei Liu, Hongjian Li, Shuai Li, Kwong-Sak Leung, Improving Prediction of Phenotypic Drug Response on Cancer Cell Lines Using Deep Convolutional Network, BMC Bioinformatics, 2019

9 Ran Wang, Shuai Li, Man-Hon Wong, and Kwong-Sak Leung, Drug-Protein-Disease Association Prediction and Drug Repositioning Based on Tensor Decomposition, BIBM, 2018

10 Pengfei Liu, Shuai Li, Weiying Yi, Kwong-Sak Leung, A Hybrid Distributed Framework for SNP Selections, PDPTA, 2016

Shuai LI (CUHK) Learning to Rank 44 / 53


Publications

In submission

11 Shuai Li, Wei Chen, Zheng Wen, Kwong-Sak Leung, Stochastic Online Learning with Probabilistic Feedback Graph

12 Shuai Li, Kwong-Sak Leung, Generalized Clustering Bandits

13 Shuai Li, Tong Yu, Ole Mengshoel, Kwong-Sak Leung, Online Semi-Supervised Learning with Large Margin Separation

14 Xiaojin Zhang, Shuai Li, Shengyu Zhang, Contextual Combinatorial Conservative Bandits

15 Pengfei Liu, Shuai Li, Kwong-Sak Leung, The Recovery of Stochastic Differential Equations with Genetic Programming and Kullback-Leibler Divergence

Shuai LI (CUHK) Learning to Rank 45 / 53


Thank you!

&

Questions?

Shuai LI (CUHK) Learning to Rank 46 / 53


References I

P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235–256, 2002.

S. Katariya, B. Kveton, C. Szepesvari, and Z. Wen. DCM bandits: Learning to rank with multiple clicks. In International Conference on Machine Learning, pages 1215–1224, 2016.

B. Kveton, C. Szepesvari, Z. Wen, and A. Ashkan. Cascading bandits: Learning to rank in the cascade model. In International Conference on Machine Learning, pages 767–776, 2015.

Shuai LI (CUHK) Learning to Rank 47 / 53


References II

P. Lagree, C. Vernade, and O. Cappe. Multiple-play bandits in the position-based model. In Advances in Neural Information Processing Systems, pages 1597–1605, 2016.

T. Lattimore, B. Kveton, Li, Shuai, and C. Szepesvari. TopRank: A practical algorithm for online stochastic ranking. In The Conference on Neural Information Processing Systems, 2018.

W. Liu, Li, Shuai, and S. Zhang. Contextual dependent click bandit algorithm for web recommendation. In International Computing and Combinatorics Conference, pages 39–50. Springer, 2018.

Shuai LI (CUHK) Learning to Rank 48 / 53


References III

Li, Shuai, Y. Abbasi-Yadkori, B. Kveton, S. Muthukrishnan, V. Vinay, and Z. Wen. Offline evaluation of ranking policies with click models. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2018.

Li, Shuai, W. Chen, S. Li, and K.-S. Leung. Improved algorithm on online clustering of bandits. In International Joint Conference on Artificial Intelligence (IJCAI), 2019.

Li, Shuai, T. Lattimore, and C. Szepesvari. Online learning to rank with features. In International Conference on Machine Learning (ICML), 2019.

Shuai LI (CUHK) Learning to Rank 49 / 53


References IV

Li, Shuai, B. Wang, S. Zhang, and W. Chen. Contextual combinatorial cascading bandits. In International Conference on Machine Learning, pages 1245–1253, 2016.

Li, Shuai and S. Zhang. Online clustering of contextual cascading bandits. In The AAAI Conference on Artificial Intelligence, 2018.

M. Zoghi, T. Tunys, M. Ghavamzadeh, B. Kveton, C. Szepesvari, and Z. Wen. Online learning to rank in stochastic click models. In International Conference on Machine Learning, pages 4199–4208, 2017.

Shuai LI (CUHK) Learning to Rank 50 / 53


References V

S. Zong, H. Ni, K. Sung, N. R. Ke, Z. Wen, and B. Kveton. Cascading bandits for large-scale recommendation problems. In Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence, pages 835–844. AUAI Press, 2016.

Shuai LI (CUHK) Learning to Rank 51 / 53


A Key Part of the Proof for CLUB-cascade (Improving C3-UCB)

\begin{align*}
\mathbb{E}_t[R(A_t, y_t)]
&= \mathbb{E}_t\Bigg[\bigg(1-\prod_{k=1}^{K}\big(1-y_t(x^*_{t,k})\big)\bigg)-\bigg(1-\prod_{k=1}^{K}\big(1-y_t(x_{t,k})\big)\bigg)\Bigg]\\
&= \mathbb{E}_t\Bigg[\prod_{k=1}^{K}\big(1-y_t(x_{t,k})\big)-\prod_{k=1}^{K}\big(1-y_t(x^*_{t,k})\big)\Bigg]\\
&= \mathbb{E}_t\Bigg[\sum_{k=1}^{K}\bigg(\prod_{\ell=1}^{k-1}\big(1-y_t(x_{t,\ell})\big)\bigg)\Big[\big(1-y_t(x_{t,k})\big)-\big(1-y_t(x^*_{t,k})\big)\Big]\bigg(\prod_{\ell=k+1}^{K}\big(1-y_t(x^*_{t,\ell})\big)\bigg)\Bigg]\\
&\le \mathbb{E}_t\Bigg[\sum_{k=1}^{K}\bigg(\prod_{\ell=1}^{k-1}\big(1-y_t(x_{t,\ell})\big)\bigg)\big[y_t(x^*_{t,k})-y_t(x_{t,k})\big]\Bigg]\\
&= \mathbb{E}_t\Bigg[\sum_{k=1}^{K_t}\big[y_t(x^*_{t,k})-y_t(x_{t,k})\big]\Bigg]
\end{align*}
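The third equality is the standard telescoping decomposition of a difference of products; it can be sanity-checked numerically (a quick check added here for illustration, not part of the original proof):

# Numerical sanity check of the telescoping identity used in the third equality:
#   prod_k (1 - a_k) - prod_k (1 - b_k)
#     = sum_k ( prod_{l<k} (1 - a_l) ) * [ (1 - a_k) - (1 - b_k) ] * ( prod_{l>k} (1 - b_l) )
import random

def prod(xs):
    out = 1.0
    for x in xs:
        out *= x
    return out

random.seed(0)
K = 5
a = [random.random() for _ in range(K)]   # plays the role of y_t(x_{t,k})
b = [random.random() for _ in range(K)]   # plays the role of y_t(x*_{t,k})

lhs = prod(1 - x for x in a) - prod(1 - x for x in b)
rhs = sum(
    prod(1 - a[l] for l in range(k))
    * ((1 - a[k]) - (1 - b[k]))
    * prod(1 - b[l] for l in range(k + 1, K))
    for k in range(K)
)
print(abs(lhs - rhs) < 1e-12)   # True

The identity is purely algebraic, so it holds for any values of y_t(x_{t,k}) and y_t(x*_{t,k}).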

Shuai LI (CUHK) Learning to Rank 52 / 53


Proof Sketch for RecurRank

Use (ℓ, i) to denote the i-th call of RecurRank with phase ℓ, item set A_{ℓi}, and position set K_{ℓi}

Prove that, with high probability, for every (ℓ, i):
    a*_k ∈ A_{ℓi} whenever k ∈ K_{ℓi}
    |θ_{ℓi}^⊤ x_a − χ_{ℓi} θ_*^⊤ x_a| ≤ ∆_ℓ, where χ_{ℓi} is the examination probability of the optimal list at the first position of K_{ℓi}

If, in the (ℓ, i)-th call, item a is put at position k, then
    χ_{ℓi} (α(a*_k) − α(a)) ≤ 8|K_{ℓi}|∆_ℓ if k is the first position of K_{ℓi}
    χ_{ℓi} (α(a*_k) − α(a)) ≤ 4∆_ℓ if k is any of the remaining positions
    thus O(|K_{ℓi}|∆_ℓ) regret for this part

Shuai LI (CUHK) Learning to Rank 53 / 53


