ICML 2014 poster: CLUB, Online Clustering of Bandits (31st ICML, JMLR)

Online Clustering of Bandits

Claudio Gentile, Shuai Li: DiSTA, University of Insubria, Italy; Giovanni Zappella: Amazon Development Center Germany, Germany

(Work done when the third author was a PhD student at the University of Milan.)

Overview

• Novel algorithmic approach to content recommendation based on adaptive clustering of bandit strategies

• Relevant to group recommendation

• Relies on sequential clustering of users that deliberately avoids low-rank regularizations (scaling issues are our major concern)

The CLUB Algorithm

• n users, m << n clusters

• Users' profiles u_i, i = 1, …, n

• Clusters' profiles u_j, j = 1, …, m

• Nodes i within cluster j share the same profile u_j

• One linear bandit per node and one linear bandit per cluster: node i hosts proxy w_i, cluster j hosts proxy z_j

• z_j is an aggregation of the proxies w_i of the nodes in cluster j

• Nodes are served sequentially in random order: at round t, node i_t receives context vectors x_{t,1}, …, x_{t,c_t} and selects one
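The per-node and per-cluster bandits can be sketched in pure Python as follows. This is a minimal sketch under stated assumptions, not the authors' code: each node keeps LinUCB-style ridge-regression statistics M_i = I + Σ x xᵀ and b_i = Σ r x, so that w_i = M_i⁻¹ b_i; the cluster proxy z_j is assumed to come from summed node statistics; and the UCB selection rule with its width parameter `alpha` is an assumption, since the poster only says that node i_t "selects one" context. All function and class names are hypothetical.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def identity(d):
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]


class NodeBandit:
    """Ridge-regression state of one user: M_i = I + sum x x^T, b_i = sum r x."""

    def __init__(self, d):
        self.d = d
        self.M = identity(d)
        self.b = [0.0] * d
        self.T = 0  # how many times this user has been served

    def update(self, x, r):
        for i in range(self.d):
            self.b[i] += r * x[i]
            for j in range(self.d):
                self.M[i][j] += x[i] * x[j]
        self.T += 1

    def proxy(self):
        return solve(self.M, self.b)  # w_i = M_i^{-1} b_i


def cluster_proxy(nodes):
    """Assumed aggregation: M = I + sum_i (M_i - I), b = sum_i b_i, z = M^{-1} b."""
    d = nodes[0].d
    M, b = identity(d), [0.0] * d
    for node in nodes:
        for i in range(d):
            b[i] += node.b[i]
            for j in range(d):
                M[i][j] += node.M[i][j] - (1.0 if i == j else 0.0)
    return M, solve(M, b)


def ucb_select(M, z, contexts, alpha):
    """Return argmax_k of z.x_k + alpha * sqrt(x_k^T M^{-1} x_k)."""
    best, best_score = 0, -math.inf
    for k, x in enumerate(contexts):
        width = math.sqrt(max(0.0, sum(a * c for a, c in zip(x, solve(M, x)))))
        score = sum(a * c for a, c in zip(z, x)) + alpha * width
        if score > best_score:
            best, best_score = k, score
    return best
```

Serving user i in estimated cluster j then amounts to `M, z = cluster_proxy(members)`, `k = ucb_select(M, z, contexts, alpha)`, observing the payoff r, and calling `nodes[i].update(contexts[k], r)`. Pooling statistics lets a user with little history borrow the data of the other users in its cluster.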

[Figure: user graph illustrating CLUB; nodes carry proxies w1, …, w8, underlying cluster profiles are u1, u2, u3, and estimated clusters carry proxies z1, z2]

• Start off from the full n-node graph (or a sparsified version thereof) and a single estimated cluster

• If ||w_i − w_j|| > θ(i, j), delete edge (i, j)

• Clusters are the current connected components

• When serving user i in estimated cluster j, update node proxy w_i and cluster proxy z_j

• Recompute clusters after deleting edges
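The edge-deletion and clustering steps can be sketched as below, assuming the graph is stored as a dict of adjacency sets and node proxies w_i as lists in a dict `w`. The poster does not spell out θ(i, j); here it is an assumed confidence-width form α2 (CB(T_i) + CB(T_j)) with CB(T) = sqrt((1 + ln(1 + T)) / (1 + T)), where T_i counts how often user i has been served. The names `cb`, `prune_edges`, `connected_components`, and `alpha2` are hypothetical.

```python
import math
from collections import deque


def cb(T):
    """Assumed confidence width for a user served T times."""
    return math.sqrt((1 + math.log(1 + T)) / (1 + T))


def prune_edges(adj, w, counts, alpha2=1.0):
    """Delete every edge (i, j) with ||w_i - w_j|| > theta(i, j).

    theta(i, j) = alpha2 * (cb(T_i) + cb(T_j)) is an assumed form.
    Returns the list of deleted edges as (i, j) with i < j.
    """
    deleted = []
    for i in list(adj):
        for j in list(adj[i]):
            if i < j and math.dist(w[i], w[j]) > alpha2 * (cb(counts[i]) + cb(counts[j])):
                adj[i].discard(j)
                adj[j].discard(i)
                deleted.append((i, j))
    return deleted


def connected_components(adj):
    """Map each node to the id of its connected component (BFS)."""
    label, comp = {}, 0
    for s in adj:
        if s in label:
            continue
        label[s] = comp
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in label:
                    label[v] = comp
                    queue.append(v)
        comp += 1
    return label
```

The sketch scans all edges for simplicity. With this threshold form, only w_i and T_i change when user i is served, so in practice only the edges incident to i need to be re-checked each round.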

Two main issues

• Statistical: regret analysis

• Computational: running time and memory

The CLUB Algorithm: Solutions

1. Start off from a random (Erdős–Rényi) graph

• G is p-randomly sparsified with p ≃ log(n/δ)/s

• All s-node subgraphs are connected w.p. > 1 − δ

• # of initial edges ≃ n² p = (n²/s) log(n/δ) << n²
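The sparsified initial graph can be sketched as an Erdős–Rényi G(n, p). This assumes p = log(n/δ)/s exactly (the ≃ above hides constants) and reads s as the size of the subgraphs that must stay connected; `sparsified_graph` and `is_connected` are hypothetical names.

```python
import math
import random


def sparsified_graph(n, s, delta, seed=0):
    """Erdos-Renyi G(n, p) with p = min(1, log(n / delta) / s); constants ignored."""
    rng = random.Random(seed)
    p = min(1.0, math.log(n / delta) / s)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj, p


def is_connected(adj, nodes):
    """True if the subgraph induced by `nodes` is connected (iterative DFS)."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u] & nodes:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes
```

For example, with n = 500, s = 100 and δ = 0.1, p = log(5000)/100 ≈ 0.085, so about 10,600 of the 124,750 possible undirected edges are kept in expectation.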

2a. Current clusters are unions of underlying ones

[Figure: current clusters (proxies z1, z2) as unions of underlying clusters (profiles u1, u2, u3); node proxies w1, …, w8]

• Within-cluster edges (w.r.t. the underlying clustering) are never deleted (w.h.p.)

• Between-cluster edges (w.r.t. the underlying clustering) are eventually deleted (w.h.p.), assuming a gap between different cluster profile vectors and enough observed payoff values

2b. Data structure for incremental computation of clusters

• Decremental dynamic connectivity: randomized construction maintaining a spanning forest. In our case (n >> d, |E| = n poly(log n)), this gives d² + d poly(log n) amortized running time per round
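The randomized decremental-connectivity structure is too involved for a short example. As a plainly simpler substitute (not the structure used on the poster), a single edge deletion can be handled by a BFS that only explores the affected component, since no other cluster can change. The function name and the `new_id` parameter are hypothetical.

```python
from collections import deque


def delete_edge_and_relabel(adj, label, i, j, new_id):
    """Delete edge (i, j); if that splits its component, give j's side label new_id.

    Only the component that contained i and j can split, so every other
    entry of `label` is left untouched. Returns True if a split happened.
    """
    adj[i].discard(j)
    adj[j].discard(i)
    seen, queue = {j}, deque([j])
    while queue:
        u = queue.popleft()
        if u == i:
            return False  # i still reachable from j: clusters unchanged
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    for u in seen:  # j's side is now a separate cluster
        label[u] = new_id
    return True
```

Each deletion costs time proportional to the size of the affected component in the worst case, which is why an amortized poly-logarithmic structure matters when n is large.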

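3. Derived bound:

$$\underbrace{\sum_{j=1}^{m}\sum_{\ell=1}^{m}\|u_j-u_\ell\|}_{\text{learning the clusters}} \;+\; \underbrace{(\sigma d+\sqrt{d})\Big(1+\sum_{j=1}^{m}\sqrt{\tfrac{|V_j|}{n}}\Big)\sqrt{T}\,\sqrt{m}}_{\text{learning cluster profile vectors}}$$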
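The cluster-size factor in the second term is controlled by the number of clusters. Assuming the clusters V_1, …, V_m partition the n users (so that Σ_j |V_j| = n), the Cauchy–Schwarz inequality gives:

```latex
\sum_{j=1}^{m}\sqrt{\frac{|V_j|}{n}}
  \;\le\; \sqrt{m}\,\sqrt{\sum_{j=1}^{m}\frac{|V_j|}{n}}
  \;=\; \sqrt{m},
```

with equality exactly when all clusters are balanced (|V_j| = n/m). For unbalanced clusters the factor is smaller; with a single dominant cluster it approaches 1.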

Experimental Results

1. Synthetic datasets

• c_t = 10, T = 55,000, d = 25, and n = 500

• Each cluster V_j has a random unit-norm profile vector u_j ∈ R^d

• Context vectors x_{t,k} ∈ R^d are generated uniformly with unit norm

• Cluster sizes |V_j| = n · j^(−z) / Σ_{ℓ=1}^{m} ℓ^(−z), j = 1, …, m, with z ∈ {0, 1, 2, 3} (z = 0 gives balanced clusters)

• The sequence of served users i_t is generated uniformly at random over the n users

• Payoff within each cluster = u_j^T x_{t,k} plus white noise

[Figure: ratio of the cumulative regret of each algorithm to that of RAN (y-axis) vs. rounds (×10^4, x-axis) for CLUB, LINUCB-IND, LINUCB-ONE, GOBLIN and CLAIRVOYANT; panels: Balanced Clusters, 2 clusters, payoff noise 0.1; Balanced Clusters, 2 clusters, payoff noise 0.3; Unbalanced Clusters, 10 clusters, payoff noise 0.1; Unbalanced Clusters, 10 clusters, payoff noise 0.3]
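The synthetic data generation from the bullets above can be sketched as follows. Assumptions: unit-norm vectors are drawn uniformly on the sphere by normalizing a Gaussian sample, the white noise is Gaussian with standard deviation `noise` (read here as the "payoff noise" value in the panel titles), and rounding leftovers in the cluster sizes go to the first, largest cluster. All names are hypothetical.

```python
import math
import random


def cluster_sizes(n, m, z):
    """|V_j| = n * j^(-z) / sum_l l^(-z), rounded down; leftovers go to cluster 1."""
    weights = [j ** (-z) for j in range(1, m + 1)]
    total = sum(weights)
    sizes = [int(n * w / total) for w in weights]
    sizes[0] += n - sum(sizes)  # assumed rounding fix so sizes sum to n
    return sizes


def random_unit_vector(d, rng):
    """Uniform on the unit sphere: normalize a standard Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def make_synthetic(n=500, m=10, z=1, d=25, c=10, T=55000, noise=0.1, seed=0):
    rng = random.Random(seed)
    sizes = cluster_sizes(n, m, z)
    profiles = [random_unit_vector(d, rng) for _ in range(m)]
    user_cluster = [j for j, s in enumerate(sizes) for _ in range(s)]

    def rounds():
        for _ in range(T):
            i = rng.randrange(n)  # served user drawn uniformly at random
            contexts = [random_unit_vector(d, rng) for _ in range(c)]
            u = profiles[user_cluster[i]]
            payoffs = [sum(a * b for a, b in zip(u, x)) + rng.gauss(0.0, noise)
                       for x in contexts]
            yield i, contexts, payoffs

    return sizes, profiles, user_cluster, rounds()
```

For large z, rounding can leave the smallest clusters empty (e.g. z = 3, m = 10); a real experiment would need to decide how to handle that.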


2. LastFM & Delicious (“hits” & “niches”) datasets

• c_t = 25, T = 55,000, and d = 25

• LastFM contains 1,892 users, 17,632 artists

• Delicious contains 1,861 users, 69,226 URLs

• Payoff is 1 if the user listened to (LastFM) or bookmarked (Delicious) the item

[Figure: ratio of the cumulative regret of each algorithm to that of RAN vs. rounds (×10^4) on the LastFM and Delicious datasets; algorithms: CLUB, LINUCB-IND, LINUCB-ONE]

3. Yahoo! (“ICML 2012 Challenge”) dataset

• c_t = 41 (median), T = 55,000 (75,000), and d = 323

• 8,362,905 records, 713,862 users, 323 news items

• Each user is described by a 136-dimensional binary feature vector

• Payoff is 1 if the user clicked the news item

[Figure: CTR vs. rounds (×10^4) on the Yahoo dataset, panels for 5K users and 18K users; algorithms: CLUB, UCB-IND, UCB-ONE, UCB-V, RAN]

Conclusions

• Algorithmic ideas and analyses for group recommendation

• Generalizations:

– Overlapping clusters?

– Soft clustering?

– Shifting profiles (can handle this)

• Cold start: connect each newcomer to all existing users through directed edges (experiments are ongoing)

• Remove the i.i.d. assumption from the analysis?

• Experiments underway with larger datasets
