Online Clustering of Bandits
Claudio Gentile, Shuai Li: DiSTA, University of Insubria, Italy; Giovanni Zappella: Amazon Development Center Germany, Germany
[email protected]; [email protected]; [email protected] (work done when the author was a PhD student at the University of Milan)
Overview
• Novel algorithmic approach to content recommendation based on adaptive clustering of bandit strategies
• Relevant to group recommendation
• Relies on sequential clustering of users that deliberately avoids low-rank regularizations (scaling issues are our major concern)
The CLUB Algorithm
• $n$ users, $m \ll n$ clusters
• Users' profiles $u_i$, $i = 1, \ldots, n$
• Clusters' profiles $u_j$, $j = 1, \ldots, m$
• Nodes $i$ within cluster $j$ share the same profile $u_j$
• One linear bandit per node and one linear bandit per cluster: node $i$ hosts proxy $w_i$, cluster $j$ hosts proxy $z_j$
• $z_j$ is an aggregation of the proxies $w_i$
• Nodes are served sequentially in random order: node $i_t$ gets $x_{t,1}, \ldots, x_{t,c_t}$ and selects one
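The node and cluster proxies listed above can be illustrated with a minimal sketch. It assumes each proxy is a ridge-regression estimate with identity regularization, that items are picked by an upper-confidence rule with a hypothetical width parameter `alpha`, and that a cluster proxy pools the statistics of its member nodes. The names `LinearProxy`, `aggregate`, and `select_item` are illustrative and do not come from the poster.

```python
import numpy as np

class LinearProxy:
    """Ridge-regression estimate of a profile vector (illustrative helper)."""

    def __init__(self, d):
        self.M = np.eye(d)    # regularized correlation matrix (assumed: identity prior)
        self.b = np.zeros(d)  # payoff-weighted sum of observed contexts

    @property
    def w(self):
        # Current estimate: solve M w = b
        return np.linalg.solve(self.M, self.b)

    def update(self, x, payoff):
        # Rank-one update after observing `payoff` for context `x`
        self.M += np.outer(x, x)
        self.b += payoff * x

def aggregate(proxies, d):
    """Cluster proxy built by pooling node statistics (one possible aggregation)."""
    z = LinearProxy(d)
    for p in proxies:
        z.M += p.M - np.eye(d)  # add each node's data term, keep a single regularizer
        z.b += p.b
    return z

def select_item(proxy, contexts, alpha=1.0):
    """Return the index k maximizing estimated payoff plus confidence width."""
    M_inv = np.linalg.inv(proxy.M)
    w = M_inv @ proxy.b
    # Per-item width sqrt(x_k^T M^{-1} x_k)
    widths = np.sqrt(np.einsum("kd,de,ke->k", contexts, M_inv, contexts))
    return int(np.argmax(contexts @ w + alpha * widths))
```

In this sketch, serving node $i_t$ would call `select_item` on the proxy of its current estimated cluster and then `update` the node proxy with the observed payoff.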
[Figure: graph over user nodes with proxies $w_1, \ldots, w_8$ and profiles $u_1$, $u_2$, $u_3$, grouped into two estimated clusters with proxies $z_1$ and $z_2$]
• Start off from the full $n$-node graph (or a sparsified version thereof) and a single estimated cluster
• If $\|w_i - w_j\| > \theta(i,j)$, delete edge $(i, j)$
• Clusters are the current connected components
• When serving user $i$ in estimated cluster $j$, update node proxy $w_i$ and cluster proxy $z_j$
• Recompute clusters after deleting edges
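The edge-deletion and cluster-recomputation steps above can be sketched as follows. This is a plain illustration, not the poster's implementation: `theta` is passed as an arbitrary callable (the usage below uses a constant, which is an assumption), and clusters are recomputed from scratch with a depth-first search.

```python
import numpy as np

def prune_edges(edges, W, theta):
    """Keep edge (i, j) only if ||w_i - w_j|| <= theta(i, j)."""
    return {(i, j) for (i, j) in edges
            if np.linalg.norm(W[i] - W[j]) <= theta(i, j)}

def connected_components(n, edges):
    """Label each of the n nodes with the index of its connected component."""
    adj = {i: set() for i in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    label = [-1] * n
    current = 0
    for start in range(n):
        if label[start] != -1:
            continue
        label[start] = current
        stack = [start]
        while stack:  # iterative depth-first search
            u = stack.pop()
            for v in adj[u]:
                if label[v] == -1:
                    label[v] = current
                    stack.append(v)
        current += 1
    return label

# Usage: four nodes whose proxies fall into two well-separated groups.
W = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
full = {(i, j) for i in range(4) for j in range(i + 1, 4)}
kept = prune_edges(full, W, theta=lambda i, j: 1.0)
clusters = connected_components(4, kept)
```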
Two main issues
• Statistical: regret analysis
• Computational: running time and memory
The CLUB Algorithm: Solutions
1. Start off from a random (Erdős-Rényi) graph
• $G$ is $p$-randomly sparsified with $p \simeq \log(n/\delta)/s$
• All $s$-node subgraphs are connected with probability $> 1 - \delta$
• # of initial edges $\simeq n^2 p = \frac{n^2}{s} \log(n/\delta) \ll n^2$
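As a rough illustration of the sparsified starting graph, the sketch below samples an Erdős-Rényi graph with $p = \log(n/\delta)/s$. The unit constant in front of the logarithm and the function name `sparsified_graph` are assumptions; the poster only gives $p$ up to the $\simeq$ relation.

```python
import math
import random

def sparsified_graph(n, s, delta, seed=0):
    """Sample G(n, p) with p = log(n / delta) / s, capped at 1 (constant assumed)."""
    p = min(1.0, math.log(n / delta) / s)
    rng = random.Random(seed)
    edges = {(i, j)
             for i in range(n)
             for j in range(i + 1, n)
             if rng.random() < p}
    return p, edges

# Usage: the number of sampled edges is a fraction of about p of all pairs.
n, s, delta = 200, 50, 0.05
p, edges = sparsified_graph(n, s, delta)
all_pairs = n * (n - 1) // 2
expected_edges = p * all_pairs
```

The larger the smallest cluster size $s$ one is willing to assume, the sparser the initial graph can be.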
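2a. Current clusters are unions of underlying ones
[Figure: graph over user nodes with proxies $w_1, \ldots, w_8$ and profiles $u_1$, $u_2$, $u_3$, grouped into two estimated clusters with proxies $z_1$ and $z_2$]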
• Within-cluster edges (w.r.t. the underlying clustering) are never deleted (w.h.p.)
• Between-cluster edges (w.r.t. the underlying clustering) are eventually deleted (w.h.p.), assuming a gap between different cluster profile vectors and enough observed payoff values
2b. Data structure for incremental computation of clusters
• Decremental dynamic connectivity: randomized construction maintaining a spanning forest. In our case, $n \gg d$ and $|E| = n\,\mathrm{poly}(\log n)$, giving $d^2 + d\,\mathrm{poly}(\log n)$ (amortized) running time per round
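To make the decremental setting concrete, here is a deliberately naive sketch: after each edge deletion it searches from one endpoint and relabels the reached nodes if the other endpoint is no longer reachable. This is not the randomized spanning-forest construction mentioned above and does not achieve its polylogarithmic amortized cost; it only shows the interface such a structure supports. The names `delete_edge`, `adj`, and `comp` are hypothetical.

```python
def delete_edge(adj, comp, i, j):
    """Remove edge (i, j); relabel one side if the component splits.

    adj  : dict mapping node -> set of neighbours (modified in place)
    comp : dict mapping node -> component label (modified in place)
    Returns True if the deletion split a component.
    """
    adj[i].discard(j)
    adj[j].discard(i)
    seen = {i}
    stack = [i]
    while stack:  # search everything still reachable from i
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    if j in seen:
        return False  # i and j are still connected
    new_label = max(comp.values()) + 1
    for v in seen:
        comp[v] = new_label
    return True

# Usage: a triangle stays connected after one deletion, then splits.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
comp = {0: 0, 1: 0, 2: 0}
first = delete_edge(adj, comp, 0, 1)
second = delete_edge(adj, comp, 0, 2)
```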
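3. Derived bound:
$$\underbrace{\sum_{j=1}^{m} \sum_{\ell=1}^{m} \|u_j - u_\ell\|}_{\text{learning the clusters}} \;+\; \underbrace{\left(\sigma d + \sqrt{d}\right) \left(1 + \sum_{j=1}^{m} \sqrt{\frac{|V_j|}{n}}\right) \sqrt{T}\,\sqrt{m}}_{\text{learning cluster profile vectors}}$$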
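As a worked illustration of how the cluster sizes enter the second term, consider the factor $1 + \sum_{j=1}^{m} \sqrt{|V_j|/n}$, using only that the clusters partition the $n$ users. The two cases below (perfectly balanced clusters and the general upper bound via Cauchy-Schwarz) are an added calculation, not a statement taken from the poster.

```latex
% Balanced clusters: |V_j| = n/m for every j
\sum_{j=1}^{m} \sqrt{\frac{|V_j|}{n}}
  = \sum_{j=1}^{m} \sqrt{\frac{1}{m}}
  = \sqrt{m}

% General case: Cauchy-Schwarz with \sum_j |V_j| = n
\sum_{j=1}^{m} \sqrt{\frac{|V_j|}{n}}
  \le \sqrt{m \sum_{j=1}^{m} \frac{|V_j|}{n}}
  = \sqrt{m}
```

So this factor is at most $1 + \sqrt{m}$, with equality for balanced clusters, and it approaches 2 when a single cluster contains almost all users.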
Experimental Results
1. Synthetic datasets
• $c_t = 10$, $T = 55{,}000$, $d = 25$, and $n = 500$
• Each cluster $V_j$ has a random unit-norm profile vector $u_j \in \mathbb{R}^d$
• Context vectors $x_{t,k} \in \mathbb{R}^d$ generated uniformly at random with unit norm
• Cluster relative size $|V_j| = n\,\frac{j^{-z}}{\sum_{\ell=1}^{m} \ell^{-z}}$, $j = 1, \ldots, m$, with $z \in \{0, 1, 2, 3\}$
• Sequence of served users $i_t$ generated uniformly at random over the $n$ users
• Payoff associated with each cluster $= u_j^\top x_{t,k}$ plus white noise
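The synthetic setup listed above can be sketched in a few lines. Several details are assumptions made only for illustration: the white noise is taken to be Gaussian, fractional cluster sizes are rounded down with the remainder assigned to the largest cluster, and the helper names are hypothetical.

```python
import numpy as np

def cluster_sizes(n, m, z):
    """Sizes proportional to j^(-z), j = 1..m, summing to n."""
    weights = np.arange(1, m + 1, dtype=float) ** (-z)
    sizes = np.floor(n * weights / weights.sum()).astype(int)
    sizes[0] += n - sizes.sum()  # assumption: remainder goes to the largest cluster
    return sizes

def unit_vectors(rng, k, d):
    """k vectors drawn uniformly from the unit sphere in R^d."""
    v = rng.standard_normal((k, d))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def simulate_round(rng, U, assignment, c_t, noise_std):
    """Draw a user, c_t contexts, and noisy linear payoffs for each context."""
    n, d = len(assignment), U.shape[1]
    i_t = int(rng.integers(n))  # served user, uniform over the n users
    X = unit_vectors(rng, c_t, d)
    noise = noise_std * rng.standard_normal(c_t)  # assumption: Gaussian white noise
    payoffs = X @ U[assignment[i_t]] + noise
    return i_t, X, payoffs

# Usage with the poster's sizes: n = 500, d = 25, c_t = 10, m = 10, z = 2.
rng = np.random.default_rng(0)
sizes = cluster_sizes(500, 10, 2)
assignment = np.repeat(np.arange(10), sizes)  # cluster index of each user
U = unit_vectors(rng, 10, 25)                 # one profile vector per cluster
i_t, X, payoffs = simulate_round(rng, U, assignment, c_t=10, noise_std=0.1)
```

Setting $z = 0$ gives balanced clusters, while larger $z$ makes the sizes increasingly unbalanced.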
[Figure: four panels on synthetic data. x-axis: Rounds (up to $5 \times 10^4$); y-axis: Cum. Regr. of Alg. / Cum. Regr. of RAN (0 to 1). Panels: Balanced Clusters, No. of Clusters: 2, Payoff Noise: 0.1; Balanced Clusters, No. of Clusters: 2, Payoff Noise: 0.3; Unbalanced Clusters, No. of Clusters: 10, Payoff Noise: 0.1; Unbalanced Clusters, No. of Clusters: 10, Payoff Noise: 0.3. Algorithms: CLUB, LINUCB-IND, LINUCB-ONE, GOBLIN, CLAIRVOYANT]
2. LastFM & Delicious (“hits” & “niches”) datasets
• $c_t = 25$, $T = 55{,}000$, and $d = 25$
• LastFM contains 1,892 users, 17,632 artists
• Delicious contains 1,861 users, 69,226 URLs
• Payoff is 1 if the user listened to the artist (LastFM) or bookmarked the URL (Delicious)
[Figure: two panels, LastFM Dataset and Delicious Dataset. x-axis: Rounds (up to $5 \times 10^4$); y-axis: Cum. Regr. of Alg. / Cum. Regr. of RAN (0.75 to 1). Algorithms: CLUB, LINUCB-IND, LINUCB-ONE]
3. Yahoo! (“ICML 2012 Challenge”) dataset
• $c_t = 41$ (median), $T = 55{,}000$ ($75{,}000$), and $d = 323$
• 8,362,905 records, 713,862 users, 323 news items
• Each user is described by a 136-dimensional binary feature vector
• Payoff is 1 if the user clicked on the news item
[Figure: two panels, Yahoo Dataset: 5K Users (up to $5 \times 10^4$ rounds) and Yahoo Dataset: 18K Users (up to $7 \times 10^4$ rounds). x-axis: Rounds; y-axis: CTR (0.01 to 0.07). Algorithms: CLUB, UCB-IND, UCB-ONE, UCB-V, RAN]
Conclusions
• Algorithmic ideas and analyses for group recommendation
• Generalizations:
– Overlapping clusters?
– Soft clustering?
– Shifting profiles (can handle this)
• Cold start: connect a newcomer to all existing users through directed edges (experiments are ongoing)
• Get rid of the i.i.d. assumption in the analysis?
• Experiments underway with larger datasets