2nd netflix workshop // ACM KDD / Las Vegas, US // August, 24th 2008
From Hits to Niches?...or how popular artists can bias
music recommendation and discovery
Òscar Celma, Pedro Cano (Music Technology Group ~ UPF)
(Barcelona Music and Audio Technologies, BMAT)
ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano
The problem
music overload
• Today (August, 2007)
iTunes: 6M tracks / 3B sales
P2P: 15B tracks
53% buy music online
• Tomorrow
All music will be online
Billions of tracks
Millions more arriving every week
• Finding new, relevant music is hard!
Can recommender systems help us?
the Long Tail of popularity
• Help me find it!
If you like The Beatles you might like ...
the musical Turing Test
• Which recommendation is from a human, which is from a machine?
List #1
• Bob Dylan
• Beach Boys
• Billy Joel
• Rolling Stones
• Animals
• Aerosmith
• The Doors
• Simon & Garfunkel
• Crosby, Stills, Nash & Young
• Paul Simon
List #2
• Chuck Berry
• Harry Nilsson
• XTC
• Marshall Crenshaw
• Super Furry Animals
• Badfinger
• The Raspberries
• The Flaming Lips
• Jason Falkner
• Michael Penn
Machine: Up to 11
Human
• popularity bias
• low novelty
Our proposal
Long Tail popularity + item similarity network
Item-to-item Complex Network Analysis
complex network analysis
• Datasets: Artist recommendation networks
CF: Social-based, incl. item-based CF (Last.fm): “people who listen to X also listen to Y”
CB: Content-based audio similarity: “X and Y sound similar”
EX: Human expert-based (AllMusicGuide): “X similar to (or influenced by) Y”
• Small-world networks
The network can be traversed in a few clicks
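One way to check the "few clicks" claim is to measure the average shortest-path length of an artist similarity graph. The sketch below does this with a breadth-first search. The `toy` adjacency dict and its artist names are hypothetical illustration data, not the networks analysed in the talk.

```python
from collections import deque


def shortest_path_lengths(similar, source):
    """Breadth-first search: clicks needed to reach each artist from `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        artist = queue.popleft()
        for neighbour in similar.get(artist, []):
            if neighbour not in dist:
                dist[neighbour] = dist[artist] + 1
                queue.append(neighbour)
    return dist


def average_path_length(similar):
    """Mean number of clicks over all reachable ordered pairs of distinct artists."""
    total, pairs = 0, 0
    for source in similar:
        for target, d in shortest_path_lengths(similar, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


# Hypothetical toy network: each artist points to its "similar artists".
toy = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["d", "a"],
    "d": ["a", "b"],
}
avg_clicks = average_path_length(toy)
```

A small-world diagnosis would normally also compare this value, together with the clustering coefficient, against a random graph of the same size.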
• Last.fm is a scale-free network
power law exponent for the cumulative indegree distribution
A few artists (hubs) control the network
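As a rough illustration of what "power law exponent for the cumulative indegree distribution" means, the sketch below builds P(K >= k) from a list of indegrees and fits a straight line in log-log space. The least-squares fit is a simplification chosen for brevity (it is not necessarily the estimator used by the authors), and `toy_indegrees` is made-up data constructed so that P(K >= k) falls off exactly as 1/k.

```python
import math
from collections import Counter


def cumulative_indegree_distribution(indegrees):
    """Return sorted (k, P(K >= k)) pairs for the observed indegree values k > 0."""
    n = len(indegrees)
    counts = Counter(indegrees)
    points, at_least = [], n
    for k in sorted(counts):
        if k > 0:
            points.append((k, at_least / n))
        at_least -= counts[k]
    return points


def fit_cumulative_exponent(indegrees):
    """Least-squares slope of log P(K >= k) against log k, as a positive exponent."""
    points = cumulative_indegree_distribution(indegrees)
    xs = [math.log(k) for k, _ in points]
    ys = [math.log(p) for _, p in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope


# Hypothetical indegrees: one hub, a few mid-degree nodes, many low-degree nodes.
toy_indegrees = [8, 4, 2, 2, 1, 1, 1, 1]
gamma_c = fit_cumulative_exponent(toy_indegrees)
```

Note that an exponent fitted on the cumulative distribution is one less than the exponent of the underlying (non-cumulative) degree distribution.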
• Indegree – avg. neighbor indegree correlation
Kin(Robert Palmer) = 430 => avg(Kin(sim(Robert Palmer))) = 342
Kin(Ed Alton) = 11 => avg(Kin(sim(Ed Alton))) = 18
• Last.fm presents assortative mixing
Artists with high indegree are connected together, similarly for low indegree artists
r = Pearson correlation
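The indegree vs. average neighbour indegree correlation can be sketched as follows: for every artist, pair its indegree Kin with the mean Kin of its similar artists, then take the Pearson correlation r over all pairs. This is one plausible reading of the slides, not a confirmed reproduction of the authors' computation, and the `toy` network is hypothetical.

```python
import math


def indegrees(similar):
    """Count, for every artist, how many other artists list it as similar."""
    kin = {artist: 0 for artist in similar}
    for neighbours in similar.values():
        for n in neighbours:
            kin[n] = kin.get(n, 0) + 1
    return kin


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either side is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def indegree_neighbour_correlation(similar):
    """Pearson r between Kin(artist) and avg(Kin(sim(artist)))."""
    kin = indegrees(similar)
    xs, ys = [], []
    for artist, neighbours in similar.items():
        if neighbours:
            xs.append(kin[artist])
            ys.append(sum(kin[n] for n in neighbours) / len(neighbours))
    return pearson(xs, ys)


# Hypothetical network: a well-connected trio and an isolated pair.
toy = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"],
    "d": ["e"], "e": ["d"],
}
r = indegree_neighbour_correlation(toy)
```

In this toy case high-indegree artists only point to each other, and so do the low-indegree ones, so r comes out at its maximum; a value near zero would indicate no assortative mixing.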
• Summary
|--------------------|---------|-----|-----------|
|                    | Last.fm | CB  | Exp (AMG) |
|--------------------|---------|-----|-----------|
| Small World        | yes     | yes | yes       |
| Scale-free         | yes     | no  | no        |
| Assortative mixing | yes     | no  | no        |
|--------------------|---------|-----|-----------|
• But, still some remaining questions...
Are the hubs the most popular artists?
Can we navigate through the Long Tail, via artist similarity?
Long Tail popularity + item similarity network
Item popularity analysis
the Long Tail in music
• last.fm dataset ~260K artists
the beatles (50,422,827)
radiohead (40,762,895)
red hot chili peppers (37,564,100)
muse (30,548,064)
death cab for cutie (29,335,085)
pink floyd (28,081,366)
coldplay (27,120,352)
metallica (25,749,442)
the Long Tail model [Kilkki, 2007]
• F(x) = Cumulative distribution up to x
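For reference, the model in [Kilkki, 2007] is commonly written as F(x) = β / ((N50 / x)^α + 1), where F(x) is the share of total volume covered by the x most popular items, N50 is the rank at which half of β is reached, α shapes the curve and β is the asymptotic total share. The sketch below simply evaluates that formula; the parameter values are hypothetical and are not the values fitted to the last.fm data.

```python
def long_tail_share(x, alpha, beta, n50):
    """Kilkki-style cumulative share F(x) = beta / ((n50 / x) ** alpha + 1)."""
    if x <= 0:
        return 0.0
    return beta / ((n50 / x) ** alpha + 1.0)


# Hypothetical parameters, chosen only to show the shape of the curve.
ALPHA, BETA, N50 = 0.5, 100.0, 100

share_at_n50 = long_tail_share(N50, ALPHA, BETA, N50)
shares = [long_tail_share(x, ALPHA, BETA, N50) for x in (10, 100, 1000)]
```

By construction F(N50) equals β / 2, which is a convenient sanity check when fitting the model to real play counts.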
• Top-8 artists: F(8)~ 3.5% of total plays
50,422,827 the beatles
40,762,895 radiohead
37,564,100 red hot chili peppers
30,548,064 muse
29,335,085 death cab for cutie
28,081,366 pink floyd
27,120,352 coldplay
25,749,442 metallica
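The arithmetic behind the top-8 figure can be checked directly from the play counts listed on the slide. The sketch below sums them and, taking the quoted F(8) of roughly 3.5% at face value, derives the order of magnitude of total plays in the dataset. Since 3.5% is a rounded figure, the implied total is only a back-of-the-envelope estimate, not a number stated in the talk.

```python
# Play counts of the eight most played artists, as listed on the slide.
top8_plays = {
    "the beatles": 50_422_827,
    "radiohead": 40_762_895,
    "red hot chili peppers": 37_564_100,
    "muse": 30_548_064,
    "death cab for cutie": 29_335_085,
    "pink floyd": 28_081_366,
    "coldplay": 27_120_352,
    "metallica": 25_749_442,
}

top8_total = sum(top8_plays.values())

# F(8) ~ 3.5% as quoted on the slide (rounded, so the result is approximate).
approx_share = 0.035
implied_total_plays = top8_total / approx_share
```

The top-8 artists account for about 270 million plays, which puts the whole dataset somewhere in the region of 7.7 billion plays under this assumption.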
• Split the curve in three parts
Head (82 artists), Mid (6,573 artists), Tail (~254K artists)
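Given the three part sizes on the slide, an artist can be assigned to a part of the curve from its popularity rank alone. The sketch below assumes that the parts are contiguous rank ranges, with the 82 most played artists forming the head and the next 6,573 forming the mid part; the function name `curve_part` is made up for this illustration.

```python
HEAD_SIZE = 82      # artists in the head, from the slide
MID_SIZE = 6_573    # artists in the mid part, from the slide


def curve_part(rank):
    """Map a 1-based popularity rank to 'head', 'mid' or 'tail'."""
    if rank < 1:
        raise ValueError("rank is 1-based")
    if rank <= HEAD_SIZE:
        return "head"
    if rank <= HEAD_SIZE + MID_SIZE:
        return "mid"
    return "tail"


parts = [curve_part(r) for r in (1, 82, 83, 6_655, 6_656, 260_000)]
```

Everything beyond rank 6,655 falls in the tail, which matches the ~254K artists quoted for a dataset of ~260K.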
Combining complex network analysis (item-item similarity) with the Long Tail (item popularity)
artist indegree vs. artist popularity
• Are the network hubs the most popular items?
???
Last.fm: correlation between Kin and playcounts
Audio CB similarity: no correlation
Expert: correlation between Kin and playcounts
artist similarity vs. artist popularity
• “From Hits to Niches”: navigation in the Long Tail using artist similarity
How many clicks?
• “From Hits to Niches”: Audio CB similarity example (VIDEO)
Bruce Springsteen (14,433,411 plays)
The Rolling Stones (27,720,169 plays)
Mike Shupp (577 plays)
• Navigation in the Long Tail
Similar artists, given an artist in the HEAD part:
[Figure: percentage of similar artists in the Head, Mid and Tail, one panel each for CF, CB and EXP. Bar labels: 45.32%, 54.68%, 0%; 64.74%, 28.80%, 6.46%; 60.92%, 33.26%, 5.82%]
Also, it can be seen as a Markovian stochastic process...
• Navigation in the Long Tail: Markov transition matrix
Last.fm Markov transition matrix
• Navigation in the Long Tail: from Head to Tail, with P(T|H) > 0.4
Number of clicks needed:
CF: 5
CB: 2
EXP: 2
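The "number of clicks" question can be sketched as repeated application of a Head/Mid/Tail transition matrix: start in the Head with probability 1, take one step per click, and stop at the first click where the probability of being in the Tail exceeds the threshold. Reading P(T|H) as this n-step probability is an assumption about the slides, and `toy_transition` holds hypothetical values, not the matrices shown in the talk.

```python
def step(distribution, transition):
    """One click: multiply a row vector of probabilities by the transition matrix."""
    size = len(transition)
    return [sum(distribution[i] * transition[i][j] for i in range(size))
            for j in range(size)]


def clicks_until(transition, start, target, threshold, max_clicks=100):
    """Smallest n whose n-step probability of being in `target` exceeds `threshold`."""
    distribution = [0.0] * len(transition)
    distribution[start] = 1.0
    for clicks in range(1, max_clicks + 1):
        distribution = step(distribution, transition)
        if distribution[target] > threshold:
            return clicks
    return None  # threshold not reached within max_clicks


HEAD, MID, TAIL = 0, 1, 2

# Hypothetical transition matrix (rows: from Head, Mid, Tail; each row sums to 1).
# These are NOT the values from the talk.
toy_transition = [
    [0.5, 0.5, 0.0],
    [0.1, 0.6, 0.3],
    [0.0, 0.3, 0.7],
]
clicks = clicks_until(toy_transition, HEAD, TAIL, 0.4)
```

With no direct Head-to-Tail links, as in this toy matrix, several clicks through the Mid part are needed before the Tail becomes likely, which is the kind of behaviour the CF figure on the slide points to.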
artist popularity
• Summary
|-------------------------|---------|-----|-----------|
|                         | Last.fm | CB  | Exp (AMG) |
|-------------------------|---------|-----|-----------|
| Indegree / popularity   | yes     | no  | yes       |
| Similarity / popularity | yes     | no  | no        |
|-------------------------|---------|-----|-----------|
conclusions
|-------------------------|---------|-----|-----------|
|                         | Last.fm | CB  | Exp (AMG) |
|-------------------------|---------|-----|-----------|
| Small World             | yes     | yes | yes       |
| Scale-free              | yes     | no  | no        |
| Assortative mixing      | yes     | no  | no        |
|-------------------------|---------|-----|-----------|
| Indegree / popularity   | yes     | no  | yes       |
| Similarity / popularity | yes     | no  | no        |
|-------------------------|---------|-----|-----------|
| POPULARITY BIAS         | YES     | NO  | FAIRLY    |
|-------------------------|---------|-----|-----------|
future work
• 3D Long Tail
item popularity
recommendation network (item-to-item)
user profile
relevant items
relevant and novel?
...paper @ ACM RecSys'08
THANKS!!!