From hits to niches? ...or how popular artists can bias music recommendation and discovery

Post on 15-Jan-2015

description

This paper presents experiments that analyse the popularity effect in music recommendation. Popularity is measured in terms of total playcounts, and the Long Tail model is used to rank music artists. Furthermore, metrics derived from complex network analysis are used to detect the influence of the most popular artists in the network of similar artists. The results reveal that, as expected given its inherent social component, the collaborative filtering approach is prone to popularity bias. This affects both the discovery ratio and navigation through the Long Tail. In contrast, both the audio content-based and the human expert-based approaches link artists independently of their popularity, which allows one to navigate from a mainstream artist to a Long Tail artist in just two or three clicks.

transcript

2nd netflix workshop // ACM KDD / Las Vegas, US // August, 24th 2008

From Hits to Niches? ...or how popular artists can bias music recommendation and discovery

Òscar Celma, Pedro Cano

(Music Technology Group ~ UPF)

(Barcelona Music and Audio Technologies, BMAT)

ACM KDD // 2nd netflix workshop // 2008 // òscar celma & pedro cano

The problem

music overload

• Today (August, 2007)

iTunes: 6M tracks / 3B sales
P2P: 15B tracks
53% buy music online

• Tomorrow

All music will be online
Billions of tracks
Millions more arriving every week

• Finding new, relevant music is hard!

Can recommender systems help us?

the Long Tail of popularity

• Help me find it!

If you like The Beatles you might like ...

the musical Turing Test

• Which recommendation is from a human, which is from a machine?

List #2

• Chuck Berry

• Harry Nilsson

• XTC

• Marshall Crenshaw

• Super Furry Animals

• Badfinger

• The Raspberries

• The Flaming Lips

• Jason Falkner

• Michael Penn

List #1

• Bob Dylan

• Beach Boys

• Billy Joel

• Rolling Stones

• Animals

• Aerosmith

• The Doors

• Simon & Garfunkel

• Crosby, Stills, Nash & Young

• Paul Simon

Machine: 

Up to 11

Human

• popularity bias

• low novelty

Our proposal

Long Tail popularity + item similarity network

Item-to-item Complex Network Analysis

complex network analysis

• Datasets: Artist recommendation networks

CF*: Social-based, incl. item-based CF (Last.fm): “people who listen to X also listen to Y”

CB: Content-based audio similarity: “X and Y sound similar”

EX: Human expert-based (AllMusicGuide): “X similar to (or influenced by) Y”

complex network analysis

• Small-world networks

The network can be traversed in a few clicks
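One way to make "a few clicks" concrete is the average shortest path length between artists, one of the usual small-world indicators (the clustering coefficient, the other common one, is not covered here). The sketch below is only an illustration: the toy graph and the function names are assumptions, not the datasets or code used in the talk.

```python
from collections import deque


def shortest_path_lengths(graph, source):
    """Breadth-first search: number of clicks from source to each reachable artist."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def average_path_length(graph):
    """Mean distance over all ordered pairs (source != target) that are connected."""
    total, pairs = 0, 0
    for source in graph:
        for target, d in shortest_path_lengths(graph, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else float("nan")


# Hypothetical directed "similar artists" graph: artist -> list of similar artists.
toy = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
print(average_path_length(toy))
```

A small average path length relative to the number of nodes is what makes "a few clicks" navigation possible; on a real artist network the same breadth-first search would typically be run from a sample of sources rather than from every node.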

complex network analysis

• Last.fm is a scale-free network

power law exponent for the cumulative indegree distribution

A few artists (hubs) control the network

complex network analysis

• Indegree – avg. neighbor indegree correlation

Kin(Robert Palmer) = 430 => avg(Kin(sim(Robert Palmer))) = 342

Kin(Ed Alton) = 11 => avg(Kin(sim(Ed Alton))) = 18

complex network analysis

• Indegree – avg. neighbor indegree correlation

• Last.fm presents assortative mixing

Artists with high indegree are connected together, and similarly for low-indegree artists

r = Pearson correlation
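The indegree vs. average neighbor indegree comparison can be sketched as follows. This is a minimal illustration under stated assumptions: the toy graph, the artist names and the function names are hypothetical, and pairing each artist's indegree with the mean indegree of its similar artists is one simple way to obtain a Pearson r, not necessarily the exact estimator behind the slides.

```python
from math import sqrt


def indegrees(graph):
    """Kin: how many artists list each artist as similar."""
    kin = {node: 0 for node in graph}
    for similar in graph.values():
        for s in similar:
            kin[s] = kin.get(s, 0) + 1
    return kin


def avg_neighbor_indegree(graph, kin):
    """avg(Kin(sim(a))) for every artist a that has at least one similar artist."""
    return {a: sum(kin[s] for s in sim) / len(sim) for a, sim in graph.items() if sim}


def pearson(xs, ys):
    """Pearson correlation; assumes both sequences have non-zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical toy network: artist -> list of similar artists.
toy = {
    "hub1": ["hub2"],
    "hub2": ["hub1"],
    "mid": ["hub1", "hub2"],
    "niche1": ["niche2", "hub1"],
    "niche2": ["niche1"],
}
kin = indegrees(toy)
knn = avg_neighbor_indegree(toy, kin)
artists = sorted(knn)
r = pearson([kin[a] for a in artists], [knn[a] for a in artists])
print(kin, knn, r)
```

A clearly positive r on a real network would indicate assortative mixing (hubs linked to hubs); a value near zero would indicate that similarity links ignore indegree.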

complex network analysis

• Summary

|--------------------|---------|-----|-----------|
|                    | Last.fm | CB  | Exp (AMG) |
|--------------------|---------|-----|-----------|
| Small World        | yes     | yes | yes       |
| Scale-free         | yes     | no  | no        |
| Assortative mixing | yes     | no  | no        |
|--------------------|---------|-----|-----------|

complex network analysis

• But, still some remaining questions...

Are the hubs the most popular artists?

Can we navigate through the Long Tail, via artist similarity?

Item popularity analysis

the Long Tail in music

• last.fm dataset ~260K artists

the beatles (50,422,827)
radiohead (40,762,895)
red hot chili peppers (37,564,100)
muse (30,548,064)
death cab for cutie (29,335,085)
pink floyd (28,081,366)
coldplay (27,120,352)
metallica (25,749,442)

the Long Tail model [Kilkki, 2007]

• F(x) = Cumulative distribution up to x

the Long Tail model [Kilkki, 2007]

• Top-8 artists: F(8) ~ 3.5% of total plays

50,422,827   the beatles
40,762,895   radiohead
37,564,100   red hot chili peppers
30,548,064   muse
29,335,085   death cab for cutie
28,081,366   pink floyd
27,120,352   coldplay
25,749,442   metallica
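The cumulative distribution F(x) can be computed empirically as the share of all plays captured by the x most-played artists. The sketch below is an assumption-labelled illustration: the function name is hypothetical, and because the full dataset is not part of the slides, the total playcount is only back-calculated from the stated ~3.5% figure.

```python
def cumulative_share(playcounts, x):
    """Empirical F(x): fraction of total plays held by the x most-played items."""
    ranked = sorted(playcounts, reverse=True)
    total = sum(ranked)
    return sum(ranked[:x]) / total


# Top-8 playcounts as listed on the slide.
TOP8 = [
    50_422_827, 40_762_895, 37_564_100, 30_548_064,
    29_335_085, 28_081_366, 27_120_352, 25_749_442,
]

top8_plays = sum(TOP8)

# Back-of-the-envelope only: if the top 8 hold about 3.5% of all plays,
# the whole dataset would contain roughly this many plays.
implied_total = top8_plays / 0.035

# Small made-up example to show the function itself.
print(cumulative_share([50, 30, 10, 5, 5], 2))
print(top8_plays, round(implied_total))
```

With the complete list of artist playcounts, `cumulative_share(playcounts, 8)` would reproduce F(8) directly instead of relying on the rounded percentage.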

the Long Tail model [Kilkki, 2007]

• Split the curve in three parts

Head (82 artists), Mid (6,573 artists), Tail (~254K artists)
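Assuming the three counts on the slide correspond, in popularity-rank order, to the head, mid and tail parts, an artist's part follows directly from its rank. The sketch below hard-codes the slide's counts as boundaries; the function name is hypothetical, and in the Long Tail model the boundaries are derived from the fitted curve rather than fixed by hand.

```python
HEAD_SIZE = 82     # artists in the head, as given on the slide
MID_SIZE = 6_573   # artists in the mid part, as given on the slide


def long_tail_part(rank):
    """Map a 1-based popularity rank to 'head', 'mid' or 'tail'."""
    if rank < 1:
        raise ValueError("rank must be >= 1")
    if rank <= HEAD_SIZE:
        return "head"
    if rank <= HEAD_SIZE + MID_SIZE:
        return "mid"
    return "tail"


for rank in (1, 82, 83, 6_655, 6_656, 200_000):
    print(rank, long_tail_part(rank))
```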

Combining complex network analysis (item-item similarity) with the Long Tail (item popularity)

artist indegree vs. artist popularity

• Are the network hubs the most popular items?

???

• Last.fm: correlation between Kin and playcounts

• Audio CB similarity: no correlation

• Expert: correlation between Kin and playcounts

artist similarity vs. artist popularity

• “From Hits to Niches” navigation in the Long Tail using artist similarity

how many clicks?

• “From Hits to Niches” Audio CB similarity example (VIDEO)

Bruce Springsteen (14,433,411 plays) → The Rolling Stones (27,720,169 plays) → Mike Shupp (577 plays)

artist similarity vs. artist popularity

• navigation in the Long Tail

Similar artists, given an artist in the HEAD part:

|-----|--------|--------|--------|
|     | Head   | Mid    | Tail   |
|-----|--------|--------|--------|
| CF  | 45.32% | 54.68% | 0%     |
| CB  | 6.46%  | 64.74% | 28.80% |
| EXP | 5.82%  | 60.92% | 33.26% |
|-----|--------|--------|--------|

It can also be seen as a Markovian stochastic process...

artist similarity vs. artist popularity

• navigation in the Long Tail Markov transition matrix

artist similarity vs. artist popularity

• navigation in the Long Tail Last.fm Markov transition matrix

artist similarity vs. artist popularity

• navigation in the Long Tail

From Head to Tail, with P(T|H) > 0.4

Number of clicks needed: CF: 5, CB: 2, EXP: 2
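Treating navigation as a Markov chain over the three parts, the number of clicks can be read off the powers of the transition matrix: it is the smallest n for which the Head-to-Tail entry of P^n exceeds the threshold. The sketch below illustrates that idea only; both matrices are made-up examples (they are not the matrices from the talk), and the function names are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def clicks_head_to_tail(P, threshold=0.4, max_clicks=50):
    """Smallest n with (P^n)[head][tail] > threshold, or None if never reached.

    Rows and columns of P are ordered head, mid, tail.
    """
    power = P
    for n in range(1, max_clicks + 1):
        if power[0][2] > threshold:
            return n
        power = mat_mul(power, P)
    return None


# Hypothetical popularity-biased recommender: the head never links to the tail.
biased = [
    [0.5, 0.5, 0.0],
    [0.1, 0.5, 0.4],
    [0.0, 0.2, 0.8],
]

# Hypothetical recommender whose links largely ignore popularity.
unbiased = [
    [0.10, 0.60, 0.30],
    [0.05, 0.55, 0.40],
    [0.00, 0.30, 0.70],
]

print(clicks_head_to_tail(biased), clicks_head_to_tail(unbiased))
```

Returning None when the threshold is never crossed matters in practice: a sufficiently head-biased chain can have a stationary tail probability below the threshold, in which case no number of clicks suffices.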

artist popularity

• Summary

|-------------------------|---------|-----|-----------|
|                         | Last.fm | CB  | Exp (AMG) |
|-------------------------|---------|-----|-----------|
| Indegree / popularity   | yes     | no  | yes       |
| Similarity / popularity | yes     | no  | no        |
|-------------------------|---------|-----|-----------|

conclusions

|-------------------------|---------|-----|-----------|
|                         | Last.fm | CB  | Exp (AMG) |
|-------------------------|---------|-----|-----------|
| Small World             | yes     | yes | yes       |
| Scale-free              | yes     | no  | no        |
| Assortative mixing      | yes     | no  | no        |
|-------------------------|---------|-----|-----------|
| Indegree / popularity   | yes     | no  | yes       |
| Similarity / popularity | yes     | no  | no        |
|-------------------------|---------|-----|-----------|
| POPULARITY BIAS         | YES     | NO  | FAIRLY    |
|-------------------------|---------|-----|-----------|

future work

• 3D Long Tail

item popularity
recommendation network (item-to-item)
user profile

relevant items

relevant and novel?

...paper @ ACM RecSys'08

2nd netflix workshop // ACM KDD / Las Vegas, US // August, 24th 2008

From Hits to Niches? ...or how popular artists can bias music recommendation and discovery

THANKS!!!

Òscar Celma, Pedro Cano

(Music Technology Group ~ UPF)

(Barcelona Music and Audio Technologies, BMAT)