IEEE-THEMES: Analysis and Exploitation of Musician Social Networks for Recommendation and Discovery

Post on 05-Dec-2014


description

Slides from IEEE-THEMES 2010, co-located with ICASSP. Paper to appear in the August issue of Select Topics in Signal Processing. Abstract: This paper presents an extensive analysis of a sample of a social network of musicians. The network sample is first analyzed using standard complex network techniques to verify that it has similar properties to other web-derived complex networks. Content-based pairwise dissimilarity values between the musical data associated with the network sample are computed, and the relationship between those content-based distances and distances from network theory is explored. Following this exploration, hybrid graphs and distance measures are constructed and used to examine the community structure of the artist network. Finally, results of these investigations are presented and considered in the light of recommendation and discovery applications with these hybrid measures as their basis.

transcript

Analysis and Exploitation of Musician Social Networks for Recommendation and Discovery

Kurt Jacobson

Ben Fields b.fields@gold.ac.uk

Christophe Rhodes

Mark Sandler

Michael Casey

Fields et al. - Analysis and Exploitation of Musician Social Networks 2

overview

– motivation

– dataset

– experiments

– social radio


motivation


motivation Novelty Curves


motivation The Web


So much music, so little time.


So much music, so little of it good.


How do we discover good music?


listening

social

dataset Sampling Myspace

[Figure: sampling diagram. Node labels: "Randomly Selected Artist"; "Selected Artist's top friend" (five nodes); "Artist's top friend" (eighteen nodes).]

dataset Sampling Myspace

– scale-free (mostly)

– 15,478 nodes (artist pages)

– 120,487 directed edges

– 91,326 undirected edges

– avg. degree

  – 15.5 as a directed graph

  – 11.8 when undirected
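The average-degree figures above can be checked from the node and edge counts. The sketch below assumes (the slide does not say so) that the directed figure counts in-degree plus out-degree, so every edge contributes two endpoints in both cases. Under that assumption the values come out at about 15.57 and 11.80, close to the rounded numbers on the slide.

```python
# Counts copied from the slide; the function name is illustrative only.
NODES = 15_478
DIRECTED_EDGES = 120_487
UNDIRECTED_EDGES = 91_326


def average_degree(num_nodes: int, num_edges: int) -> float:
    """Mean number of edge endpoints per node (each edge has two endpoints)."""
    return 2 * num_edges / num_nodes


# Assumption: directed degree = in-degree + out-degree.
avg_directed = average_degree(NODES, DIRECTED_EDGES)
avg_undirected = average_degree(NODES, UNDIRECTED_EDGES)

print(f"directed: {avg_directed:.2f}, undirected: {avg_undirected:.2f}")
```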


dataset Cumulative Degree Distribution
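The plot itself is not recoverable from the transcript, but the quantity it shows can be sketched. The code below assumes the complementary form, the fraction of nodes with degree at least k, which is a common choice when checking for scale-free behaviour. The degree list is made-up toy data, not the Myspace sample.

```python
from collections import Counter


def cumulative_degree_distribution(degrees):
    """Return {k: fraction of nodes with degree >= k} for each observed degree k."""
    n = len(degrees)
    counts = Counter(degrees)
    distribution = {}
    remaining = n  # number of nodes with degree >= current k
    for k in sorted(counts):
        distribution[k] = remaining / n
        remaining -= counts[k]
    return distribution


# Hypothetical degrees for eight nodes.
toy_degrees = [1, 1, 1, 2, 2, 3, 5, 8]
ccdf = cumulative_degree_distribution(toy_degrees)

for k, fraction in ccdf.items():
    print(k, fraction)
```

On a log-log plot, a roughly straight line for these values would be the usual visual hint of a power-law tail.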


experiments


experiments Geodesic v. Acoustic Distance

– pair nodes by geodesic distance

– looking for correlation with pairwise EMD

– result is inconclusive
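A minimal sketch of the pairing step described above, assuming an unweighted graph held as an adjacency dict. Geodesic distance is computed as a hop count by breadth-first search, and the acoustic distances of all artist pairs are grouped by that hop count. The `acoustic` values are made-up stand-ins for pairwise EMD, and all function names are hypothetical.

```python
from collections import defaultdict, deque


def geodesic_distances(adj, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def group_by_geodesic(adj, acoustic):
    """Map hop count -> acoustic distances of the pairs at that hop count."""
    groups = defaultdict(list)
    for u in adj:
        for v, hops in geodesic_distances(adj, u).items():
            # Visit each unordered pair once; skip pairs without audio data.
            if u < v and (u, v) in acoustic:
                groups[hops].append(acoustic[(u, v)])
    return dict(groups)


# Toy undirected path graph a - b - c - d with hypothetical distances.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
acoustic = {("a", "b"): 0.2, ("b", "c"): 0.4, ("a", "c"): 0.3, ("a", "d"): 0.9}

groups = group_by_geodesic(adj, acoustic)
print(groups)
```

Comparing the spread of values in each group is one way to look for the correlation the slide mentions.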


experiments Max Flow v. Acoustic Distance

– pairs of artist nodes grouped based on Maximum Flow

– a randomized network was created as well to compare the relationship

– results point toward a mostly orthogonal relationship

– examining the mutual information shows that most information is not shared between the two spaces
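The slides do not say which maximum flow algorithm or edge capacities were used. As an illustration only, the sketch below computes the maximum flow between two artist nodes with the Edmonds-Karp method (shortest augmenting paths found by breadth-first search) on a toy graph with unit capacities. The graph and the function name are hypothetical.

```python
from collections import defaultdict, deque


def max_flow(edges, source, sink):
    """Edmonds-Karp maximum flow. edges is an iterable of (u, v, capacity)."""
    if source == sink:
        raise ValueError("source and sink must differ")
    residual = defaultdict(lambda: defaultdict(float))
    for u, v, cap in edges:
        residual[u][v] += cap
        residual[v][u] += 0.0  # make sure the reverse arc exists
    flow = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow  # no augmenting path left
        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the bottleneck amount along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


# Toy directed friendship links with unit capacity (assumption).
toy_edges = [("a", "b", 1), ("a", "c", 1), ("b", "d", 1), ("c", "d", 1), ("b", "c", 1)]
flow_a_d = max_flow(toy_edges, "a", "d")
print(flow_a_d)
```

Artist pairs could then be grouped by their flow value, in the same way the geodesic experiment groups pairs by hop count.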


experiments Max Flow v. EMD


experiments Max Flow v. Marsyas distance


experiments Low Entropy Communities

– looking at whether communities are more homogeneous if edges are weighted with sonic similarity

– uses genre entropy

Figure 1. Box and whisker plot showing the spread of community genre entropies for each graph partition method, where gm is greedy modularity, gm+a is greedy modularity with audio weights, wt is walktrap, and wt+a is walktrap with audio weights. The horizontal line represents the genre entropy of the entire sample. The circles represent the average value of genre entropy for a random partition of the network into an equivalent number of communities.

If an artist specified no genre tags, this node is ignored and makes no contribution to the genre entropy score. In our data set, 2.6% of artists specified no genre tags.
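The excerpt does not give the genre entropy formula. A minimal sketch, assuming it is the Shannon entropy of the pooled genre tags of a community and assuming base-2 logarithms (the base is a guess), is shown here. Artists with no tags contribute nothing, as described above. The tag lists are made up.

```python
import math
from collections import Counter


def genre_entropy(tag_lists):
    """Shannon entropy (bits) of the pooled genre tags of one community.

    tag_lists holds one list of tags per artist; empty lists are ignored
    automatically because they add no tags to the pool.
    """
    counts = Counter(tag for tags in tag_lists for tag in tags)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mean_genre_entropy(communities):
    """Average genre entropy across a list of communities."""
    return sum(genre_entropy(c) for c in communities) / len(communities)


# Hypothetical communities: each inner list is one artist's genre tags.
mixed = [["Hip-Hop", "Rap"], ["Hip-Hop"], [], ["R&B"]]
uniform = [["Punk"], ["Punk"], ["Punk"]]

print(genre_entropy(mixed), genre_entropy(uniform))
print(mean_genre_entropy([mixed, uniform]))
```

A community whose artists all share one tag scores zero, so lower values indicate more genre-homogeneous communities.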

4 RESULTS

The results of the various community detection algorithms are summarized in Figure 1 and Table 1. When the genre entropies are averaged across all the detected communities, we see that for every community detection method the average genre entropy is lower than S_G as well as lower than the average genre entropy for a random partition of the graph into an equal number of communities. This is strong evidence that the community structure of the network is related to musical genre.

It should be noted that even a very simple examination of the genre distributions for the entire network sample suggests a network structure that is closely related to musical genre. Of all the genre associations collected for our data set, 50.3% of the tags were either “Hip-Hop” or “Rap” while 11.4% of tags were “R&B”. Smaller informal network samples, independent of our main data set, were also dominated by a handful of similar genre tags (e.g. “Alternative”, “Indie”, “Punk”). In context, this suggests our sample was essentially “stuck” in a community of Myspace artists associated with these particular genre inclinations. However, it is possible that these genre distributions are indicative of the entire Myspace artist network. Regardless, given that the genre entropy of our entire set is so low to begin with, it is an encouraging result that we could efficiently identify communities of artists with even lower genre entropies.

algorithm   c     ⟨S_C⟩   ⟨S_rand⟩   Q
none        1     1.16    -          -
gm          42    0.81    1.13       0.61
gm+a        33    0.90    1.13       0.64
wt          195   0.80    1.08       0.61
wt+a        271   0.70    1.06       0.62

Table 1. Results of the community detection algorithms, where c is the number of communities detected, ⟨S_C⟩ is the average genre entropy for all communities, ⟨S_rand⟩ is the average genre entropy for a random partition of the network into an equal number of communities, and Q is the modularity for the given partition.

From Figure 1 we see that, without audio-based weighting, the greedy modularity algorithm (gm) and the walktrap algorithm (wt) result in nearly the same genre entropies. However, the walktrap algorithm results in almost five times as many communities, which we would expect, because of smaller community size, to result in a lower genre entropy. It should also be noted that the optimized greedy modularity algorithm is considerably faster than the walktrap algorithm: O(m log n) versus O(n^2 log n).

With audio-based weighting, we see mixed results. Audio-based weighting seems to improve the results of the walktrap algorithm (wt+a), decreasing genre entropy and increasing modularity slightly. However, applying audio weights to the greedy modularity algorithm (gm+a) actually increased the genre entropy scores and resulted in the identification of fewer communities. It should be noted that our approach to audio-based similarity was fairly primitive and alternative approaches may yield better results.
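The Q column in Table 1 is the modularity of each partition. As a sketch, assuming the standard Newman definition for an undirected, optionally weighted graph (the excerpt does not spell out the exact variant used), Q can be computed as below. The edge weights stand in for audio-based weights and, like the toy graph, are made up.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q = sum_c [ w_in(c) / m - (deg(c) / 2m)^2 ].

    edges: iterable of (u, v, weight) for an undirected graph without
    self-loops; community_of: dict mapping node -> community label.
    """
    total = 0.0                    # m: total edge weight
    internal = defaultdict(float)  # weight of edges inside each community
    degree = defaultdict(float)    # summed weighted degree of each community
    for u, v, w in edges:
        total += w
        cu, cv = community_of[u], community_of[v]
        degree[cu] += w
        degree[cv] += w
        if cu == cv:
            internal[cu] += w
    return sum(internal[c] / total - (degree[c] / (2 * total)) ** 2 for c in degree)


# Toy graph: two triangles joined by one edge, all weights 1.0.
toy = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
       (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
       (2, 3, 1.0)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {n: "A" for n in range(6)}

q_split = modularity(toy, two_groups)
q_single = modularity(toy, one_group)
print(q_split, q_single)
```

Putting every node in one community gives Q = 0, which is why the "none" row of Table 1 reports no modularity value worth comparing.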

5 MYSPACE AND THE SEMANTIC WEB

Since our results indicate that the Myspace artist network is of interest in the context of music-related studies, we have made an effort to convert this data to a more structured format. We have created a Web service [5] that describes any Myspace page in a machine-readable Semantic Web format. Using FOAF [6] and the Music Ontology [7], the service describes a Myspace page in XML RDF. This will allow future applications to easily make use of Myspace network data (e.g. for music recommendation).

[5] available at (Omitted for submission)
[6] http://www.foaf-project.org/
[7] http://musicontology.com/


social radio


social radio Weighted Max Flow Playlists

– max flow presents an interesting opportunity to create playlists using paths of least resistance

– preliminary testing shows promise

– needs more exhaustive testing
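The slides do not describe how a path is extracted from the max flow computation. As a stand-in, not the authors' method, the sketch below uses plain Dijkstra shortest paths over edge costs to order a playlist from a start artist to a destination artist. The costs represent a hypothetical audio dissimilarity between linked artists, and all names and values are made up.

```python
import heapq


def least_resistance_path(graph, start, goal):
    """Dijkstra shortest path.

    graph: {node: {neighbor: cost}} with non-negative costs.
    Returns the list of nodes from start to goal, or [] if unreachable.
    """
    best = {start: 0.0}
    parent = {start: None}
    heap = [(0.0, start)]
    while heap:
        cost, u = heapq.heappop(heap)
        if u == goal:
            break
        if cost > best[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            new_cost = cost + w
            if new_cost < best.get(v, float("inf")):
                best[v] = new_cost
                parent[v] = u
                heapq.heappush(heap, (new_cost, v))
    if goal not in parent:
        return []
    path = []
    node = goal
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


# Hypothetical artist graph; lower cost means more similar-sounding artists.
toy_graph = {
    "a": {"b": 0.2, "c": 0.9},
    "b": {"c": 0.3, "d": 0.8},
    "c": {"d": 0.1},
    "d": {},
}
playlist = least_resistance_path(toy_graph, "a", "d")
print(playlist)
```

Each artist on the returned path would contribute one track, so consecutive tracks are both socially linked and acoustically close.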


social radio Playlist Generator


social radio The Social Radio

– produce playlists via weighted distance paths

– next destination song is determined via a vote across all listeners

– candidate songs selected from disparate communities
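The voting step above could be sketched as a simple plurality count. The slides do not specify the voting rule, so plurality with ties broken by candidate order is an assumption, and the listener and song identifiers are hypothetical.

```python
from collections import Counter


def next_destination(votes, candidates):
    """Pick the candidate song with the most listener votes.

    votes: {listener_id: song_id}; votes for songs that are not candidates
    are ignored. Ties go to the earlier candidate. Returns None when there
    are no valid votes.
    """
    tally = Counter(song for song in votes.values() if song in candidates)
    if not tally:
        return None
    return max(candidates, key=lambda song: tally[song])


# Hypothetical candidates, e.g. one drawn from each of three communities.
candidates = ["song_x", "song_y", "song_z"]
votes = {"l1": "song_y", "l2": "song_x", "l3": "song_y", "l4": "song_q"}

winner = next_destination(votes, candidates)
print(winner)
```

The winning song would then become the destination for the next weighted path through the artist graph.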


resources

– http://mypyspace.sourceforge.net/

– http://dbtune.org/myspace/

– http://omras2.doc.gold.ac.uk/software/fftExtract/

– slides: http://slideshare.com/BenFields

– contact: b.fields@gold.ac.uk

http://blog.benfields.net

twitter: @alsothings


Questions?
