Distributed, Real-Time Computation of Community Preferences
Thomas Lutkenhouse, Michael L. Nelson, Johan Bollen
Old Dominion University
Computer Science Department
Norfolk, VA 23529 USA
{lutken,mln,jbollen}@cs.odu.edu
HT 2005 - Sixteenth ACM Conference on Hypertext and Hypermedia
6-9 Sept. 2005, Salzburg, Austria
Distributed, Real-Time Computation of Community Preferences
• Distributed: no central state
• Real-Time: changes are immediate
• Computation: not CS if you don’t compute
• Community Preferences: not personalization
Outline
• Review of technologies
– buckets
– Hebbian learning
– previous results
• Experiment design
• Results
• Lessons learned
• Conclusions
Non-evolution of DL Objects
[Figure: surviving labels are RSS and SRW]
Buckets
• Premise: repositories come and go, but the objects should endure
• Began as part of NASA DL research
– focus on digital preservation
– implementation of the “Smart Objects, Dumb Archives” (SODA) model for digital libraries
  • CACM 2001, doi.acm.org/10.1145/374308.374342
  • D-Lib, dx.doi.org/10.1045/february2001-nelson
Smart Objects
• Responsibilities generally associated with the repository are “pushed down” into the stored object
– T&C, maintenance, logging, pagination & display, etc.
• Aggregate:
– metadata
– data
– methods to operate on the metadata/data
• API examples
  • http://www.cs.odu.edu/~mln/teaching/cs595-f03/?method=getMetadata&type=all
  • http://www.cs.odu.edu/~mln/teaching/cs595-f03/?method=listMethods
  • http://www.cs.odu.edu/~mln/teaching/cs595-f03/?method=listPreference
  • (cheat) http://www.cs.odu.edu/~mln/teaching/cs595-f03/bucket/bucket.xml
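The API examples above all follow one pattern: the bucket's own URL plus a `method` argument and optional extra arguments. A minimal sketch of building such URLs, where `bucket_method_url` is a hypothetical helper (not part of the bucket software) and only the URL pattern is taken from the slide:

```python
from urllib.parse import urlencode

# Bucket URL copied from the slide; the helper below is hypothetical.
BUCKET = "http://www.cs.odu.edu/~mln/teaching/cs595-f03/"

def bucket_method_url(bucket, method, **args):
    """Build a URL that invokes `method` on a bucket with optional arguments."""
    query = urlencode({"method": method, **args})
    return f"{bucket}?{query}"

metadata_url = bucket_method_url(BUCKET, "getMetadata", type="all")
methods_url = bucket_method_url(BUCKET, "listMethods")
print(metadata_url)
print(methods_url)
```

Because the methods travel with the object, any HTTP client can call them without knowing anything about the repository that hosts the bucket.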
Examples
• 1.6.X bucket
– http://ntrs.nasa.gov/
– http://www.cs.odu.edu/~mln/phd/
• 2.0 buckets
– http://www.cs.odu.edu/~mln/teaching/cs595-f03/
– http://www.cs.odu.edu/~lutken/bucket/
• 3.0 buckets (under development)
– http://beaufort.cs.odu.edu:8080/
– uses MPEG-21 DIDLs
  • cf. http://www.dlib.org/dlib/november03/bekaert/11bekaert.html
Hebbian Learning
Implementation issues:
- gather log files
  - problematic when spread across servers/domains
- determine a T for session reconstruction
  - typically 5 min
- compute links & weights
- update the network periodically
  - typically monthly
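The log-based steps above can be sketched as follows. This is an illustration only: the log format (user, timestamp, document) and the function names are assumptions, and the weighting is a plain count of consecutive pairs rather than the weighting used in the earlier implementations.

```python
from collections import defaultdict

SESSION_TIMEOUT = 5 * 60  # T in seconds; the slide gives 5 minutes as typical

def reconstruct_sessions(log, timeout=SESSION_TIMEOUT):
    """Group (user, timestamp, document) hits into per-user sessions."""
    sessions = []
    last_seen = {}  # user -> (timestamp of last hit, that user's open session)
    for user, ts, doc in sorted(log, key=lambda hit: (hit[0], hit[1])):
        prev = last_seen.get(user)
        if prev is None or ts - prev[0] > timeout:
            current = []            # gap longer than T: start a new session
            sessions.append(current)
        else:
            current = prev[1]
        current.append(doc)
        last_seen[user] = (ts, current)
    return sessions

def link_weights(sessions):
    """Count consecutive document pairs within each session."""
    weights = defaultdict(float)
    for docs in sessions:
        for src, dst in zip(docs, docs[1:]):
            weights[(src, dst)] += 1.0
    return dict(weights)

# Hypothetical log entries: (user, seconds, document id)
log = [("u1", 0, "A"), ("u1", 60, "B"), ("u1", 1000, "C"),
       ("u2", 30, "A"), ("u2", 90, "B")]
sessions = reconstruct_sessions(log)
weights = link_weights(sessions)
print(sessions)
print(weights)
```

Every step here needs the complete log in one place, which is why the approach becomes problematic when the documents are spread across servers or domains.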
Previous, Log-Based Recommendation Implementations
• LANL Journal Recommendations
– collection analysis based on journal readership patterns
  • D-Lib Magazine, dx.doi.org/10.1045/june2002-bollen
• NASA Technical Report Server
– compared recommendations with those generated by VSM
  • WIDM 2004, doi.acm.org/1031453.1031480
• Open Video Project
– generated recommendations for videos (little descriptive metadata)
  • JCDL 2005, doi.acm.org/1065385.1065472
Hebbian Learning with Bucket Methods
http://a?method=display&referer=http://a&redirect=http://b?method=display%26referer=http://a
http://b?method=display&referer=http://b&redirect=http://a?method=display%26redirect=http://c?method=display%26referer=http://b
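In the first URL above, the link displayed on bucket a does not point at bucket b directly. It calls a's own `display` method with a `redirect` argument, which appears to give a the chance to update its link weights before sending the user on. A minimal sketch that reproduces the first URL; `traversal_url` is a hypothetical helper name:

```python
from urllib.parse import quote

def traversal_url(src, dst):
    """Build the link shown on bucket `src` that leads to bucket `dst`."""
    target = f"{dst}?method=display&referer={src}"
    # Encode '&' as %26 so the referer argument stays inside the redirect value.
    encoded = quote(target, safe=":/?=")
    return f"{src}?method=display&referer={src}&redirect={encoded}"

url = traversal_url("http://a", "http://b")
print(url)
```

Each traversal is recorded by the bucket being left, so no log files need to be gathered and no session timeout T has to be chosen.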
Experiment
• Spin Magazine’s “Top 50 Rock Bands of All Time”
– something other than reports, journals, etc.
– harvest allmusic.com for metadata for all LPs by the 50 bands (total = 800 LPs)
• Maintain hierarchical arrangement
– 1 artist → N albums
• Initialize the network of 800 LPs with each LP randomly linked to 5 other LPs
• Send out email invitations to browse the network
– have them explore, and then examine the resulting network
– users not informed about the workings of the network
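The random initialization can be sketched as below. The node identifiers, the function name, and the fixed seed are assumptions for illustration; the counts (800 LPs, 5 links each) and the initial weight of 0.5 come from the slides.

```python
import random

def init_network(node_ids, links_per_node=5, initial_weight=0.5, seed=None):
    """Give every node `links_per_node` random outgoing links to other nodes."""
    rng = random.Random(seed)
    network = {}
    for node in node_ids:
        others = [n for n in node_ids if n != node]  # no self-links
        targets = rng.sample(others, links_per_node)
        network[node] = {t: initial_weight for t in targets}
    return network

# 800 hypothetical LP identifiers; the seed only makes the sketch repeatable.
network = init_network(list(range(800)), seed=0)
print(len(network), len(network[0]))
```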
Display of LPs
Hierarchical, Weighted Links
weights
- initial: 0.5
- frequency: 1.0
- symmetry: 0.5
- transitivity: 0.3
<structural>
  <element wt="0.5" id="~http://www.cs.odu.edu/~lutken/bucket/121/">
    <metadata>
      <descriptive>
        <title>Terrapin Station, Capital Centre, Landover, MD, 3/15/90</title>
      </descriptive>
      <administrative />
    </metadata>
  </element>
  <element wt="0.5" id="~http://www.cs.odu.edu/~lutken/bucket/11/">
    <metadata>
      <descriptive>
        <title>Jealousy/Progress</title>
      </descriptive>
      <administrative />
    </metadata>
  </element>
  <element wt="3" id="~http://www.cs.odu.edu/~lutken/bucket/434/">
    <metadata>
      <descriptive>
        <title>Nevermind</title>
      </descriptive>
      <administrative />
    </metadata>
  </element>
  <element wt="0.5" id="~http://www.cs.odu.edu/~lutken/bucket/130/">
    <metadata>
      <descriptive>
        <title>Technical Ecstasy</title>
      </descriptive>
      <administrative />
    </metadata>
  </element>
  …….
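One plausible reading of the frequency, symmetry, and transitivity weights is sketched below. The slides give only the three values; which link each one reinforces is an assumption here, inferred from the names. In this sketch a new link starts at 0, whereas in the experiment every pre-seeded link starts at the initial weight of 0.5.

```python
from collections import defaultdict

# Increments from the slide; how they are applied below is an assumption.
FREQUENCY, SYMMETRY, TRANSITIVITY = 1.0, 0.5, 0.3

def record_traversal(weights, src, dst, prev=None):
    """Update link weights after a user moves src -> dst.

    `prev` is the bucket the user visited just before `src`, if any.
    """
    weights[src][dst] += FREQUENCY           # the link actually followed
    weights[dst][src] += SYMMETRY            # the reverse link
    if prev is not None and prev != dst:
        weights[prev][dst] += TRANSITIVITY   # shortcut prev -> dst
    return weights

weights = defaultdict(lambda: defaultdict(float))
record_traversal(weights, "A", "B")            # user goes A -> B
record_traversal(weights, "B", "C", prev="A")  # then B -> C
print({src: dict(links) for src, links in weights.items()})
```

Each update touches only the links of the buckets involved in one traversal, which is what allows the computation to run at access time instead of in a periodic batch.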
• August 2004 - October 2004
• 160 respondents
– self-identify at the beginning; exit survey at the end
– 1200 bucket-to-bucket traversals (7.5 average traversals per session)
Respondents
Table 1. Profile of the 160 Volunteers
Nationality: 1 Brazil, 1 Portugal, 4 Canada, 10 UK, 20 Belgium, 124 US
Sex: 124 Male, 36 Female
Age: High 72, Low 7, Average 37
Domain Knowledge Self-Assessment (1=low, 7=high): Average = 4.0
Assessment of link utility (1=low, 5=high): Average = 2.8
How to Evaluate the Resulting Network?
• Compute network analysis metrics:
– PageRank
– Degree Centrality
– Weighted Degree Centrality
• Compare the results to:
– Other “expert” lists (VH1, DigitalDreamDoor, original Spin Magazine list)
– Artist / LP best seller according to RIAA
– Artist / LP Amazon sales rank
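The three metrics can be sketched in plain Python on a toy weighted network. The slides do not say how the metrics were configured, so the choices here are assumptions: degree counts incoming links, weighted degree sums incoming weights, and PageRank uses a damping factor of 0.85 with transition probabilities proportional to link weight. The graph itself is made up.

```python
def degree_centrality(graph):
    """Number of links pointing at each node (in-degree)."""
    score = {node: 0 for node in graph}
    for links in graph.values():
        for dst in links:
            score[dst] += 1
    return score

def weighted_degree_centrality(graph):
    """Sum of the weights of the links pointing at each node."""
    score = {node: 0.0 for node in graph}
    for links in graph.values():
        for dst, weight in links.items():
            score[dst] += weight
    return score

def pagerank(graph, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new = {node: (1.0 - damping) / n for node in nodes}
        for src, links in graph.items():
            total = sum(links.values())
            if total == 0:  # dangling node: spread its rank evenly
                for node in nodes:
                    new[node] += damping * rank[src] / n
            else:
                for dst, weight in links.items():
                    new[dst] += damping * rank[src] * weight / total
        rank = new
    return rank

# Toy network: node -> {linked node: weight}; every target is also a key.
graph = {"A": {"B": 3.0, "C": 0.5}, "B": {"C": 1.0},
         "C": {"A": 0.5}, "D": {"C": 0.5}}
degree = degree_centrality(graph)
weighted = weighted_degree_centrality(graph)
rank = pagerank(graph)
print(degree, weighted, rank)
```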
Expert Rankings
• No correlation with:
– VH1 artist list
– DigitalDreamDoor list
– original Spin Magazine list (!)
(critics don’t agree with each other, or the record buying public)
RIAA Results
• RIAA had data for only:
– 51/800 LPs
– 14/50 artists
(critics don’t buy records!)
[Figure 6. Probability of albums being best-sellers. X axis: Rank (All albums, Top 50%, Top 20%, Top 10%, Top 5%, Top 2%, Top 1%). Y axis: Probability of being a best seller (0% to 40%). Series: Degree Centrality, Weighted Degree Centrality, PageRank.]
[Figure 7. Probability of artists being best-sellers. X axis: Rank (All Bands, Top 50%, Top 20%, Top 10%). Y axis: Probability of being a bestseller (0% to 90%). Series: Degree Centrality, Weighted Degree Centrality, PageRank.]
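One way to read the quantity plotted in these figures is the fraction of best-sellers among the top-ranked share of items under a given metric. The sketch below works under that assumption; the slides do not spell out the computation, and the scores and best-seller set are made-up toy data, not results from the experiment.

```python
import math

def bestseller_probability(scores, bestsellers, top_fraction):
    """Fraction of best-sellers among the top `top_fraction` of items by score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = max(1, math.ceil(len(ranked) * top_fraction))
    top = ranked[:k]
    return sum(1 for item in top if item in bestsellers) / k

# Toy data: ten items scored 10 down to 1, three of them best-sellers.
scores = {name: 10.0 - i for i, name in enumerate("abcdefghij")}
bestsellers = {"a", "c", "h"}

p_all = bestseller_probability(scores, bestsellers, 1.0)
p_top50 = bestseller_probability(scores, bestsellers, 0.5)
p_top20 = bestseller_probability(scores, bestsellers, 0.2)
p_top10 = bestseller_probability(scores, bestsellers, 0.1)
print(p_all, p_top50, p_top20, p_top10)
```

A metric that tracks sales should show the probability rising as the cut-off narrows from all items toward the top few percent.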
*RIAA sales caveat
Amazon Sales Rank
• No correlation with individual LP sales rank…
• …but correlated with mean artist sales rank
– similar to RIAA data
– interpretation: popular artists often have obscure LPs
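The aggregation from LP sales rank to mean artist sales rank can be sketched as below. The LP names, artists, and rank values are hypothetical and only illustrate how one artist can hold both a high-selling and an obscure LP.

```python
from collections import defaultdict

def mean_rank_by_artist(lp_ranks, lp_artist):
    """Average the sales ranks of each artist's LPs."""
    ranks = defaultdict(list)
    for lp, rank in lp_ranks.items():
        ranks[lp_artist[lp]].append(rank)
    return {artist: sum(r) / len(r) for artist, r in ranks.items()}

# Hypothetical sales ranks (lower is better) and LP-to-artist mapping.
lp_ranks = {"lp1": 100, "lp2": 90000, "lp3": 5000, "lp4": 7000}
lp_artist = {"lp1": "X", "lp2": "X", "lp3": "Y", "lp4": "Y"}

means = mean_rank_by_artist(lp_ranks, lp_artist)
print(means)
```

Averaging per artist smooths out the spread between an artist's individual LPs, which is the variation that obscures a per-LP comparison.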
Relatedness(?)
Lessons Learned
• While the subject matter was interesting, it was oriented toward music geeks
– i.e., no actual music was delivered to the users (intellectual property considerations)
– more traversals needed
• Random initial starting points were difficult to overcome
– “cold start problem” - pre-seed the links according to some criteria?
– weights did not decay over time/traversals
• Choosing only artists from Spin Magazine may have pre-filtered the response
– choose artists from Down Beat (Jazz), Vibe (Urban), Music City News (Country), etc.
Conclusions
• Can build a network of smart objects featuring adaptive, hierarchical links constructed in real-time without central state
– network is created without latency and with computations amortized over individual accesses
• In the experimental testbed of popular music LP metadata, the network rankings approached the sales rank of artists, not of individual LPs