User Evaluation of the NASA Technical Report Server
Recommendation Service
Michael L. Nelson, Johan Bollen
Old Dominion University
{mln,jbollen}@cs.odu.edu
JoAnne R. Calhoun, Calvin E. Mackey
NASA Langley Research Center
{joanne.r.calhoun,calvin.e.mackey}@nasa.gov
Outline
• OAI-PMH
• NASA Technical Report Server (NTRS)
• Experimental Methodology
• Results
• Future Work & Conclusions
OAI-PMH Metadata Harvesting Model
data providers(repositories)
service providers(harvesters)
Aggregators
data providers(repositories)
service providers(harvesters)
aggregator
aggregators allow for:
• scalability for OAI-PMH
• load balancing
• community building
• discovery
NTRS
• OAI-PMH aggregator
  – OAI-PMH baseURL & humans: http://ntrs.nasa.gov/
• Technology
  – MySQL 4.0.12
  – Va Tech OAI-PMH harvester
    • http://oai.dlib.vt.edu/odl/software/harvest/
  – Buckets 1.6.3
• Coverage
  – 837,000+ abstracts
    • approaching a similar number of full-text
  – 17 different repositories
    • 13 NASA, 4 non-NASA
  – in use since 1995
    • since 2003 as an OAI-PMH service provider
NTRS Contents
NTRS Search Results
http://ntrs.nasa.gov/?method=display&redirect=http://techreports.larc.nasa.gov/ltrs/PDF/2000/aiaa/NASA-aiaa-2000-4886.pdf&oaiID=oai:ltrs.larc.nasa.gov:NASA-aiaa-2000-4886
Current Recommendation Generation Method
• Based on prior work @ LANL
  – intended for large-scale (10s M) applications
• Identify co-retrieval events from web logs
  – If 2 articles are successively downloaded within time t, increment the weight of co-retrieval
  – t = 1 hour
• Recommendations reflect a community’s preferences; not an individual’s
• The more a file is downloaded, the stronger the recommendations for that file
  – corollary: no downloads, no recommendations
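A minimal sketch of the co-retrieval weighting described above, assuming the web logs have already been reduced to (client, timestamp, document id) tuples. The tuple layout, the function names, grouping by client, and the symmetric weight update are illustrative assumptions, not the NTRS implementation.

```python
from collections import defaultdict
from datetime import datetime, timedelta

T = timedelta(hours=1)  # co-retrieval window t = 1 hour, as on the slide


def co_retrieval_weights(events, window=T):
    """events: iterable of (client, timestamp, doc_id) tuples (assumed layout)."""
    weights = defaultdict(int)
    last_seen = {}  # client -> (timestamp, doc_id) of that client's previous download
    for client, ts, doc in sorted(events, key=lambda e: (e[0], e[1])):
        prev = last_seen.get(client)
        if prev is not None:
            prev_ts, prev_doc = prev
            if prev_doc != doc and ts - prev_ts <= window:
                # Assumption: co-retrieval is treated as a symmetric relation.
                weights[(prev_doc, doc)] += 1
                weights[(doc, prev_doc)] += 1
        last_seen[client] = (ts, doc)
    return weights


def recommend(weights, doc, n=10):
    """Return up to n documents most strongly co-retrieved with doc."""
    scored = [(w, other) for (d, other), w in weights.items() if d == doc]
    ranked = sorted(scored, key=lambda x: (-x[0], x[1]))
    return [other for _, other in ranked[:n]]
```

A document that never appears in the logs has no entries in the weights, so recommend() returns an empty list for it. That is the "no downloads, no recommendations" corollary.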
U.S. Government & Web Privacy Policy
• Children's Online Privacy Protection Act of 1998
  – http://www.ftc.gov/ogc/coppa1.htm
• Office of Management & Budget
  – http://www.whitehouse.gov/omb/memoranda/m99-18.html
  – http://www.whitehouse.gov/omb/memoranda/m00-13.html
• NASA
  – http://www.nasa.gov/about/highlights/HP_Privacy.html
• DOJ
  – http://www.usdoj.gov/04foia/privstat.htm
(B) but does not include-- (i) matches performed to produce aggregate statistical data without any personal identifiers;
Recommender Architecture
recommender.cs.odu.edu ntrs.nasa.gov
1. harvest log files from NTRS
2. compute recommendation matrix
3. NTRS requests recommendations for an OAI id
4. recommender responds with 10 OAI ids
5. NTRS does a local lookup on the ids & displays results
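Steps 3 to 5 can be sketched in memory as follows. This is a hedged illustration only: the function names, the nested-dict matrix, and the dict of local records are hypothetical stand-ins, and the real exchange between ntrs.nasa.gov and recommender.cs.odu.edu happens over the network in a format the slides do not specify.

```python
def recommender_respond(matrix, oai_id, n=10):
    """Step 4: return up to n OAI ids, ordered by descending recommendation weight.

    matrix: dict of oai_id -> {recommended_oai_id: weight} (assumed layout).
    """
    row = matrix.get(oai_id, {})
    ranked = sorted(row.items(), key=lambda kv: (-kv[1], kv[0]))
    return [rec_id for rec_id, _ in ranked[:n]]


def ntrs_display(local_records, rec_ids):
    """Step 5: resolve ids against local metadata, skipping ids not held locally."""
    return [local_records[i] for i in rec_ids if i in local_records]


# Placeholder ids and titles, invented for illustration only.
matrix = {"oai:example:a": {"oai:example:b": 3, "oai:example:c": 5}}
records = {"oai:example:b": "Title B", "oai:example:c": "Title C"}
ids = recommender_respond(matrix, "oai:example:a")
titles = ntrs_display(records, ids)
```

Returning only identifiers keeps the recommender free of metadata. NTRS already holds the harvested records, so the lookup stays local.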
How Effective is the Log Analysis Method?
• Anecdotally, we knew that the recommendations were well received
  – but are they better than recommendations generated by another method?
• Goal: compare the perceived quality of recommendations generated by:
  – log analysis (Method A)
  – vector space model (Method B)
Call for Volunteers
Table 1. Profiles of the 13 Volunteers.
Highest level of education                                 11 MS, 1 some graduate, 1 BS
Years of professional experience                           Average=16, high=42, low=7
Average number of papers published over the last 5 years   Average=1, high=3, low=0
Experience with NTRS (1=low, 5=high)                       Average=3.0
Experience with WWW for research (1=low, 5=high)           Average=3.84
Table 2. Organizational Affiliations of the Volunteers.
#  Competency / Branch
1  Atmospheric Sciences Competency / Radiation and Aerosols Branch
2  Aerospace Systems Concepts and Analysis Competency / Vehicle Analysis Branch
1  Aerospace Systems Concepts and Analysis Competency / Multidisciplinary Optimization Branch
1  Structures and Materials Competency / Nondestructive Evaluation Sciences Branch
1  Office of Chief Information Officer / Library and Media Service Branch
2  Systems Engineering Competency / Data Analysis and Imaging Branch
4  Systems Engineering Competency / Flight Software Systems Branch
1  Systems Engineering Competency / Test and Development Branch
• Announcements made in LaRC intranet, mailing lists, etc.
• Four 90 minute sessions held on base in a separate training facility
• Bribed with donuts & soft drinks
Methodology
• Pick 10 papers from the LaRC collection that have recommendations in the log analysis
• Create VSM-based recommendations for all LaRC papers
  – ~ 4100 LaRC papers
• Instructions to volunteers
  – for each of the 10 documents
    • read the abstract
    • score “good” evaluations generated by log analysis
    • score “good” evaluations generated by VSM
  – search for their own papers, or papers they know well
    • scored evaluations for log analysis & VSM
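A minimal sketch of how VSM-based recommendations could be produced from abstracts. The slides do not say which term weighting or tokenizer was used, so the TF-IDF weighting, the regular-expression tokenizer, and all function names here are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(abstracts):
    """abstracts: dict of doc_id -> abstract text. Returns doc_id -> {term: weight}."""
    counts = {d: Counter(tokenize(t)) for d, t in abstracts.items()}
    n_docs = len(counts)
    df = Counter(term for c in counts.values() for term in c)  # document frequency
    return {
        d: {term: tf * math.log(n_docs / df[term]) for term, tf in c.items()}
        for d, c in counts.items()
    }


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero length."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def vsm_recommend(abstracts, doc_id, n=10):
    """Return up to n other documents, most textually similar first."""
    vecs = tfidf_vectors(abstracts)
    scores = [(cosine(vecs[doc_id], v), other)
              for other, v in vecs.items() if other != doc_id]
    return [other for _, other in sorted(scores, key=lambda x: (-x[0], x[1]))[:n]]
```

Unlike log analysis, this needs only the text of the collection, so every paper with an abstract gets recommendations regardless of how often it has been downloaded.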
Guidance for Judging Relevance
• Volunteers were encouraged to consider:
  – similarity: documents are obviously textually related
  – serendipity: documents are related in a way that you did not anticipate
  – contrast: documents show competing / alternate approaches, methodology, etc.
  – relation: documents by the same author, from the same conference series, etc.
User Evaluation Session
Results
• Result set
  – 129 comparisons
  – 29 documents
  – 13 volunteers
• ANOVA
  – null hypothesis rejected at p<0.1; means are marginally different
• Spearman correlation coefficients
  – a weak negative correlation between rater knowledge and log analysis ratings (-0.156)
  – no correlation between rater knowledge & VSM (0.12931)
  – positive, significant relationship between log analysis & VSM ratings (0.20100)
    • some documents produce better relationships than others
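For reference, a Spearman coefficient such as those above is the Pearson correlation of the rank-transformed data. The sketch below is a generic pure-Python version with average ranks for ties. It is not the authors' analysis code, and the sample values are invented for illustration, not study data.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share one rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho; undefined (division by zero) if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Invented example: rater knowledge (1-5) against ratings (0-10).
knowledge = [1, 1, 2, 3, 4, 5]
ratings = [4, 3, 3, 2, 2, 1]
rho = spearman(knowledge, ratings)
```

Rank correlation suits this data because both knowledge levels and ratings are ordinal scales with many ties.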
Table 1. Results for Methods A & B.
        method A (log)   method B (VSM)
min     0                1
mean    2.28             6.90
max     10               10
std     2.28             2.35
Table 2. Mean ratings for method A (log analysis) and method B (VSM) for different levels of rater domain knowledge.
Knowledge Level   Method A mean   Method B mean   Number of Raters   B-A
k=1 (lowest)      2.8955          6.4776          67                 3.5821
k=2               3.1304          8.0435          23                 4.9131
k=3               3.5000          7.3750          16                 3.8750
k=4               2.0833          5.9167          12                 3.8334
k=5 (highest)     1.4167          7.5833          12                 6.1666
Rating Distribution
A = log analysis, B = VSM
We Did Not Find What We Hoped…
• Possible methodological shortcomings
  – we chose the documents randomly; we did not choose the most “mature” documents from the collection
    • positive, significant correlation between number of downloads and preference for the log analysis method (0.201)
    • negative, significant correlation between number of downloads and preference for VSM method (-0.32)
    • positive significant correlation (0.384) between A/B and # of downloads
  – paradox: the best qualified raters are the least likely to show up…
Document / Volunteer Mismatch?
Table 1. Subject Area of the 29 Documents
Subject code                                      #
Aeronautics                                       1
Aerodynamics                                      2
Air Transportation and Safety                     1
Avionics and Aircraft Instrumentation             1
Aircraft Propulsion and Power                     2
Launch Vehicles and Launch Operations             3
Space Transportation and Safety                   1
Space Communications, Spacecraft Communications   1
Spacecraft Design, Testing and Performance        3
Metals and Metallic Materials                     1
Fluid Mechanics and Thermodynamics                1
Instrumentation and Photography                   2
Structural Mechanics                              1
Earth Resources and Remote Sensing                1
Meteorology and Climatology                       1
Mathematical and Computer Sciences                1
Computer Programming and Software                 3
Solid-State Physics                               1
Documentation and Information Science             2
Titles from the NASA STI Subject Categories, http://www.sti.nasa.gov/subjcat.pdf
Table 1. Organizational Affiliations of the Volunteers.
#  Competency / Branch
1  Atmospheric Sciences Competency / Radiation and Aerosols Branch
2  Aerospace Systems Concepts and Analysis Competency / Vehicle Analysis Branch
1  Aerospace Systems Concepts and Analysis Competency / Multidisciplinary Optimization Branch
1  Structures and Materials Competency / Nondestructive Evaluation Sciences Branch
1  Office of Chief Information Officer / Library and Media Service Branch
2  Systems Engineering Competency / Data Analysis and Imaging Branch
4  Systems Engineering Competency / Flight Software Systems Branch
1  Systems Engineering Competency / Test and Development Branch
Organization names ca. March 2004
1. raters not expert in the documents
2. raters’ own publications outside of the NTRS core
Robots in the Mist?
robot noise?
Future Work
• More frequent harvesting of logs for more up-to-date recommendations
  – currently monthly granularity
• Minimize robot impact on the logs / recommendations
• Seed log analysis recommendations with VSM results
  – recommendations converge & mature more quickly
• Re-run the experiment with:
  – more mature documents
  – more subjects
    • aerospace engineering graduate students?
  – pay them $$$
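One way to seed log analysis recommendations with VSM results is to blend the two scores and shift trust toward the logs as downloads accumulate. This is a speculative sketch: the blending formula, the constant k, and the function name are all hypothetical and not part of the deployed service.

```python
def blended_scores(log_weights, vsm_scores, downloads, k=50):
    """Blend log-analysis and VSM scores for one document's candidates.

    log_weights: candidate -> co-retrieval weight (empty for new documents)
    vsm_scores:  candidate -> cosine similarity in [0, 1]
    downloads:   number of downloads of the source document
    k:           hypothetical constant; at downloads == k both methods count equally
    """
    alpha = downloads / (downloads + k)  # trust in log analysis grows with downloads
    top = max(log_weights.values(), default=0)
    blended = {}
    for cand in set(log_weights) | set(vsm_scores):
        # Normalize co-retrieval weights to [0, 1] so the two scales are comparable.
        log_part = log_weights.get(cand, 0) / top if top else 0.0
        blended[cand] = alpha * log_part + (1 - alpha) * vsm_scores.get(cand, 0.0)
    return blended
```

With zero downloads the result equals the VSM scores, so a new document gets recommendations immediately. As downloads grow, the co-retrieval weights dominate.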
Conclusions
• Slightly disappointing results
  – VSM preferred over log analysis
• …but, VSM had the deck stacked in its favor:
  – significant mismatch between volunteer expertise & article subject
  – articles randomly chosen from the LaRC collection
    • most mature articles not chosen; evidence that log analysis improves with download frequency
• Next steps:
  – scrub logs more to remove robots, other spurious data sources
  – mix VSM & log analysis
  – find a larger, more captive audience
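A minimal sketch of what log scrubbing for robots might look like, assuming each log entry has been parsed into a dict. The entry layout, the keyword list, and the request threshold are illustrative assumptions. Real robot detection would need more than these two heuristics.

```python
from collections import Counter

ROBOT_HINTS = ("bot", "crawler", "spider", "slurp")  # assumed keyword list


def scrub_robots(entries, max_requests=500):
    """Drop entries from likely robots.

    entries: list of dicts with 'client', 'user_agent', 'doc_id' keys (assumed layout).
    max_requests: assumed per-client ceiling on requests in the harvested log.
    """
    per_client = Counter(e["client"] for e in entries)
    kept = []
    for e in entries:
        agent = e["user_agent"].lower()
        if any(hint in agent for hint in ROBOT_HINTS):
            continue  # self-identified robot
        if per_client[e["client"]] > max_requests:
            continue  # volume suggests automated bulk access
        kept.append(e)
    return kept
```

Scrubbing before the co-retrieval pass matters because a crawler walking the collection in order downloads unrelated documents back to back and inflates their weights.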