User Evaluation of the NASA Technical Report Server Recommendation Service


Michael L. Nelson, Johan Bollen
Old Dominion University
{mln,jbollen}@cs.odu.edu

JoAnne R. Calhoun, Calvin E. Mackey
NASA Langley Research Center
{joanne.r.calhoun,calvin.e.mackey}@nasa.gov

Outline

• OAI-PMH
• NASA Technical Report Server (NTRS)
• Experimental Methodology
• Results
• Future Work & Conclusions

OAI-PMH Metadata Harvesting Model

data providers (repositories)

service providers (harvesters)

Aggregators

data providers (repositories)

service providers (harvesters)

aggregator

Aggregators allow for:
• scalability for OAI-PMH
• load balancing
• community building
• discovery

NTRS
• OAI-PMH aggregator
  – OAI-PMH baseURL & humans: http://ntrs.nasa.gov/
• Technology
  – MySQL 4.0.12
  – Va Tech OAI-PMH harvester
    • http://oai.dlib.vt.edu/odl/software/harvest/
  – Buckets 1.6.3
• Coverage
  – 837,000+ abstracts
    • approaching a similar number of full-text
  – 17 different repositories
    • 13 NASA, 4 non-NASA
  – in use since 1995
    • since 2003 as an OAI-PMH service provider


NTRS Contents

NTRS Search Results

http://ntrs.nasa.gov/?method=display&redirect=http://techreports.larc.nasa.gov/ltrs/PDF/2000/aiaa/NASA-aiaa-2000-4886.pdf&oaiID=oai:ltrs.larc.nasa.gov:NASA-aiaa-2000-4886

Current Recommendation Generation Method

• Based on prior work @ LANL
  – intended for large-scale (10s M) applications
• Identify co-retrieval events from web logs
  – If 2 articles are successively downloaded within time t, increment the weight of co-retrieval
  – t = 1 hour
• Recommendations reflect a community’s preferences, not an individual’s
• The more a file is downloaded, the stronger the recommendations for that file
  – corollary: no downloads, no recommendations
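
The co-retrieval counting described above can be sketched roughly as follows. This is only an illustration, not the LANL or NTRS implementation: it assumes the web log has already been reduced to (client, timestamp in seconds, document id) tuples, and the function name, the record layout, and the choice to treat the relation as symmetric are all hypothetical.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

WINDOW_SECONDS = 3600  # t = 1 hour, as stated on the slide


def co_retrieval_weights(events, window=WINDOW_SECONDS):
    """Count co-retrieval events.

    `events` is an iterable of (client, timestamp_seconds, doc_id) tuples;
    this record layout is an assumption made for illustration.
    """
    weights = defaultdict(int)
    # Group downloads by client, in time order within each client.
    ordered = sorted(events, key=itemgetter(0, 1))
    for _, group in groupby(ordered, key=itemgetter(0)):
        downloads = list(group)
        # Compare each download with the one that follows it for the same client.
        for (_, t1, a), (_, t2, b) in zip(downloads, downloads[1:]):
            if a != b and t2 - t1 <= window:
                weights[(a, b)] += 1
                weights[(b, a)] += 1  # assumption: the relation is symmetric
    return dict(weights)
```

A document that never appears in the log never appears as a key, which is the "no downloads, no recommendations" corollary in code form.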

U.S. Government & Web Privacy Policy

• Children's Online Privacy Protection Act of 1998
  – http://www.ftc.gov/ogc/coppa1.htm
• Office of Management & Budget
  – http://www.whitehouse.gov/omb/memoranda/m99-18.html
  – http://www.whitehouse.gov/omb/memoranda/m00-13.html
• NASA
  – http://www.nasa.gov/about/highlights/HP_Privacy.html
• DOJ
  – http://www.usdoj.gov/04foia/privstat.htm

(B) but does not include-- (i) matches performed to produce aggregate statistical data without any personal identifiers;

Recommender Architecture

recommender.cs.odu.edu ntrs.nasa.gov

1. harvest log files from NTRS

2. compute recommendation matrix

3. NTRS requests recommendations for an OAI id

4. recommender responds with 10 OAI ids

5. NTRS does a local lookup on the ids & displays results
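
Steps 3 and 4 amount to a lookup in the recommendation matrix. A minimal sketch, assuming the matrix is held as a dictionary keyed by (source id, related id) pairs with co-retrieval weights as values; the function name and the data layout are hypothetical, and the actual service interface is not described on the slide.

```python
def recommend(weights, oai_id, limit=10):
    """Return up to `limit` OAI ids with the highest co-retrieval weight for `oai_id`.

    `weights` maps (source_id, related_id) pairs to a numeric weight;
    this layout is an assumption made for illustration.
    """
    candidates = [(w, other) for (src, other), w in weights.items() if src == oai_id]
    # Highest weight first; ties are broken by identifier so the output is stable.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in candidates[:limit]]
```

An id with no recorded co-retrievals yields an empty list, so NTRS would have nothing to display for it.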

How Effective is the Log Analysis Method?

• Anecdotally, we knew that the recommendations were well received
  – but are they better than recommendations generated by another method?
• Goal: compare the perceived quality of recommendations generated by:
  – log analysis (Method A)
  – vector space model (Method B)
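
For readers unfamiliar with Method B, a vector space model ranks documents by the similarity of their term vectors. The slides do not say which term weighting or similarity measure was used, so the sketch below assumes the common TF-IDF weighting with cosine similarity over whitespace-tokenized abstracts; all function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map {doc_id: text} to {doc_id: {term: tf-idf weight}}."""
    tokens = {d: text.lower().split() for d, text in docs.items()}
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for words in tokens.values() for term in set(words))
    n = len(docs)
    return {
        d: {t: tf * math.log(n / df[t]) for t, tf in Counter(words).items()}
        for d, words in tokens.items()
    }


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dictionaries."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def vsm_recommend(docs, doc_id, limit=10):
    """Return the ids of the `limit` documents most similar to `doc_id`."""
    vectors = tfidf_vectors(docs)
    scores = [(cosine(vectors[doc_id], vec), other)
              for other, vec in vectors.items() if other != doc_id]
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scores[:limit]]
```

Unlike the log analysis method, this approach can produce recommendations for a document that has never been downloaded, since it depends only on the text.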

Call for Volunteers

Table 1. Profiles of the 13 Volunteers.

Highest level of education                                11 MS, 1 some graduate, 1 BS
Years of professional experience                          Average=16, high=42, low=7
Average number of papers published over the last 5 years  Average=1, high=3, low=0
Experience with NTRS (1=low, 5=high)                      Average=3.0
Experience with WWW for research (1=low, 5=high)          Average=3.84

Table 2. Organizational Affiliations of the Volunteers.

#  Competency / Branch
1  Atmospheric Sciences Competency/Radiation and Aerosols Branch
2  Aerospace Systems Concepts and Analysis Competency/Vehicle Analysis Branch
1  Aerospace Systems Concepts and Analysis Competency/Multidisciplinary Optimization Branch
1  Structures and Materials Competency/Nondestructive Evaluation Sciences Branch
1  Office of Chief Information Officer/Library and Media Service Branch
2  Systems Engineering Competency/Data Analysis and Imaging Branch
4  Systems Engineering Competency/Flight Software Systems Branch
1  Systems Engineering Competency/Test and Development Branch

• Announcements made in LaRC intranet, mailing lists, etc.

• Four 90-minute sessions held on base in a separate training facility

• Bribed with donuts & soft drinks

Methodology
• Pick 10 papers from the LaRC collection that have recommendations in the log analysis
• Create VSM-based recommendations for all LaRC papers
  – ~4100 LaRC papers
• Instructions to volunteers
  – for each of the 10 documents
    • read the abstract
    • score “good” evaluations generated by log analysis
    • score “good” evaluations generated by VSM
  – search for their own papers, or papers they know well
    • scored evaluations for log analysis & VSM

Guidance for Judging Relevance

• Volunteers were encouraged to consider:
  – similarity: documents are obviously textually related
  – serendipity: documents are related in a way that you did not anticipate
  – contrast: documents show competing / alternate approaches, methodology, etc.
  – relation: documents by the same author, from the same conference series, etc.

User Evaluation Session

Results
• Result set
  – 129 comparisons
  – 29 documents
  – 13 volunteers
• ANOVA
  – null hypothesis rejected at p<0.1; means are marginally different
• Spearman correlation coefficients
  – a weak negative correlation between rater knowledge and log analysis ratings (-0.156)
  – no correlation between rater knowledge & VSM (0.12931)
  – positive, significant relationship between log analysis & VSM ratings (0.20100)
    • some documents produce better relationships than others
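
For reference, a Spearman coefficient is the Pearson correlation computed on ranks rather than raw values. The sketch below is a plain-Python illustration of that definition, with tied values given the average of their rank positions; it is not the analysis code used in the study, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

Because ratings on a 0 to 10 scale and knowledge levels on a 1 to 5 scale produce many ties, the tie handling in `average_ranks` matters for data of this kind.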

Table 1. Results for Methods A & B.

       method A (log)  method B (VSM)
min    0               1
mean   2.28            6.90
max    10              10
std    2.28            2.35

Table 2. Mean ratings for method A (log analysis) and method B (VSM) for different levels of rater domain knowledge.

Knowledge Level  Method A mean  Method B mean  Number of Raters  B-A
k=1 (lowest)     2.8955         6.4776         67                3.5821
k=2              3.1304         8.0435         23                4.9131
k=3              3.5000         7.3750         16                3.8750
k=4              2.0833         5.9167         12                3.8334
k=5 (highest)    1.4167         7.5833         12                6.1666

Rating Distribution
A = log analysis, B = VSM

We Did Not Find What We Hoped…

• Possible methodological shortcomings
  – we chose the documents randomly; we did not choose the most “mature” documents from the collection
    • positive, significant correlation between number of downloads and preference for the log analysis method (0.201)
    • negative, significant correlation between number of downloads and preference for VSM method (-0.32)
    • positive, significant correlation (0.384) between A/B and # of downloads
  – paradox: the best qualified raters are the least likely to show up…

Document / Volunteer Mismatch?

Table 1. Subject Area of the 29 Documents

Subject code                                     #
Aeronautics                                      1
Aerodynamics                                     2
Air Transportation and Safety                    1
Avionics and Aircraft Instrumentation            1
Aircraft Propulsion and Power                    2
Launch Vehicles and Launch Operations            3
Space Transportation and Safety                  1
Space Communications, Spacecraft Communications  1
Spacecraft Design, Testing and Performance       3
Metals and Metallic Materials                    1
Fluid Mechanics and Thermodynamics               1
Instrumentation and Photography                  2
Structural Mechanics                             1
Earth Resources and Remote Sensing               1
Meteorology and Climatology                      1
Mathematical and Computer Sciences               1
Computer Programming and Software                3
Solid-State Physics                              1
Documentation and Information Science            2

Titles from the NASA STI Subject Categories, http://www.sti.nasa.gov/subjcat.pdf

Table 1. Organizational Affiliations of the Volunteers.

#  Competency / Branch
1  Atmospheric Sciences Competency/Radiation and Aerosols Branch
2  Aerospace Systems Concepts and Analysis Competency/Vehicle Analysis Branch
1  Aerospace Systems Concepts and Analysis Competency/Multidisciplinary Optimization Branch
1  Structures and Materials Competency/Nondestructive Evaluation Sciences Branch
1  Office of Chief Information Officer/Library and Media Service Branch
2  Systems Engineering Competency/Data Analysis and Imaging Branch
4  Systems Engineering Competency/Flight Software Systems Branch
1  Systems Engineering Competency/Test and Development Branch

Organization names ca. March 2004

1. raters not expert in the documents
2. raters’ own publications outside of the NTRS core

Robots in the Mist?

robot noise?

Future Work
• More frequent harvesting of logs for more up-to-date recommendations
  – currently monthly granularity
• Minimize robot impact on the logs / recommendations
• Seed log analysis recommendations with VSM results
  – recommendations converge & mature more quickly
• Re-run the experiment with:
  – more mature documents
  – more subjects
    • aerospace engineering graduate students?
  – pay them $$$
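
One way the "minimize robot impact" item might be approached is to filter the log before computing co-retrieval weights. The sketch below is only a heuristic illustration: the user-agent substrings, the per-client request threshold, and the (client, user agent, document id) record layout are all assumptions, not something described in the slides.

```python
ROBOT_MARKERS = ("bot", "crawler", "spider")  # heuristic substrings, an assumption


def drop_robots(entries, max_requests=500):
    """Filter (client, user_agent, doc_id) log entries.

    Drops entries whose user agent looks like a robot, and all entries from
    clients that made more than `max_requests` requests in the log.
    Both rules and the default threshold are illustrative assumptions.
    """
    entries = list(entries)
    counts = {}
    for client, _, _ in entries:
        counts[client] = counts.get(client, 0) + 1
    return [
        (client, agent, doc)
        for client, agent, doc in entries
        if counts[client] <= max_requests
        and not any(marker in agent.lower() for marker in ROBOT_MARKERS)
    ]
```

A filter like this trades recall for precision: it may also discard heavy human users, so the threshold would need tuning against real logs.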

Conclusions
• Slightly disappointing results
  – VSM preferred over log analysis
• …but, VSM had the deck stacked in its favor:
  – significant mismatch between volunteer expertise & article subject
  – articles randomly chosen from the LaRC collection
    • most mature articles not chosen; evidence that log analysis improves with download frequency
• Next steps:
  – scrub logs more to remove robots, other spurious data sources
  – mix VSM & log analysis
  – find a larger, more captive audience
