Intute Repository Search Project
An iterative approach to developing a national search service to support scholarly communication, teaching and learning
www.intute.ac.uk/irs
Sophia Jones, Linda Kerr
September 2008
Introduction
• Background:
- Funded by JISC (Joint Information Systems Committee) - www.jisc.ac.uk
- A Mimas project (www.mimas.ac.uk), in partnership with UKOLN and SHERPA.
• Objectives:
- Develop a UK repository search service to support academic activity
- Serve as a showcase for UK research and education
Aims
• Discovery, harvesting and aggregation of repositories of academic and research papers from HE and other relevant open access sources across the UK
• Creation and maintenance of a store of metadata
• Provide improved services to individuals – automatic profiling based on individual enquiry or learning characteristics
Aims
• To provide a richer and more meaningful contextual search facility including full-text search, text-mining and automatic subject classification
• Investigate opportunities to include other relevant information sources, e.g. non-UK resources
• Help develop meaningful synergies between research repositories and learning repositories
Scope
• Scope in first instance includes repositories of Open Access academic and research papers held in UK HEI Repositories;
• Now harvesting and searching 87 UK Repositories with approx 360,000 artefacts;
• http://www.intute.ac.uk/irs/
• Planned Scope exploration:
– Other OA Repositories e.g., Learning Objects (JORUM) (Moodle)
– Proprietary LO Repositories (Blackboard / WebCT)
– Global research and academic papers
– Optimisation via other JISC Shared Services (IESR, HILT)
Progress
• Phase I – developing and deploying simple search functionality across UK university repositories; new web interface
• Phase II – identified and agreed approaches to developing further search and discovery features
The Challenge & Complexities
• Knowledge Management Context for Researchers, Teachers and Students
Knowledge Context
• Where can I find…?
• What can help me?
• Who can help me?
• What do we know?
• What don’t I / we know?
[Diagram labels]
Features: Simple Search, Full-text Search, Subject Classification, Automated aggregation, Concept matching, Personalisation
Paradigms: Search paradigm, Discovery paradigm, Meaning-based computing
Axis: Content, Context
Search and discovery context
Moving beyond the Google Search Box scenario:
• What do I want to know?
• What do I not know?
• Where can I find it?
• How can I access it?
• Who can help me in my enquiry or offer new knowledge / perspectives?
• What have other people like me been looking for in similar or connected areas?
• Why do I need this information?
• When do I need to be notified that things may have changed?
Broad range of Requirements
Ongoing requirements focussing on:
• Researcher (PI, Assistant, Post Grad)
• Teaching & Learning Community
• Higher Education Academy - LSC
• Academic & Research Deans
• Knowledge / Info specialists (HE/FE)
• Cross-sector shared service developments
• Other National Repository Aggregators
• Common Repository Interface WG - JISC
• Research Support Departments
• Developers
• JISC
• UK Research Councils
• JISC Repositories & Preservation
• Institutional Repository Managers
• JISC IE & IEMSR
• Librarians
• Commercial Technology Stakeholders
• Standards Communities
Intute Repository Search Project – Project Strands
[Diagram labels]
Harvester and Aggregation Service
UKOLN
Normalised metadata
R&D metadata
full text URLs
IESRs
researchers and academics
developers of search services
Dissemination and awareness
Requirements Gathering
Intute/Mimas
Community Engagement
Sherpa
Intute/Mimas
Project Management
Intute/Mimas
search server
Web interface
IRS Service
personalisation
m2m interfaces
text mining
subject searching
Learning Objects & VLEs
global searching
Improving the IRS Service
Intute/Mimas/NaCTeM
Improved IRS service
Intute/Mimas
search server
Web interface
outward-facing repository
Development paths
• Development of machine-to-machine interfaces – SRU/SRW, Z39.50
• Version Control
• Achievable synergies between research and learning object repositories
• Cross-searching of repositories – presentation of results
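To illustrate what a machine-to-machine interface of the SRU kind looks like to a client, the sketch below builds a searchRetrieve request URL using the standard SRU 1.1 parameters. The base URL, the function name and the sample CQL query are assumptions made for illustration; the slides do not specify the project's endpoint or which indexes it would expose.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_sru_url(base_url, cql_query, max_records=10, start_record=1):
    """Return an SRU 1.1 searchRetrieve URL for the given CQL query.

    base_url is a hypothetical SRU endpoint, not the project's real one.
    """
    params = {
        "operation": "searchRetrieve",   # SRU operation name
        "version": "1.1",                # SRU protocol version
        "query": cql_query,              # query expressed in CQL
        "startRecord": start_record,     # 1-based offset into the result set
        "maximumRecords": max_records,   # page size requested by the client
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical endpoint and index name, for illustration only.
    print(build_sru_url("http://example.org/sru", 'dc.title = "open access"'))
```

A client would fetch this URL over HTTP and parse the XML response; paging through results is a matter of increasing `start_record`.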
Advanced discovery and retrieval
• Parallel approaches:
– NaCTeM (National Centre for Text Mining) - www.nactem.ac.uk
– Autonomy IDOL software
• Advanced browsing and searching:
– Automated document clustering and classification based on terminology
– Personalisation of searching
– Concept visualisation from automated clustering
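As a rough idea of what clustering "based on terminology" can mean, the sketch below groups titles whose term sets overlap, using Jaccard similarity and a single greedy pass. This is a deliberately simple stand-in, not the method used by NaCTeM or Autonomy IDOL; the stopword list, threshold and function names are assumptions chosen for illustration.

```python
import re

# Tiny illustrative stopword list; a real service would use a fuller one.
STOPWORDS = {"the", "and", "for", "from", "with", "into"}


def terms(text):
    """Lower-case the text and return its set of non-stopword terms."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if len(w) > 2 and w not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity of two term sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(titles, threshold=0.25):
    """Greedily assign each title to the first cluster it resembles."""
    clusters = []  # each entry: {"terms": set of terms, "members": [titles]}
    for title in titles:
        t = terms(title)
        for c in clusters:
            if jaccard(t, c["terms"]) >= threshold:
                c["members"].append(title)
                c["terms"] |= t  # widen the cluster vocabulary
                break
        else:
            clusters.append({"terms": set(t), "members": [title]})
    return [c["members"] for c in clusters]


if __name__ == "__main__":
    sample = [
        "Text mining for biology",
        "Biology text mining methods",
        "Medieval church architecture",
    ]
    for group in cluster(sample):
        print(group)
```

The same term sets could also feed a concept visualisation, for example by displaying the most frequent terms of each cluster as its label.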
Summary
• Serve as a showcase for UK research and education
• Discover, harvest and aggregate repositories of academic and research papers from HE and other relevant open access sources across the UK
• Provide a richer and more meaningful contextual search facility including full-text search, text-mining and automatic subject classification
• Further expansion based on user requirements/JISC review
http://www.intute.ac.uk/irs/