BASE : a powerful search engine for Open Access documents

Date post: 10-May-2015
Upload: aims-agricultural-information-management-standards-fao-of-the-un
Presentation delivered by Friedrich Summann during Open Access Week @ AIMS 2012
Page 1: BASE : a powerful search engine for Open Access documents

Universitätsbibliothek

BASE – a powerful search engine for Open Access documents

AIMS@OA Week

25 Oct 2012

Friedrich Summann, Bielefeld University Library

Page 2: BASE : a powerful search engine for Open Access documents

Overview

BASE – the OA search engine

Harvesting OAI-PMH and its challenges

Metadata Aggregation and Data Quality

Processing Subject Repositories

Page 3: BASE : a powerful search engine for Open Access documents

Harvesting Background

BASE (Bielefeld Academic Search Engine)

• started in 2002, active since 2004

• 2900 repositories harvested via OAI-PMH

• 2337 repositories indexed

• 37.4 million documents included

• 3.1 million documents automatically classified

• Lucene/Solr index

• VuFind end-user GUI

Page 4: BASE : a powerful search engine for Open Access documents

Repositories: Geographical Distribution

[Map of repositories by region; values shown: 0.45 m, 15.9 m, 0.45 m, 0.26 m, 2.5 m, 2.9 m, 14.0 m, 0.45 m. The region labels were not preserved.]

Page 5: BASE : a powerful search engine for Open Access documents

BASE search features

• Truncation

• Search History

• Sorting

• Drilldown

• Linguistic Tools (Stemming, Eurovoc Thesaurus)

Page 6: BASE : a powerful search engine for Open Access documents

Repository Typology

• Institutional Repositories (35 %)

• Thesis and Dissertation Server (11 %)

• Subject Repositories (1 %)

• Electronic Journals (21 %)

• Digital Collections (6 %)

• Others (Videos, Audios, Datasets etc.) (2 %)

Page 7: BASE : a powerful search engine for Open Access documents

BASE Interfaces

• Query REST interface

• Repository Metadata interface

• Data Delivery Interface (Repository based, DDC of aggregated Metadata) (under construction)

Page 8: BASE : a powerful search engine for Open Access documents

Overview

BASE – the OA search engine

Harvesting OAI-PMH and its challenges

Metadata Aggregation and Data Quality

Processing Subject Repositories

Page 9: BASE : a powerful search engine for Open Access documents

My Conclusion:

OAI-PMH harvesting is easy. But:

Putting things (results) together is the real challenge

Page 10: BASE : a powerful search engine for Open Access documents

Harvesting: Challenges and pitfalls

• Repository does not respond (temporarily, or for specific verbs)

• Results are not valid XML

• Harvesting breaks (especially for big repositories)

• Incremental harvesting does not work

• No information on deleted or added records

• Variety of field contents

• Change of behavior (base URL, contents)

• Metadata point to a reference or citation only

• Link to document is not operable

• Fulltext access is restricted (non-OA)
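
Several of the pitfalls on this slide (unresponsive repositories, invalid XML, harvests that break partway through large repositories, missing deletion information) can be handled defensively in the harvesting loop. The following is a minimal sketch, not BASE's actual harvester: the function names, the retry policy and the injectable `fetch` parameter are assumptions made for illustration. Only the `ListRecords` verb, the `resumptionToken` element and the `status="deleted"` header attribute come from the OAI-PMH protocol itself.

```python
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def http_fetch(url, timeout=60):
    """Default fetcher: return the raw response body of one OAI-PMH request."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def harvest(base_url, fetch=http_fetch, metadata_prefix="oai_dc",
            from_date=None, max_retries=3, wait_seconds=5.0):
    """Yield <record> elements, following resumption tokens until the list ends."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Incremental harvest; as the slide notes, not every repository honours it.
        params["from"] = from_date
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        for attempt in range(1, max_retries + 1):
            try:
                root = ET.fromstring(fetch(url))
                break
            except (OSError, ET.ParseError) as error:
                # OSError covers timeouts and HTTP errors; ParseError covers invalid XML.
                if attempt == max_retries:
                    raise RuntimeError(f"giving up on {url}: {error}") from error
                time.sleep(wait_seconds)
        yield from root.iter(OAI_NS + "record")
        token = root.find(f".//{OAI_NS}resumptionToken")
        if token is None or not (token.text or "").strip():
            return  # no token, or an empty one, marks the last page
        # A resumed request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}


def is_deleted(record):
    """True if the repository flags this record as deleted in its header."""
    header = record.find(OAI_NS + "header")
    return header is not None and header.get("status") == "deleted"
```

Passing a custom `fetch` callable makes the loop testable without network access. A repository that never reports deletions cannot be detected this way and would still need a periodic full re-harvest.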

Page 11: BASE : a powerful search engine for Open Access documents

Overview

BASE – the OA search engine

Harvesting OAI-PMH and its challenges

Metadata Aggregation and Data Quality

Processing Subject Repositories

Page 12: BASE : a powerful search engine for Open Access documents

dc:language: Variety of Metadata Values

Analysis: European Repositories, Oct. 2009

804 different values in 4720585 tags

Top values

en – 1385175
eng – 511085
spa – 345658
de – 319937
en_GB – 178381
ger – 166587
eng; – 102678
FR – 95798

…

; – 3
? – 3
at;deu – 2
enm;eng – 2
FRA – 2
fr_BE – 2
Andere Sprache – 2
cat, spa, fra, eng. – 2
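
The spread of dc:language values on this slide suggests a normalization step before indexing. The sketch below folds the variants seen here (two-letter codes, region suffixes, trailing separators, multi-language strings) into three-letter ISO 639 codes. The alias table is a small illustrative assumption, not the mapping BASE uses, and anything unrecognized is simply dropped.

```python
import re

# Illustrative alias table (an assumption): ISO 639-1 and ISO 639-2/B codes
# are mapped to their ISO 639-2/T three-letter form.
ALIASES = {
    "en": "eng", "eng": "eng",
    "de": "deu", "ger": "deu", "deu": "deu",
    "fr": "fra", "fre": "fra", "fra": "fra",
    "es": "spa", "spa": "spa",
    "ca": "cat", "cat": "cat",
    "enm": "enm",
}


def normalize_language(raw):
    """Return the sorted three-letter codes recognized in one dc:language value."""
    codes = set()
    # Values may hold several languages separated by ';', ',' or whitespace.
    for part in re.split(r"[;,\s]+", raw.strip().lower()):
        part = part.strip(".")
        # Drop a region suffix such as "_gb" or "-be".
        part = re.split(r"[_-]", part)[0]
        if part in ALIASES:
            codes.add(ALIASES[part])
    return sorted(codes)
```

With this table, `normalize_language("en_GB")` and `normalize_language("eng;")` both give `["eng"]`, while values such as `"?"` or `"Andere Sprache"` give an empty list and would need manual review.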

Page 13: BASE : a powerful search engine for Open Access documents

dc:type: Variety of Metadata Values

Analysis: German Repositories, Sept. 2009

2772 different values in 1394089 tags

Top values

Dataset – 588525
Artikel – 192306
Rezension – 113924
Text – 73210
Text.Thesis.Doctoral – 30201
Article – 29278
Miszelle – 27060
NonPeerReviewed – 24688
ResearchPaper – 16046
Dissertation – 15531

…

Software – 7
Kulturkarten – 7
Composition – 7
Interactive Resource – 4
Interview – 3
Media – 1
content analysis – 1
Anniversary Publication – 1
qualitative research – 1
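
Figures of the form "N different values in M tags" can be reproduced by tallying one Dublin Core field across the harvested records. The sketch below shows one way to do that. It assumes each record is already parsed into an ElementTree element containing `dc:type` elements in the standard Dublin Core namespace, and the function names are hypothetical.

```python
from collections import Counter
import xml.etree.ElementTree as ET

DC_NS = "{http://purl.org/dc/elements/1.1/}"


def tally_field(records, field="type"):
    """Count every value of one Dublin Core field across parsed records."""
    counts = Counter()
    for record in records:
        for element in record.iter(DC_NS + field):
            counts[(element.text or "").strip()] += 1
    return counts


def summarize(counts, top=3):
    """Return (number of distinct values, total number of tags, most common values)."""
    return len(counts), sum(counts.values()), counts.most_common(top)
```

Sorting by frequency puts near-duplicates such as `Artikel` and `Article` close to the top, which is where a mapping to a controlled vocabulary pays off first.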

Page 14: BASE : a powerful search engine for Open Access documents

Overview

BASE – the OA search engine

Harvesting OAI-PMH and its challenges

Metadata Aggregation and Data Quality

Processing Subject Repositories

Page 15: BASE : a powerful search engine for Open Access documents

Subject Repositories: Registries

• Disciplinary repositories: http://oad.simmons.edu/oadwiki/Disciplinary_repositories

• OpenDOAR

Page 16: BASE : a powerful search engine for Open Access documents

Subject Repositories in BASE

The Big Ones:

• arXiv.org (Physics)

• CERN Document Server (Physics)

• PubMed Central (Life Sciences)

• CiteSeer (Computer Science)

• ELIS (Library Science)

• REPEC (Economics)

• EconStor (Economics)

• SSOAR (Social Sciences)

. . .

Page 17: BASE : a powerful search engine for Open Access documents

The BASE Approach: Automatic Classification

Page 18: BASE : a powerful search engine for Open Access documents

Contents for Classifier Feed

• dc:description: 30 to 40 % of metadata records have a dc:description with relevant abstract information

• Document fulltext (if accessible)

• setSpec contains DDC and LCC codes

• dc:subject contains lots of subject-oriented information
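
The metadata sources named on this slide could be gathered per record roughly as follows. This is a sketch under stated assumptions, not BASE's pipeline: it assumes the common convention of naming sets `ddc:NNN`, handles only DDC (not LCC) codes, leaves out fulltext, and uses a hypothetical function name.

```python
import re
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"

# Assumed set-naming convention, e.g. "ddc:004"; repositories vary in practice.
DDC_SET = re.compile(r"^ddc:(\d{3})$", re.IGNORECASE)


def classifier_input(record):
    """Return (text for the classifier, DDC codes already present in setSpec)."""
    def texts(tag):
        values = ((element.text or "").strip() for element in record.iter(tag))
        return [value for value in values if value]

    ddc_codes = []
    for spec in texts(OAI_NS + "setSpec"):
        match = DDC_SET.match(spec)
        if match:
            ddc_codes.append(match.group(1))
    # Subjects first, then abstracts; both feed the classifier as plain text.
    text = " ".join(texts(DC_NS + "subject") + texts(DC_NS + "description"))
    return text, ddc_codes
```

Records that already carry a DDC code in `setSpec` can serve as labelled training examples, while the remaining records supply only the text to be classified.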

Page 19: BASE : a powerful search engine for Open Access documents

Building the Knowledge Base

Page 20: BASE : a powerful search engine for Open Access documents

Mapping of frequently used classifications:

• LCC

• ELIS classification

• arXiv classification

DDC codes: ~400,000 documents = 1.4 %
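
Mapping other classification schemes onto DDC can be pictured as a lookup per scheme. The tables below are tiny, coarse and purely illustrative: the entries, the scheme labels and the function name are assumptions, not BASE's actual concordance, and a real mapping would need far finer ranges (LCC class QA, for example, covers computer science as well as mathematics).

```python
import re

# Hypothetical, deliberately coarse lookup tables for illustration only.
ARXIV_TO_DDC = {"math": "510", "cs": "004", "astro-ph": "520",
                "hep-th": "530", "q-bio": "570"}
LCC_TO_DDC = {"QA": "510", "QB": "520", "QC": "530", "QH": "570"}


def to_ddc(scheme, code):
    """Map one source classification code to a three-digit DDC class, or None."""
    if scheme == "arxiv":
        # "math.AG" -> archive "math"; the subject class after the dot is ignored.
        return ARXIV_TO_DDC.get(code.split(".")[0])
    if scheme == "lcc":
        # "QC793.5" -> class letters "QC"; the numeric range is ignored here.
        letters = re.match(r"[A-Z]+", code.upper())
        return LCC_TO_DDC.get(letters.group(0)) if letters else None
    return None
```

Returning `None` for unmapped codes keeps such records available for the automatic classifier instead of forcing a wrong class on them.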

Page 21: BASE : a powerful search engine for Open Access documents

DDC classes distribution in Harvesting Results

Page 22: BASE : a powerful search engine for Open Access documents

Subject-based Browsing

Page 23: BASE : a powerful search engine for Open Access documents

The End. Thank you!

Mail: [email protected]

