20091120 Vlengel Maastricht

Post on 12-Jan-2015


Patrick.Hochstenbach@UGent.be

Frank.Vandepitte@UGent.be

Lucene @ Ghent @ Lund

Vlengel - November 2009 Maastricht

http://lib.ugent.be

http://elin.ugent.be

The Numbers

5.000.000

Bibliographic Records

Full-Text: ca 20%

490.000 Google Books Hathi

136.000 18th Cent. Coll. Online

100.000 Early English Books

32.000 Google Books Gent

82.000 Gutenberg, DBNL, SFX,…

Ghent

16 Collections

120.000 visits/month

34% via search engines

The Numbers

54.000.000

Bibliographic Records

ELIN

Full-Text: 100 %

29 customers worldwide

6 timezones

17.000.000 electronic journals

25.000.000 Ebsco

4.000.000 JSTOR

3.400.000 Proquest ABI

1.670.000 IEE/IEEE standards/proceedings

1.300.000 E-print archives

The Parts

Searching/Portal

Verity

Indexing

Fast

Autonomy

Zebra

Lucene/Solr

Sphinx

Drupal

Liferay

JBoss

Zope

Primo Aquabrowser VuFind Sesat

Endeca

Indexing

ALEPHSEQ

OAI-PMH

MySQL dump

XSL

indexML

rug01_xml

rug02_xml

hath01_xml

dbnl_xml

Cmdline tools Tomcat Servlet

Tuned Solr

Perl MVC

Java/Spring
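The indexing flow on this slide (source dump, XSL transformation, XML index documents, Solr) can be illustrated with a minimal sketch. The slides do not show the indexML format itself, so the example below falls back on Solr's standard XML update message as a stand-in. The function name, field names, and record identifier are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET


def record_to_solr_add(record):
    """Serialize one key-value record as a Solr XML update message.

    `record` maps field names to a string or a list of strings.
    The field names are hypothetical, not the schema used in Ghent or Lund.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, values in record.items():
        # Accept a single string or a list of strings for repeatable fields.
        if isinstance(values, str):
            values = [values]
        for value in values:
            field = ET.SubElement(doc, "field", name=name)
            field.text = value
    return ET.tostring(add, encoding="unicode")


record = {
    "id": "rug01:000000001",  # hypothetical identifier
    "title": "Example title",
    "author": ["First Author", "Second Author"],
    "source": "rug01",
}
xml_message = record_to_solr_add(record)
print(xml_message)
```

Any source that can be mapped to such a flat record, whether it starts as ALEPHSEQ, OAI-PMH, or a MySQL dump, could be posted to Solr in this shape.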

Searching/Portal

Search Engine Plugins

Lucene SOLR

ISI/WOS YouTube

OpenSearch

SRU

Models

KeyVal

XML MARC

Configuration Files

I18N Props

Default

OpenSearch unAPI HTML

Velocity JS

Meercat

Controllers

Views

The ‘Haves’

Facets Filters

RSS

OpenSearch

OpenURL

OAI-PMH

Mobile

TicToc

Cover Art

Statistics

Cool URIs

unAPI

Zotero

Google Maps

Stemming

Flexible Sort

Image Browsing

Zoomers

Pagers

Basket

Plugin/Integration

libX

Real-Time Availability Check

Requesting

Global Holdings

Full-Text Links

Lists: Journals, Databases, Collections

Diacritics translation
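Facets and filters, the first items on this list, map directly onto Solr request parameters. The sketch below builds such a request URL with the standard `q`, `fq`, `facet`, and `facet.field` parameters. The base URL, the field names, and the helper function are assumptions for illustration and do not describe the Ghent or Lund configuration.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, query, facet_fields, filters=(), rows=10):
    """Build a Solr select URL that asks for facet counts and applies filters."""
    params = [
        ("q", query),
        ("rows", str(rows)),
        ("wt", "json"),
        ("facet", "true"),
    ]
    # One facet.field parameter per field to count.
    params += [("facet.field", field) for field in facet_fields]
    # Each fq narrows the result set without affecting relevance scoring.
    params += [("fq", fq) for fq in filters]
    return base_url + "?" + urlencode(params)


url = build_facet_query(
    "http://localhost:8983/solr/select",  # hypothetical Solr endpoint
    "maastricht",
    ["language", "year"],                 # hypothetical facet fields
    filters=["source:rug01"],             # hypothetical filter
)
print(url)
```

Selecting a facet value in the interface would then amount to adding one more `fq` entry and repeating the request.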

The ‘Have Nots’

Nice Administrative Interface

ILS integration (requests, renewals, …)

Personalization (saved searches, alerts,…)

Tagging, Rating, User Contributed Content

Deduplication

Excerpts, Table of Contents

Word clouds

Expand Searches (see also)

Highlighting

Federated Search

Browsing

Advanced Search

Extended FRBR

The Characteristics

Lightweight, Tunable

In Lund, 54.000.000 records are indexed on one 4-core Linux machine with 16 GB RAM, at about 2000 records/second

In Ghent, indexing runs on the Aleph server during business hours

A continuous load of 100 simultaneous users is handled by one 2-core Linux machine with 4 GB RAM

Simple, easy web interface. Less is more
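As a rough check on the Lund figures above, the time for a full reindex follows from the record count and the indexing rate. The calculation assumes the quoted rate of roughly 2000 records/second is sustained for the whole run, which the slides do not state.

```python
# Figures from the slide; the dots in "54.000.000" are thousands separators.
records = 54_000_000
records_per_second = 2_000  # approximate rate, assumed constant for the whole run

seconds = records / records_per_second
hours = seconds / 3600

print(f"{seconds:.0f} seconds, or {hours:.1f} hours")
```

Under that assumption a complete rebuild of the index fits within a single working day on one machine.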

The Characteristics

Flexible

Used in 6 different projects in Gent, 2 in Lund

KeyVal, XML, MARC models can be used internally

Indexes anything that can be turned into our XML index format

Total control on every aspect of interface. We do text, images, video, mobile, RSS, …

The Characteristics

Very Large Developer Community

Open Source used in thousands of projects worldwide in all major (computer) languages

Extensive Documentation, many articles, presentations, research

Books, User Group, Conferences, Social Networks,…

But…

Acknowledgements

Kjell Lotigiers (UGent) – Java/Spring development

Salam Baker Shanawa (Lund) – Perl/ELIN development, System tuning

Nicolas Steenlant (UGent) – Ajax/CSS development

Geert Roels (UGent) – Web Design

Paul Bastijns (UGent) – SFX integration

Refs

Calhoun, K., & Cellentani, D. (2009). Online catalogs: What users and librarians want : an OCLC report. Dublin, Ohio: OCLC.

http://lib.ugent.be

http://www.lub.lu.se