NEEO project EC Final review meeting Gateway and portal 23 March 2010

Page 1

NEEO project

EC Final review meeting
Gateway and portal

23 March 2010

Benoit Pauwels
Université Libre de Bruxelles, Belgium

Page 2

Plan

• Overview of technical infrastructure
• EO as a network of data providers – descriptive metadata
• EO as a network of data providers – usage statistics
• Added value services
  • Publication lists
  • Enriched metadata
  • Full-text searching
  • Multilinguality
• Collaboration with RePEc
• EO gateway and portal

Page 3

[Diagram: overview of the technical infrastructure. Labels recoverable from the figure: Meresco, Metadata, Harvester, Objects, HTTP, Crawler, Lucene; EO portal (home-made, FOSS); Exporter engine (home-made, FOSS); Enrichment service; Logs; RePEc; Other portals. Protocols and formats shown: OAI-PMH, SRU, RSS/Atom, DIDL / MODS, SWUP.]

Page 4

Descriptive metadata exchange format

Desired EO functionality → Technical decision

• Faceted search-and-find experience → Normalized/normalizable metadata
• APA-formatted citations → Granular metadata
• Publication list per EO author → Unambiguous identification of authors
• Full-text indexing/searching → Unambiguous links to full texts
• Enrichment of metadata (JEL, datasets, citations) → Extensible metadata format

Page 5

Descriptive metadata exchange format

• DIDL – XML container structure that can hold semantically distinct metadata
  • Descriptive, object files (by-ref), splash page, enriched metadata
  • Based on existing container structure defined by SurfShare
• MODS (3.2) – granular descriptive metadata
  • Based on existing metadata structure defined by SurfShare
• DAI – Unambiguous identification of authors
  • National or institution-unique persistent identifier
• Continuous aim of standardization at a level that surpasses the NEEO project
  • NEEO adaptations fed back to SurfShare

Page 6

DIDL[1]
  Item[1]
    Descriptor/Identifier (persistent identifier)
    Descriptor/modified
    Item[1..∞] (of type descriptiveMetadata)
      Descriptor/type (« descriptiveMetadata »)
      Descriptor/Identifier (persistent identifier)
      Descriptor/modified
      Component/Resource -- representation by value (XML)
    Item[0..∞] (of type objectFile)
      Descriptor/type (« objectFile »)
      Descriptor/Identifier (persistent identifier)
      Descriptor/modified
      Component/Resource -- representation by ref. (URL)
    Item[0..1] (of type humanStartPage)
      Descriptor/type (« humanStartPage »)
      Component/Resource -- representation by ref. (URL)

EO descriptive metadata model

• Publication is described as a complex (compound) object
  – persistent identifier
• Aggregation of 3 types of components
  – descriptiveMetadata (MODS)
  – objectFiles
  – humanStartPage
• Extensible
  – additional items can be stored within the complex object
• MODS contains DAI of EO author
• Semantic Web - Linked Data
  – OAI-ORE ready
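
The cardinalities of the compound object model can be illustrated with a small Python sketch. Only the three item types and their cardinalities come from the slide; the class names, field names and sample values are hypothetical and are not part of the DIDL or MODS specifications.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Item:
    """One component of the compound object (hypothetical structure)."""
    item_type: str  # "descriptiveMetadata", "objectFile" or "humanStartPage"
    resource: str   # inline XML (by value) or a URL (by reference)


@dataclass
class Publication:
    """A publication described as a compound object with a persistent identifier."""
    identifier: str
    metadata: List[Item] = field(default_factory=list)      # 1..n items
    object_files: List[Item] = field(default_factory=list)  # 0..n items
    start_page: Optional[Item] = None                       # 0..1 item

    def is_valid(self) -> bool:
        # The model requires an identifier and at least one descriptive metadata item.
        return bool(self.identifier) and len(self.metadata) >= 1


# Made-up example values, for illustration only.
pub = Publication(
    identifier="urn:example:publication-1",
    metadata=[Item("descriptiveMetadata", "<mods/>")],
    object_files=[Item("objectFile", "https://repository.example.org/paper.pdf")],
)
```

Because object files and the start page are optional, a record that carries only metadata still passes the check, while a record without any descriptive metadata does not.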

Page 7

Descriptive metadata exchange format

• Central EO gateway
  • DIDL and MODS application profiles
  • Vocabularies in DIDL and MODS
  • Technical guidelines for project partners
  • All documentation is available in open access (OA)
• Partner solutions: home-made or with external support
  • ARNO: home-made
  • DSpace: home-made, AtMire
  • EPrints: home-made, ECS-University of Southampton
  • Fedora: METS/MODS -> DIDL/MODS
  • DigiTool: METS/MARC -> DIDL/MODS
• All original partners + 2 new partners

Page 8

Decentralized registry service

• Aim: sustainable solution for a big network with many partners
• Decentralized Admin file
  • Format: XML-RDF | FOAF + NEEO-specific vocabulary
  • Decentralized file sits on the local web server of the project partner
  • Content
    - information on institution: name, description, ...
    - OAI baseURL + OAI sets to harvest
    - EO authors: DAI, photograph, full name, affiliation
  • EO gateway fetches the file over HTTP and validates it at regular intervals
  • Used for
    - information in EO portal screens
    - publication lists (match on DAI)
    - automated harvesting process
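
How a gateway might read such an Admin file can be sketched as follows. The RDF and FOAF namespaces are the standard ones, but the `ex:` vocabulary, its property names (`oaiBaseURL`, `oaiSet`, `dai`) and the sample document are placeholders invented for this sketch; the actual NEEO-specific vocabulary is not given in the slides.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "ex": "http://example.org/neeo-sketch#",  # placeholder, NOT the real NEEO vocabulary
}

# Made-up Admin file used only to exercise the parser below.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:ex="http://example.org/neeo-sketch#">
  <foaf:Organization>
    <foaf:name>Example University</foaf:name>
    <ex:oaiBaseURL>https://repository.example.org/oai</ex:oaiBaseURL>
    <ex:oaiSet>economics</ex:oaiSet>
    <foaf:member>
      <foaf:Person>
        <foaf:name>Jane Doe</foaf:name>
        <ex:dai>example-dai-0001</ex:dai>
      </foaf:Person>
    </foaf:member>
  </foaf:Organization>
</rdf:RDF>"""


def parse_admin_file(xml_text):
    """Extract institution, harvesting and author information from an Admin file."""
    root = ET.fromstring(xml_text)
    org = root.find("foaf:Organization", NS)
    if org is None:
        raise ValueError("no institution description found")
    return {
        "institution": org.findtext("foaf:name", namespaces=NS),
        "oai_base_url": org.findtext("ex:oaiBaseURL", namespaces=NS),
        "oai_sets": [e.text for e in org.findall("ex:oaiSet", NS)],
        "authors": [
            {"name": p.findtext("foaf:name", namespaces=NS),
             "dai": p.findtext("ex:dai", namespaces=NS)}
            for p in org.findall("foaf:member/foaf:Person", NS)
        ],
    }


admin = parse_admin_file(SAMPLE)
```

The resulting dictionary carries the three kinds of information the slide lists: portal display data, harvesting parameters, and the author DAIs used to match publication lists.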

Page 9

Usage statistics – EO use case

• EO use case: present download rates through EO portal per publication, scholar, institution
• Normalization of exchange format and communication protocol
  • OAI-PMH exchange of SWUP OpenURL ContextObjects (Scholarly Works Usage Community Profile)
• Special considerations:
  • Encryption of IP address of requester (MD5)
  • Filtering out robot requests (list of 50 regular expressions)
  • Filtering out double clicks
• Similar initiatives come together at Knowledge Exchange workshop, Berlin 29-30 March 2010
  • JISC (Usage Statistics Review project), Pirus2, SurfSure, Counter, Mesur, OA-Statistik, Economists Online
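
The three special considerations can be sketched together in Python. This is an illustration under assumptions, not the project's implementation: the two robot patterns stand in for the list of 50 regular expressions (which the slides do not reproduce), and the 30-second double-click window is an assumed value.

```python
import hashlib
import re
from datetime import datetime, timedelta

# Two illustrative patterns; the real list of 50 is not reproduced in the slides.
ROBOT_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (r"bot\b", r"crawler|spider")]
DOUBLE_CLICK_WINDOW = timedelta(seconds=30)  # assumed window, not from the source


def anonymize_ip(ip):
    """Return the MD5 hex digest of the requester's IP address."""
    return hashlib.md5(ip.encode("utf-8")).hexdigest()


def is_robot(user_agent):
    return any(p.search(user_agent) for p in ROBOT_PATTERNS)


def filter_events(events):
    """events: iterable of (timestamp, ip, user_agent, document_id), sorted by time."""
    last_seen = {}  # (hashed ip, document) -> timestamp of the latest request
    kept = []
    for ts, ip, agent, doc in events:
        if is_robot(agent):
            continue
        key = (anonymize_ip(ip), doc)
        previous = last_seen.get(key)
        last_seen[key] = ts
        if previous is not None and ts - previous <= DOUBLE_CLICK_WINDOW:
            continue  # repeated request inside the window counts as a double click
        kept.append((ts, key[0], doc))
    return kept


t0 = datetime(2010, 3, 23, 10, 0, 0)
events = [
    (t0, "192.0.2.1", "Mozilla/5.0", "doc-1"),
    (t0 + timedelta(seconds=5), "192.0.2.1", "Mozilla/5.0", "doc-1"),   # double click
    (t0 + timedelta(seconds=10), "192.0.2.2", "Googlebot/2.1", "doc-1"),  # robot
    (t0 + timedelta(minutes=2), "192.0.2.1", "Mozilla/5.0", "doc-1"),
]
kept = filter_events(events)
```

Only the first and last requests survive, and the stored events contain the hashed address rather than the original IP.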

Page 10

Usage statistics – implementation status

• Central EO Gateway – DoDoCo (Document Download Counter)
  • OAI-PMH harvesting of SWUP ContextObjects into SQL database
  • Enrich with information on item, scholar, institution
  • Web service: level (item, scholar, institution) + date range
  • Technical guidelines for project partners (OA available)
• Partners
  • Implementation
    - for all major IR platforms
    - solution for Combined Log Format web logs
  • Registration through Admin file
  • 7 original + 1 new partner
• Not enough data available
• Not visible through EO portal yet, although DoDoCo software is ready
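
For partners working from web server logs, a download event can be extracted from a Combined Log Format line roughly as follows. This is a minimal sketch: the rule that a download is a successful GET of a path ending in `.pdf` is an assumption made for the example, and the log line is made up.

```python
import re
from datetime import datetime

# Combined Log Format: host ident user [time] "request" status size "referer" "agent"
COMBINED_LOG = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_download(line, suffix=".pdf"):
    """Return a dict for a successful GET of a full-text file, else None."""
    m = COMBINED_LOG.match(line)
    if m is None:
        return None
    if m["method"] != "GET" or m["status"] != "200":
        return None
    if not m["path"].lower().endswith(suffix):
        return None
    return {
        "host": m["host"],
        "time": datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z"),
        "path": m["path"],
        "agent": m["agent"],
    }


# Made-up log line for illustration.
line = ('192.0.2.1 - - [23/Mar/2010:10:00:00 +0100] '
        '"GET /files/paper.pdf HTTP/1.1" 200 52341 '
        '"https://portal.example.org/" "Mozilla/5.0"')
event = parse_download(line)
```

The parsed host and user agent are the inputs a later step would need for IP anonymization and robot filtering.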

Page 11
Page 12

Added value services

• Publication lists
  • Per DAI of authors who are registered in Admin file
  • Publications are extracted from the EO gateway over SRU and formatted as:
    • APA+ in HTML
      • with links to full text in EO partner repository
      • with links to publisher sites (through OpenURL resolution)
    • APA in PDF
    • APA in RTF
    • RIS
    • BibTeX
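
A publication-list request over SRU could look like the sketch below. The parameter names (`operation`, `version`, `query`, `maximumRecords`, `startRecord`) belong to the SRU searchRetrieve operation; the endpoint URL, the CQL index name `dai` and the sample identifier are assumptions, since the slides do not give them.

```python
from urllib.parse import urlencode

SRU_BASE_URL = "https://gateway.example.org/sru"  # placeholder endpoint


def publication_list_url(dai, maximum_records=50):
    """Build an SRU searchRetrieve request for all records of one author."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": f'dai = "{dai}"',  # hypothetical CQL index name
        "maximumRecords": maximum_records,
        "startRecord": 1,
    }
    return f"{SRU_BASE_URL}?{urlencode(params)}"


url = publication_list_url("example-dai-0001")
```

The records returned by such a request would then be passed to the formatter for the chosen output (HTML, PDF, RTF, RIS or BibTeX).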

Page 13

Added value services

• Enriched descriptive metadata
  • JEL classification
    • Enrichment service (ES) gets records to be enriched from EO, over SRU
    • ES creates enrichment record(s), using text mining technology
    • ES makes enrichment record(s) available to EO, over OAI-PMH
    • EO harvests enrichment records from ES and integrates them into the original record
    • EO reuses enrichment information in its services: index & present
  • Bibliographic references
    • Through collaboration with RePEc/CitEc
  • Visible through EO portal
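
The harvesting step in this workflow relies on OAI-PMH `ListRecords` requests, which can be sketched as below. The argument names follow OAI-PMH 2.0, including the rule that `resumptionToken` is an exclusive argument; the base URL, the `didl` metadata prefix and the token value are assumptions for illustration.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix=None, set_spec=None,
                     from_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token is not None:
        # resumptionToken is an exclusive argument in OAI-PMH 2.0.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        if metadata_prefix is None:
            raise ValueError("metadataPrefix is required for the first request")
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
        if from_date:
            params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


BASE = "https://enrichment.example.org/oai"  # placeholder endpoint
first = list_records_url(BASE, metadata_prefix="didl", from_date="2010-03-01")
following = list_records_url(BASE, resumption_token="abc123")
```

A harvester would issue the first request, then keep requesting with the token returned in each response until the repository returns no further token.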

Page 14

Added value services

• Full-text search service
  • Process
    • Full-text indexer component in Meresco fetches relevant records from EO Gateway over SRU
    • Follows links to PDF object files
    • Text is extracted from PDF, and added to record through SRU Update
    • EO can now index & present
  • Prototype exists
  • Not yet fully deployed in EO portal

Page 15

Added value services

• Multilinguality (EN, FR, GE, ES)
  • Complete EO portal interface
  • JEL classification
  • MLIA functionality in EO portal
    • Student thesis – Prof. Bouillon (Univ. of Geneva, multilingual information processing department)
      • (uncustomized) Systran and Google Translate show equivalent results
    • Contacts with CACAO (also through Europeana)
      • comes as a complete portal solution, not as an add-in for existing portals like EO
    • Considerations:
      • Lingua franca in economics = EN
      • NEEO = NOT research project in linguistics, aim: reuse best existing technology
    • Use “Google Translate” for translation of queries

Page 16

Collaboration with RePEc

• Harvesting metadata from RePEc into EO
  • AMF to DIDL/MODS mapping
• Push metadata from EO to RePEc
  • “RePEc:ner” archive, with separate series for each EO institution
  • According to agreed-upon reviewed ReDIF format
• Admin file directives in order to limit overlap
• Contribute to LogEc
• Reuse CitEc data in EO portal
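
Pushing a record to RePEc means rendering it as a ReDIF template. The sketch below emits a small subset of ReDIF-Paper fields; the agreed-upon NEEO profile is not shown in the slides, so the field selection, the series code `exampl` and the sample values are assumptions.

```python
def redif_paper(handle, title, authors, file_url=None, year=None):
    """Render one publication as a minimal ReDIF-Paper template."""
    lines = ["Template-Type: ReDIF-Paper 1.0"]
    for name in authors:
        lines.append(f"Author-Name: {name}")
    lines.append(f"Title: {title}")
    if year:
        lines.append(f"Creation-Date: {year}")
    if file_url:
        lines.append(f"File-URL: {file_url}")
        lines.append("File-Format: application/pdf")
    lines.append(f"Handle: {handle}")
    return "\n".join(lines) + "\n"


# Made-up record; "exampl" is a hypothetical series code within the "ner" archive.
template = redif_paper(
    handle="RePEc:ner:exampl:0001",
    title="An Example Working Paper",
    authors=["Jane Doe", "John Smith"],
    file_url="https://repository.example.org/paper.pdf",
    year="2010",
)
```

One series per EO institution would correspond to one series code in the handle, with the final handle segment identifying the individual publication.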

Page 17

EO gateway and portal

• Gateway – metadata store and search engine
  • Choice between Summa, SOLR/Lucene, Meresco
  • Open source solution, based on Lucene search engine
  • Support available from software developers (CQ2 company)
  • Has proven its qualities in the past (DARENet)
• Portal
  • First version: home-made
  • Final version:
    • outsourced design to private company
    • HTML, CSS, JavaScript, all images

