datos.bne.es: Publishing and consuming
Daniel Vila Suero [email protected]
Ontology Engineering Group, Universidad Politécnica de Madrid
Acknowledgements: OEG Members, BNE team (Elena Escolano, Marina Jimenez Piano, Ana Manchado, Mar Hernández Agustí, Ricardo Santos and others)
2nd Linked Open Data Conference from the Cataloguing and Indexing Group in Scotland (CIGS)
Edinburgh, 21st September 2012
Background
• Initiative from Biblioteca Nacional de España together with OEG-UPM Madrid.
• Multidisciplinary effort: librarians, computer scientists, linguists...
• Close collaboration between library experts and computer scientists.
• Initiated as a small-scale proof of concept: the "Cervantes dataset", using IFLA vocabularies (FRBR, ISBD) and others (MADS, RDA...)
Main goals
• Perform the transformation incrementally and iteratively
• Develop a system where library experts can define and assess the mappings to RDF independently from the IT people
• Be vocabulary agnostic (BNE uses FRBR as its core model, but the system would also allow them to use RDA, for example)
• Have a clear picture of the source data before starting the transformation (this helps to detect possible deficiencies in the source data)
Some figures
• Total number of authority records: 4.100.000
• Total number of bibliographical records: 2.390.140
• Total number of RDF triples: 58.053.215
• Number of links (15% of authorities): 587.520
• Linked sources:
  - VIAF
  - SUDOC (French Collective University Catalogue), France
  - GND (German National Library Authorities), Germany
  - LIBRIS, Sweden
  - DBpedia
  - Soon: BNF, BNB, German Bibliographie
Some statistics
[Chart: number of resources per class. Values shown: 2.390.103, 1.969.526, 1.163.764, 1.114.719, 497.644, 282.879. Classes shown: Manifestation, Work, Person, Expression, Thema, Corporate Body]
Some statistics
[Bar chart, axis from 0 to 2.500.000. Values shown: 2.129.222, 1.246.773, 1.054.736, 85.347, 78.561, 16.462, 755]
Publishing
Our data model
[Diagram: data model]
Classes:
• frbr:WORK, frbr:EXPRESSION, frbr:MANIFESTATION
• frbr:PERSON, frbr:CORPORATE BODY
• frsad:THEMA
Object properties:
• is creator of / is created by
• is realized through / is realization of
• is embodied in / is embodiment of
• is realized by / is realizer of
• has subject / is subject of
• is part of
• is subordinate of
ELEMENTS: Class, ObjectProperty, DatatypeProperties (drawn from frbr, frad, frsad and isbd)
PREFIXES
• frbr: http://iflastandards.info/ns/fr/frbr/frbrer/
• frad: http://iflastandards.info/ns/fr/frad/
• frsad: http://iflastandards.info/ns/fr/frsad/
• isbd: http://iflastandards.info/ns/isbd/elements/
Transformation process
• How to facilitate the mapping process for library experts?
  1. Use a familiar and intuitive interface: spreadsheets
  2. Work only on what's in the database: pre-process records to build the spreadsheets
• 3-step process, 3 different spreadsheets
  1. Classification: is it a Person? A Work? A Manifestation?
  2. Annotation: name, birth date, title, language of expression
  3. Relation: find relationships between entities (a Person is creator of a certain Work)
[Diagram: pre-processing step and mappings, in three columns: MARC 21 data, MARC 21 structure, RDFS/OWL]
MARC 21 data:
• 100 $a Cervantes Saavedra, Miguel de
• 100 $a Cervantes Saavedra, Miguel de $t Don Quijote de la Mancha
MARC 21 structure (pre-processing step): has heading, has subfield, has content, contained in
Librarians manually define the mappings:
• Heading 100 $a maps to Class frbr:Person
• Heading 100 $a $t maps to Class frbr:Work
• 100 $a maps to Datatype/Annotation property frbr:nameOfPerson
• 100 $t maps to Datatype/Annotation property frbr:titleOfWork
• String(100 $a) maps to Object property frbr:isCreatorOf String(100 $a $t)
• Variation(100$a + $t)
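The three mapping steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the datos.bne.es implementation: the table layouts, the `heading` and `transform` helpers, and the use of heading strings as subject identifiers are all hypothetical. Only the MARC 21 field 100 example and the property names come from the slides.

```python
# Hypothetical stand-ins for the three spreadsheets (classification,
# annotation, relation). Only the 100 $a / $t example is from the slides.
CLASSIFICATION = [
    ("100", ("a",), "frbr:Person"),
    ("100", ("a", "t"), "frbr:Work"),
]
ANNOTATION = [
    # (tag, subfields forming the heading, subfield holding the value, property)
    ("100", ("a",), "a", "frbr:nameOfPerson"),
    ("100", ("a", "t"), "t", "frbr:titleOfWork"),
]
RELATION = [
    # (tag, subject heading subfields, property, object heading subfields)
    ("100", ("a",), "frbr:isCreatorOf", ("a", "t")),
]


def heading(subfields, codes):
    """Join the selected subfields into one heading string.

    Returns None when any requested subfield is missing.
    """
    if not all(code in subfields for code in codes):
        return None
    return " ".join(subfields[code] for code in codes)


def transform(record):
    """Apply the three mapping tables to one record and return triples."""
    triples = []
    for tag, codes, rdf_class in CLASSIFICATION:
        subject = heading(record.get(tag, {}), codes)
        if subject is not None:
            triples.append((subject, "rdf:type", rdf_class))
    for tag, codes, value_code, prop in ANNOTATION:
        subfields = record.get(tag, {})
        subject = heading(subfields, codes)
        if subject is not None:
            triples.append((subject, prop, subfields[value_code]))
    for tag, subject_codes, prop, object_codes in RELATION:
        subfields = record.get(tag, {})
        subject = heading(subfields, subject_codes)
        obj = heading(subfields, object_codes)
        if subject is not None and obj is not None:
            triples.append((subject, prop, obj))
    return triples


record = {"100": {"a": "Cervantes Saavedra, Miguel de",
                  "t": "Don Quijote de la Mancha"}}
for triple in transform(record):
    print(triple)
```

Keeping the mappings as plain data tables mirrors the goal stated earlier: library experts edit the tables, and the code that applies them does not need to change.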
Mapping process
Open mappings at: http://bne.linkeddata.es/mapping-marc21
Still a lot of work to do
• We cover only the core relations of FRBR
• A significant number of manifestations are not yet linked to their expressions; we are currently looking at more sophisticated clustering techniques
• Manifestations are not linked to their corresponding digitized materials in the digital library (Biblioteca Digital Hispánica); the next version (to be published this year) will contain these links
• The classification step can be further automated
Consuming
Perspectives
• 2 different perspectives:
  - Systems and applications: SPARQL endpoint, Linked Data API
  - End-user interfaces
• Plus an interesting side effect:
  - By applying FRBR and RDF mappings we can (and did) improve the catalogue
• Using standard web technologies and more intuitive models we open the door to:
  - Data analytics and cleansing, catalogue enrichment, reuse by smaller institutions…
Graph analysis example
[Graph visualization: Miguel de Cervantes, his works (frbr:Work) and their manifestations grouped by language; numbers in parentheses are numbers of resources]
• Don Quijote de la Mancha, Spanish manifestations (840)
• Don Quijote de la Mancha, English manifestations (247)
• Don Quijote de la Mancha, French manifestations (213)
• Don Quijote de la Mancha, German manifestations (49)
• Novelas Ejemplares, Spanish manifestations (303)
• Entremeses, Spanish manifestations (86)
Paths shown in the graph:
• frbr:Person frbr:isCreatorOf frbr:Work
• frbr:Work frbr:isEmbodiedIn frbr:Expression
• frbr:Expression frbr:IsManifestedBy frbr:Manifestation
Using open-source tools: Gephi, for example
http://bne.linkeddata.es/graphvis
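The counts in this kind of graph come from following the three paths from a person to manifestations and grouping by language. A minimal sketch, assuming an in-memory list of triples: the identifiers (`person:cervantes`, `work:quijote`, and so on), the `frbr:languageOfExpression` property name and the toy data are hypothetical, and the real dataset would be queried through the SPARQL endpoint or loaded into a tool such as Gephi instead.

```python
from collections import Counter

# Toy triples with hypothetical identifiers; property names follow the
# slide legend, except frbr:languageOfExpression, which is assumed here.
TRIPLES = [
    ("person:cervantes", "frbr:isCreatorOf", "work:quijote"),
    ("work:quijote", "frbr:isEmbodiedIn", "expr:quijote-es"),
    ("work:quijote", "frbr:isEmbodiedIn", "expr:quijote-en"),
    ("expr:quijote-es", "frbr:languageOfExpression", "Spanish"),
    ("expr:quijote-en", "frbr:languageOfExpression", "English"),
    ("expr:quijote-es", "frbr:IsManifestedBy", "man:1"),
    ("expr:quijote-es", "frbr:IsManifestedBy", "man:2"),
    ("expr:quijote-en", "frbr:IsManifestedBy", "man:3"),
]


def objects(triples, subject, predicate):
    """Return every object of (subject, predicate, ?) in the triple list."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def manifestations_by_language(triples, person):
    """Count manifestations per (work, language) for one person."""
    counts = Counter()
    for work in objects(triples, person, "frbr:isCreatorOf"):
        for expression in objects(triples, work, "frbr:isEmbodiedIn"):
            n = len(objects(triples, expression, "frbr:IsManifestedBy"))
            for language in objects(triples, expression,
                                    "frbr:languageOfExpression"):
                counts[(work, language)] += n
    return counts


for key, count in manifestations_by_language(TRIPLES, "person:cervantes").items():
    print(key, count)
```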
Enabling access to systems and apps
Linked Data API: http://datos.bne.es/frontend/persons
Flexible access to data
Out of the box:
• Search by every field
• Access cluster of resources
• Filtering
• Paging
• Serve multiple formats: XML, Turtle, JSON
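To make filtering, paging and format selection concrete, here is a small in-memory sketch. It is not the datos.bne.es Linked Data API: the `query` function, its parameters, the field names and the sample resources are all hypothetical, and only JSON output is shown.

```python
import json

# Hypothetical sample resources; field names are illustrative only.
PERSONS = [
    {"id": "p1", "name": "Cervantes Saavedra, Miguel de", "type": "Person"},
    {"id": "p2", "name": "Vega, Lope de", "type": "Person"},
    {"id": "p3", "name": "Quevedo, Francisco de", "type": "Person"},
    {"id": "c1", "name": "Biblioteca Nacional de España", "type": "CorporateBody"},
]


def query(resources, filters=None, page=1, page_size=2):
    """Filter resources by exact field values, then return one page."""
    filters = filters or {}
    matches = [r for r in resources
               if all(r.get(field) == value for field, value in filters.items())]
    start = (page - 1) * page_size
    return {
        "total": len(matches),
        "page": page,
        "items": matches[start:start + page_size],
    }


# Second page of persons, serialized as JSON (one of several possible formats).
result = query(PERSONS, filters={"type": "Person"}, page=2)
print(json.dumps(result, ensure_ascii=False, indent=2))
```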
Different views over the data
[Screenshots: HTML view, XML view]
End-user interfaces
Current linked data opens the door to:
• Re-rank OPAC results
• Better clustering of results
• Recommendation
• Enhance data from other sources