datos.bne.es: Publishing and consuming
Daniel Vila Suero
Ontology Engineering Group, Universidad Politécnica de Madrid
Acknowledgements: OEG Members, BNE staff (Elena Escolano, Marina Jimenez Piano, Ana Manchado, Mar Hernández Agustí,
Ricardo Santos and others)
datos.bne.es
Background
• Initiative from Biblioteca Nacional de España together with OEG-UPM Madrid.
• Multidisciplinary effort: librarians, computer scientists, linguists, etc.
• Close collaboration between library experts and computer scientists.
• Initiated as a small-scale proof of concept: the "Cervantes dataset", using IFLA vocabularies (FRBR, ISBD) and others (MADS, DC, RDA, etc.)
Main goals
• Perform the transformation incrementally and iteratively
• Develop a system where library experts can define and assess the mappings to RDF independently from the IT people
• Be vocabulary agnostic (BNE uses FRBR as core model, but the system would allow them to use RDA for example)
• Have a clear picture of the source data before starting the transformation (this helps to detect possible deficiencies in the source data)
Source MARC records
Authority records:
• Persons
• Corporate bodies
• Conferences
• Titles
• Subjects
Bibliographic records:
• Maps: 76,576
• Sound recordings: 320,727
• Engravings, drawings, pictures: 166,017
• Manuscripts: 35,770
• Ancient books: 143,959
• Modern books: 2,696,560
• Scores: 178,473
• Electronic resources: 3,021
• Serials: 156,634
• Videos: 96,672
Some figures
• Total number of authority records: 4,100,000
• Total number of bibliographical records: 2,390,140
• Total number of RDF triples: 58,053,215
• Number of links (15% of authorities): 587,520
• Linked sources:
- VIAF
- SUDOC (French Collective University Catalogue), France
- GND (German National Library Authorities), Germany
- LIBRIS, Sweden
- DBpedia
- Soon: BNF, BNB, German Bibliographie
Some statistics
Number of resources per entity type (bar chart):
• Manifestation: 2,390,103
• Work: 1,969,526
• Person: 1,163,764
• Expression: 1,114,719
• Thema: 497,644
• Corporate Body: 282,879
Some statistics
Number of statements per relation (bar chart):
• "is creator (person) of": 2,129,222
• "is created by (person)": 2,129,222
• "is embodiment of": 1,246,773
• "is embodied in": 1,246,773
• "is realized through": 1,054,736
• "is realization of": 1,054,736
• "is part (work) of": 85,347
• "has part (work)": 85,347
• "is subordinate of": 78,561
• "is created by (corporate body)": 16,462
• "is creator (corporate body) of": 16,462
• "is part (expression) of": 755
• "has part (expression)": 755
Publishing
Our data model
Transformation process
• How can we make the mapping process easier for library experts?
1. Use a familiar and intuitive interface: spreadsheets
2. Work only on what's in the database: pre-process records to build the spreadsheets
• Three-step process, three different spreadsheets:
1. Classification: is it a Person? a Work? a Manifestation?
2. Annotation: name, birth date, title, language of expression
3. Relation: find relationships between entities (Person is creator of a certain work)
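The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field codes (`100a`, `100d`, `100t`), the three in-memory "sheets" and the matching rule by name are all hypothetical stand-ins, not the actual BNE mappings (those are published at the URL given on the mapping slides). Triples are plain tuples rather than real RDF.

```python
# Hypothetical, simplified authority records: field/subfield code -> value.
RECORDS = [
    {"id": "a1", "100a": "Cervantes Saavedra, Miguel de", "100d": "1547-1616"},
    {"id": "a2", "100a": "Cervantes Saavedra, Miguel de",
     "100t": "Don Quijote de la Mancha"},
]

# Sheet 1 (classification): required fields -> entity type, most specific first.
CLASSIFICATION = [
    ({"100a", "100t"}, "Work"),
    ({"100a"}, "Person"),
]

# Sheet 2 (annotation): (entity type, field) -> property name.
ANNOTATION = {
    ("Person", "100a"): "name",
    ("Person", "100d"): "dates",
    ("Work", "100t"): "title",
}


def classify(record):
    """Return the first entity type whose required fields are all present."""
    present = set(record) - {"id"}
    for required, entity_type in CLASSIFICATION:
        if required <= present:
            return entity_type
    return None


def annotate(record, entity_type):
    """Emit one (subject, property, value) triple per mapped field."""
    return [
        (record["id"], prop, record[field])
        for (etype, field), prop in ANNOTATION.items()
        if etype == entity_type and field in record
    ]


def relate(records, types):
    """Sheet 3 (relation), assumed rule: a Work is linked to the Person
    whose name matches the Work record's 100a value."""
    persons = {r["100a"]: r["id"] for r in records if types[r["id"]] == "Person"}
    triples = []
    for r in records:
        if types[r["id"]] == "Work" and r["100a"] in persons:
            person_id = persons[r["100a"]]
            triples.append((r["id"], "is created by (person)", person_id))
            triples.append((person_id, "is creator (person) of", r["id"]))
    return triples


def transform(records):
    """Run classification, annotation and relation in sequence."""
    types = {r["id"]: classify(r) for r in records}
    triples = [(rid, "type", t) for rid, t in types.items() if t]
    for r in records:
        triples += annotate(r, types[r["id"]])
    return triples + relate(records, types)


for triple in transform(RECORDS):
    print(triple)
```

Keeping the three sheets as plain data, separate from the code that applies them, mirrors the goal stated earlier: library experts edit the mappings, and the IT side only maintains the engine.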
Mapping process
Open mappings at: http://bne.linkeddata.es/mapping-marc21
Still a lot of work to do
• We cover only the core relations of FRBR
• A significant number of manifestations are not linked to their expressions; we are currently looking at more sophisticated clustering techniques
• Manifestations are not linked to their corresponding digitized materials in the digital library (Biblioteca Digital Hispánica); the next version (to be published this year) will contain these links
• The classification step can be further automated
Consuming
Perspectives
• Two different perspectives:
- Systems and applications: SPARQL endpoint, Linked Data API
- End-user interfaces
• Plus an interesting side effect:
- By applying FRBR and RDF mappings we can (and did) improve the catalogue
• Using standard web technologies and more intuitive models, we open the door to:
- Data analytics and cleansing, catalogue enrichment, reuse by smaller institutions…
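As a rough illustration of the "systems and applications" perspective, the sketch below prepares a SPARQL query that counts resources per type, the kind of query that could produce per-entity statistics. The endpoint URL is a placeholder (the slides do not give one), and the request is only built, not sent. The `query` parameter and the `application/sparql-results+json` media type come from the SPARQL 1.1 Protocol and result-format specifications.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: placeholder endpoint; replace with the real SPARQL endpoint URL.
ENDPOINT = "http://example.org/sparql"

# Count resources per rdf:type, largest classes first.
COUNT_BY_TYPE = """
SELECT ?type (COUNT(?s) AS ?n)
WHERE { ?s a ?type }
GROUP BY ?type
ORDER BY DESC(?n)
"""


def sparql_get_url(endpoint, query):
    """Build a SPARQL Protocol GET URL: the query text goes in 'query'."""
    return endpoint + "?" + urlencode({"query": query.strip()})


def sparql_request(endpoint, query):
    """Prepare (but do not send) a request asking for JSON results."""
    return Request(
        sparql_get_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )


request = sparql_request(ENDPOINT, COUNT_BY_TYPE)
print(request.full_url)
```

Passing `request` to `urllib.request.urlopen` would execute the query against a live endpoint.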
Graph analysis example
Using open-source tools, for example Gephi:
http://bne.linkeddata.es/graphvis
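One simple way to get relation data into a tool such as Gephi is an edge-list CSV with `Source` and `Target` columns, which Gephi's spreadsheet import understands. The sketch below is an assumption about how such a file could be produced, not a description of how the visualization above was built; the sample triples and identifiers are made up.

```python
import csv
import io

# Hypothetical sample of (subject, relation, object) statements.
TRIPLES = [
    ("work:1", "is created by (person)", "person:1"),
    ("work:2", "is created by (person)", "person:1"),
    ("work:1", "title", "Don Quijote de la Mancha"),  # literal, not an edge
]


def edges_csv(triples, relations):
    """Return a Gephi-style edge list (CSV text) for the chosen relations."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Label"])
    for subject, relation, obj in triples:
        if relation in relations:  # skip literals and unwanted relations
            writer.writerow([subject, obj, relation])
    return buffer.getvalue()


print(edges_csv(TRIPLES, {"is created by (person)"}))
```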
Enabling access to systems and apps
Linked Data API: http://datos.bne.es/frontend/persons
Flexible access to data
Out of the box:
• Search by every field
• Access clusters of resources
• Filtering
• Paging
• Serve multiple formats: XML, Turtle, JSON
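A client could combine these features in a single request URL. The sketch below assumes the conventions of the Linked Data API specification: a format suffix on the path (`.json`, `.ttl`, `.xml`) and the reserved `_page` and `_pageSize` parameters. Whether this deployment accepts exactly these names is an assumption, and the `name` filter is a hypothetical field. Only the base URL comes from the slides.

```python
from urllib.parse import urlencode

BASE = "http://datos.bne.es/frontend/persons"  # list endpoint from the slide


def list_url(base, fmt=None, page=None, page_size=None, **filters):
    """Build a list URL with optional format suffix, paging and field filters."""
    params = dict(filters)  # e.g. name="Cervantes" (hypothetical field name)
    if page is not None:
        params["_page"] = page
    if page_size is not None:
        params["_pageSize"] = page_size
    path = base + ("." + fmt if fmt else "")
    return path + ("?" + urlencode(params) if params else "")


print(list_url(BASE, fmt="json", page=2, page_size=10))
print(list_url(BASE, fmt="ttl", name="Cervantes"))
```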
Different views on the data
HTML
XML
End-user interfaces
Current linked data opens the door to:
• Re-ranking OPAC results
• Better clustering of results
• Recommendation
• Enhancing data from other sources
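To make "better clustering of results" concrete, here is a minimal sketch that groups manifestation hits under their FRBR Work by following "is embodiment of" and then "is realization of". The identifiers and the two lookup tables are hypothetical; a real interface would fetch these links from the published data.

```python
from collections import defaultdict

# Hypothetical links: manifestation -> expression, expression -> work.
EMBODIMENT_OF = {"m1": "e1", "m2": "e1", "m3": "e2"}
REALIZATION_OF = {"e1": "w1", "e2": "w1"}


def cluster_by_work(hits):
    """Group manifestation ids by work; unlinked hits form their own cluster."""
    clusters = defaultdict(list)
    for manifestation in hits:
        expression = EMBODIMENT_OF.get(manifestation)
        work = REALIZATION_OF.get(expression)
        clusters[work or manifestation].append(manifestation)
    return dict(clusters)


print(cluster_by_work(["m1", "m2", "m3", "m4"]))
```

Here `m4` has no expression link, so it stays on its own, which is the case for the manifestations that are not yet linked to expressions.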