Enriching Cultural Heritage Data with DBpedia

Post on 16-Apr-2017

Antoine Isaac | DBpedia Community Meeting 2016

Arrival of a Portuguese ship | Anonymous

1660 - 1625, Rijksmuseum | Netherlands, Public Domain

Europeana?

Europeana Essentials | CC BY-SA

Enriching Cultural Heritage Data with DBpedia | CC BY-SA

Europeana Collections homepage | Europeana | CC BY-SA

Europeana aggregation infrastructure | Europeana | CC BY-SA

Europeana?

Europeana has many data challenges

We aggregate very heterogeneous metadata

• More than 48M objects

• 3,500 galleries, libraries, archives and museums

• 50 languages

• From all EU countries

• Level of quality varies greatly

Linked Open Data

Europeana Linked Open Data video on Vimeo | Europeana | CC BY-SA

Europeana Linked Data Strategy

Our efforts and lines of work

• The Europeana Data Model (EDM) offers a way to represent richer (linked) data

• We apply an enrichment strategy to link source data to reference data, including DBpedia

Will be discussed in Parallel Session 2:

• We encourage data providers to contribute links between objects and (their own) vocabularies

• We encourage alignment activities between domain vocabularies

The Europeana Data Model

Clavecin, Bartolomeo Cristofori | Cité de la Musique, MIMO - Musical Instruments Museums Online | CC BY-NC-SA

Europeana Data Model example | Europeana | CC BY-SA

Create a “semantic layer” on top of cultural heritage objects

Include multilingual “value vocabularies” (e.g. thesauri represented in SKOS) from Europeana’s providers or from third-party data sources
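As a rough illustration of what such a semantic layer buys, the sketch below links an object to a concept URI instead of a free-text string, so that the subject can be displayed in the user's language. It is a minimal sketch, not Europeana's implementation: the class names, the `example.org` URI and the tiny vocabulary are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A SKOS-like concept: one URI, one preferred label per language."""
    uri: str
    pref_labels: dict[str, str] = field(default_factory=dict)

    def label(self, lang: str, fallback: str = "en") -> str:
        # Prefer the requested language, then the fallback, then the bare URI.
        return self.pref_labels.get(lang) or self.pref_labels.get(fallback) or self.uri


@dataclass
class HeritageObject:
    title: str
    subjects: list[str]  # concept URIs rather than free-text strings


# Hypothetical vocabulary entry (the URI is made up for illustration).
HARPSICHORD = Concept(
    uri="http://example.org/vocab/harpsichord",
    pref_labels={"en": "harpsichord", "fr": "clavecin", "de": "Cembalo"},
)
VOCABULARY = {HARPSICHORD.uri: HARPSICHORD}


def subject_labels(obj: HeritageObject, lang: str) -> list[str]:
    """Resolve the object's subject URIs to labels in the requested language."""
    return [VOCABULARY[uri].label(lang) for uri in obj.subjects if uri in VOCABULARY]


instrument = HeritageObject(title="Clavecin", subjects=[HARPSICHORD.uri])
print(subject_labels(instrument, "de"))
print(subject_labels(instrument, "it"))  # no Italian label, falls back to English
```

Because the object stores a link rather than a string, adding a new language to the vocabulary improves every linked object at once.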

Semantic enrichment, a solution for better quality data?

Automatic and manual enrichment are more and more commonly used in digital libraries to:

• normalise data

• “standardize data” by linking it to authority resources

• improve multilingual coverage in datasets

• contextualise resources

The main components of semantic enrichment

Source: source objects whose metadata is being enriched

Target: set of resources used to enrich the source metadata. Targets can be of different types, from simple uncontrolled strings to resources published as LOD

Rules: specify how the enrichment between the source and target should be executed.
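The three components (source, target, rules) can be sketched as plain data types plus a function. This is only an illustration under simple assumptions: the names `Target`, `exact_label_rule` and `enrich` are hypothetical, and the rule shown is a naive case-insensitive label match, not the rule set any particular tool uses.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Target:
    """One resource of the target set, e.g. a LOD entity with its labels."""
    uri: str
    labels: tuple[str, ...]


# A rule receives one source metadata value and the target set,
# and returns the matching target or None.
Rule = Callable[[str, list[Target]], Optional[Target]]


def exact_label_rule(value: str, targets: list[Target]) -> Optional[Target]:
    """Naive rule: case-insensitive exact match on any target label."""
    wanted = value.strip().casefold()
    for target in targets:
        if any(wanted == label.casefold() for label in target.labels):
            return target
    return None


def enrich(source: dict[str, list[str]], field_name: str,
           targets: list[Target], rule: Rule) -> list[str]:
    """Apply the rule to every value of one source field; return target URIs."""
    links = []
    for value in source.get(field_name, []):
        match = rule(value, targets)
        if match is not None:
            links.append(match.uri)
    return links


targets = [Target("http://dbpedia.org/resource/Bartolomeo_Cristofori",
                  ("Bartolomeo Cristofori",))]
source = {"dc:creator": [" bartolomeo cristofori "]}
print(enrich(source, "dc:creator", targets, exact_label_rule))
```

Keeping the rule as a separate callable makes it easy to swap in a stricter or fuzzier matcher without touching the source or target data.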

Automatic enrichment process in Europeana

Analysis

• selection of metadata fields in descriptions

• selection of potential rules to match

Linking

• matching the values of the metadata fields to values of the contextual resources

• adding contextual links

Augmentation of search index

• selection of values from the contextual resource

• values go into the search index
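The three stages (analysis, linking, augmentation of the search index) can be read as a small pipeline. The sketch below is a toy version under stated assumptions: the contextual resource, its `example.org` URI, the selected fields and all function names are hypothetical, and matching is a plain case-insensitive label comparison.

```python
# Hypothetical contextual resource with labels per language.
CONTEXTUAL = {
    "http://example.org/place/paris": {"en": "Paris", "fr": "Paris", "nl": "Parijs"},
}
# Assumed selection of metadata fields to enrich.
FIELDS_TO_ENRICH = ("dcterms:spatial", "dc:coverage")


def analyse(record: dict) -> list[str]:
    """Analysis: collect the values of the fields selected for enrichment."""
    return [value for name in FIELDS_TO_ENRICH for value in record.get(name, [])]


def link(values: list[str]) -> list[str]:
    """Linking: match field values against labels of contextual resources."""
    uris = []
    for value in values:
        key = value.strip().casefold()
        for uri, labels in CONTEXTUAL.items():
            if key in {label.casefold() for label in labels.values()} and uri not in uris:
                uris.append(uri)
    return uris


def augment(record: dict, uris: list[str]) -> dict:
    """Augmentation: copy labels of the linked resources into the index document."""
    index_doc = {"id": record["id"], "links": uris, "text": []}
    for uri in uris:
        index_doc["text"].extend(sorted(set(CONTEXTUAL[uri].values())))
    return index_doc


record = {"id": "obj-1", "dcterms:spatial": ["parijs"]}
index_doc = augment(record, link(analyse(record)))
print(index_doc)
```

In this toy run, a record that only mentions "parijs" becomes findable under "Paris" as well, which is the multilingual benefit the augmentation step aims at.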

Vocabularies we currently enrich metadata with

Entity Class | Target vocabulary | Size | Metadata fields subject to enrichment
Places | GeoNames | 140,097 | dcterms:spatial, dc:coverage
Concepts | DBpedia | 5,284 | dc:subject, dc:type
Concepts | GEMET | 280 |
Agents | DBpedia | 161,209 | dc:creator, dc:contributor
Time | Semium Time | 2,566 | dc:coverage, dcterms:temporal, dc:date, edm:year
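One way to make such a table operational is to hold it as configuration and ask, for a given metadata field, which vocabularies are candidates. The sketch below is an assumption about how this could be encoded, not Europeana's actual configuration; `ENRICHMENT_TARGETS` and `targets_for_field` are hypothetical names. GEMET is left out because the transcript does not list its fields.

```python
# Rows taken from the table above; GEMET is omitted because its
# metadata fields are not given in the transcript.
ENRICHMENT_TARGETS = [
    {"entity_class": "Places", "vocabulary": "GeoNames", "size": 140_097,
     "fields": ["dcterms:spatial", "dc:coverage"]},
    {"entity_class": "Concepts", "vocabulary": "DBpedia", "size": 5_284,
     "fields": ["dc:subject", "dc:type"]},
    {"entity_class": "Agents", "vocabulary": "DBpedia", "size": 161_209,
     "fields": ["dc:creator", "dc:contributor"]},
    {"entity_class": "Time", "vocabulary": "Semium Time", "size": 2_566,
     "fields": ["dc:coverage", "dcterms:temporal", "dc:date", "edm:year"]},
]


def targets_for_field(field_name: str) -> list[tuple[str, str]]:
    """Return (entity class, vocabulary) pairs that enrich the given field."""
    return [
        (row["entity_class"], row["vocabulary"])
        for row in ENRICHMENT_TARGETS
        if field_name in row["fields"]
    ]


print(targets_for_field("dc:coverage"))
print(targets_for_field("dc:creator"))
```

Note that `dc:coverage` appears under both Places and Time, so a value in that field may be a candidate for two different vocabularies.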

Why DBpedia?

Building an ecosystem of networked references

• It offers labels in about 124 languages across its language editions, 48 of which match the languages that Europeana supports

• It gives fairly complete and accurate descriptive metadata about entities

• It works well as a “pivot” vocabulary, providing further links to other vocabularies such as Wikidata and Freebase
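To make the label and pivot points concrete, the sketch below builds a SPARQL query asking for an entity's labels in a chosen set of languages and for its `owl:sameAs` links to other datasets. It only builds the query text and does not contact any endpoint. The function name is hypothetical; `rdfs:label` and `owl:sameAs` are the standard properties, and it is an assumption that the entity of interest carries both.

```python
def build_labels_and_links_query(resource_uri: str, languages: list[str]) -> str:
    """Build a SPARQL query for multilingual labels and owl:sameAs links."""
    lang_list = ", ".join(f'"{lang}"' for lang in languages)
    return f"""PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT ?label ?same WHERE {{
  {{
    <{resource_uri}> rdfs:label ?label .
    FILTER (lang(?label) IN ({lang_list}))
  }}
  UNION
  {{ <{resource_uri}> owl:sameAs ?same . }}
}}"""


query = build_labels_and_links_query(
    "http://dbpedia.org/resource/Bartolomeo_Cristofori", ["en", "fr", "it"]
)
print(query)
```

The resulting text could be sent to a SPARQL endpoint with any HTTP client; restricting the languages keeps the result limited to those a portal actually supports.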

Not everything is perfect

Colombes : championnats de France d’Athlétisme : rivière, le speaker | Agence de presse Meurisse

1921, National Library of France | France, Public Domain

Challenges of multilingual automatic enrichment

Evaluation of metadata enrichment practices in digital libraries: steps towards better data enrichments

Poisonous India or the Importance of a Semantic and Multilingual Enrichment Strategy. Marlies Olensky, Juliane Stiller, Evelyn Dröge, MTSR 2012. http://link.springer.com/chapter/10.1007%2F978-3-642-35233-1_25

Comparative evaluation of enrichments

We ran a quantitative evaluation on a sample set enriched by 7 different tools (settings)

http://pro.europeana.eu/taskforce/evaluation-and-enrichments
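The slides do not say which measures the evaluation used. As one plausible shape for a quantitative comparison, the sketch below scores a tool's enrichments against a manually checked reference set using precision and recall. The choice of measures is an assumption, and the object identifiers and `example.org` URIs are invented purely for illustration.

```python
def precision_recall(predicted: set, reference: set) -> tuple[float, float]:
    """Score predicted (object, target) links against a reference set."""
    correct = len(predicted & reference)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


# Invented toy data: (object id, target URI) pairs.
reference = {
    ("obj-1", "http://example.org/a"),
    ("obj-2", "http://example.org/b"),
    ("obj-3", "http://example.org/c"),
}
tool_output = {
    ("obj-1", "http://example.org/a"),  # correct link
    ("obj-2", "http://example.org/x"),  # wrong target
}

precision, recall = precision_recall(tool_output, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Running the same function over the output of each tool or setting gives comparable figures, provided all of them are scored against the same reference sample.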

Example of Recommendations that will be explored

Define your enrichment goals

• Develop better criteria for evaluating enrichment

Choose the right service

• enrichment tool more aware of the semantics of the model

Monitor your enrichment process and re-assess

• target dataset could be richer: new terms, new languages, more granular

Enrichment using a better reference for contextual entities?

You will hear about this in the next session ☺

With slides from Valentine Charles, Juliane Stiller, Hugo Manguinhas and Stefan Gradmann