
Ontology-based multi-domain metadata for research data management using triple stores

Date post: 25-Jun-2015
Upload: joao-rocha-da-silva
Description:
A presentation given at the IDEAS 2014 conference on database modelling with triple stores for research data management. IDEAS '14, July 07 - 09 2014, Porto, Portugal. Paper abstract: Most current research data management solutions rely on a fixed set of descriptors (e.g. Dublin Core Terms) to describe the resources they manage. While these are easy to understand and use, their semantics are limited to general concepts, leaving out domain-specific metadata and representing values as plain text strings. This enables retrieval through free-text search, but faceted search and dataset interlinking become limited. From the point of view of a relational database schema modeler, designing a more flexible metadata model is a non-trivial challenge because of the open nature of the model. This work reviews the approaches followed by current open-source platforms and proposes a graph-based model for achieving modular, ontology-based metadata for interlinked data assets in the Semantic Web. The proposed model was implemented in a collaborative research data management platform currently under development at the University of Porto.
Ontology-based multi-domain metadata for research data management using triple stores. João Rocha da Silva (Faculdade de Engenharia da Universidade do Porto / INESC TEC), Cristina Ribeiro (DEI, Faculdade de Engenharia da Universidade do Porto / INESC TEC), João Correia Lopes. IDEAS '14, July 07 - 09 2014, Porto, Portugal
Transcript
Page 1: Ontology-based multi-domain metadata for research data management using triple stores

Ontology-based multi-domain metadata for research data management using triple stores

João Rocha da Silva
Faculdade de Engenharia da Universidade do Porto / INESC TEC

Cristina Ribeiro
DEI, Faculdade de Engenharia da Universidade do Porto / INESC TEC

João Correia Lopes

IDEAS '14, July 07 - 09 2014, Porto, Portugal

Page 2

Contents

• Diverse metadata: relational modeling challenges

• Current approaches built on relational databases

• Dendro: graph-based research data management

• Live demo

• Conclusions

Page 3

Problem: diverse metadata

Relational modeling challenges

Page 4

Generic metadata (all datasets): Author, Description, Creation date, …

Domain-specific metadata:

• Analytical Chemistry Dataset: Sample Count, Analysed Substance

• Mechanical Engineering Dataset: Initial Crack Length, Specimen Type

• …

Page 5

Common challenges in RDB schema modeling

• Entities with unknown attributes at time of modeling

• Time-variant attribute values

• Inheritance / sub-class mapping

• Resource hierarchies (parents of parents…)

• Schemas rely on external documentation
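The "resource hierarchies" challenge can be made concrete with a minimal sketch, assuming a hypothetical `resource` table that stores each project, folder or file with a `parent_id` column (an adjacency list). Reaching "parents of parents" then needs a recursive query; the sketch uses SQLite's `WITH RECURSIVE` through Python's built-in `sqlite3` module. Table, column and resource names are illustrative only and do not come from any of the platforms discussed.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create an in-memory adjacency-list hierarchy (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE resource ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " parent_id INTEGER REFERENCES resource(id))"
    )
    conn.executemany(
        "INSERT INTO resource (id, name, parent_id) VALUES (?, ?, ?)",
        [
            (1, "project", None),       # root of the tree
            (2, "data", 1),             # folder inside the project
            (3, "raw", 2),              # nested folder
            (4, "base data.xls", 3),    # file at the bottom of the tree
        ],
    )
    return conn


def ancestors(conn: sqlite3.Connection, resource_id: int) -> list[str]:
    """Names of all ancestors of a resource, nearest parent first."""
    rows = conn.execute(
        """
        WITH RECURSIVE chain(id, name, parent_id, depth) AS (
            SELECT id, name, parent_id, 0 FROM resource WHERE id = ?
            UNION ALL
            SELECT r.id, r.name, r.parent_id, chain.depth + 1
            FROM resource AS r JOIN chain ON r.id = chain.parent_id
        )
        SELECT name FROM chain WHERE depth > 0 ORDER BY depth
        """,
        (resource_id,),
    ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    conn = build_db()
    print(ancestors(conn, 4))  # ['raw', 'data', 'project']
```

For comparison, in a graph model the same walk is a chain of edges that a SPARQL 1.1 property path such as `nie:isLogicalPartOf+` can follow without a recursive table expression.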

Page 6

Data management and description platforms

Study of relational models

Page 7

DSpace

• Academic publications management platform

• Not targeted specifically at data

• More than 1000 active installations

• Mature open-source codebase

Page 8

DSpace

• Designed for self-deposit by common users

• Good deposit workflow (validation, licensing…)

Page 9

U.Porto Open Repository Homepage (http://repositorio-aberto.up.pt)

Powered by DSpace

Page 10

Powered by DSpace

A thesis record in the repository (http://repositorio-aberto.up.pt/handle/10216/58508)

Page 11

[Figure: simplified DSpace data model with the entities Item, Bitstream, Metadata Schema, Metadata Descriptor and metadata value, linked by one-to-many (1, *) associations]
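The pattern in this diagram, where items carry metadata values and each value points to a descriptor registered under a schema, can be sketched as below. This is a simplified illustration with hypothetical table and column names, not DSpace's actual schema. Note that values attach only to items and are always stored as text, which is the limitation behind the requirement of metadata profiles for objects other than Items.

```python
import sqlite3

SCHEMA = """
CREATE TABLE metadata_schema (
    id INTEGER PRIMARY KEY,
    prefix TEXT UNIQUE NOT NULL          -- e.g. 'dc'
);
CREATE TABLE metadata_descriptor (
    id INTEGER PRIMARY KEY,
    schema_id INTEGER NOT NULL REFERENCES metadata_schema(id),
    element TEXT NOT NULL,               -- e.g. 'contributor'
    qualifier TEXT                       -- e.g. 'author', may be NULL
);
CREATE TABLE item (
    id INTEGER PRIMARY KEY
);
CREATE TABLE metadata_value (
    id INTEGER PRIMARY KEY,
    item_id INTEGER NOT NULL REFERENCES item(id),  -- values hang off items only
    descriptor_id INTEGER NOT NULL REFERENCES metadata_descriptor(id),
    text_value TEXT                                -- every value is text
);
"""


def build_db() -> sqlite3.Connection:
    """In-memory database with one item described by two descriptors."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO metadata_schema VALUES (1, 'dc')")
    conn.executemany(
        "INSERT INTO metadata_descriptor VALUES (?, ?, ?, ?)",
        [(1, 1, "title", None), (2, 1, "contributor", "author")],
    )
    conn.execute("INSERT INTO item VALUES (1)")
    conn.executemany(
        "INSERT INTO metadata_value VALUES (?, ?, ?, ?)",
        [(1, 1, 1, "Example title"), (2, 1, 2, "Doe, J.")],
    )
    return conn


def describe(conn: sqlite3.Connection, item_id: int) -> list[tuple[str, str]]:
    """(schema.element[.qualifier], value) pairs for one item."""
    return conn.execute(
        """
        SELECT s.prefix || '.' || d.element || COALESCE('.' || d.qualifier, ''),
               v.text_value
        FROM metadata_value AS v
        JOIN metadata_descriptor AS d ON d.id = v.descriptor_id
        JOIN metadata_schema AS s ON s.id = d.schema_id
        WHERE v.item_id = ?
        ORDER BY v.id
        """,
        (item_id,),
    ).fetchall()
```

Adding a descriptor is just a new row in `metadata_descriptor`, so the vocabulary is open; what this layout cannot express is a value attached to something other than an item, or a value with a type other than text.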

Page 12

DSpace

Page 13

New requirements

• Metadata profiles for objects other than Items

• Descriptor hierarchy for specialization

• Collaborative schema derivation

• Validation of metadata completeness against different schemas

• Restricting possible metadata for each type of resource
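Two of these requirements, validating completeness and restricting which descriptors each resource type may use, can be expressed as per-type metadata profiles. A minimal sketch, assuming hypothetical profile contents and type names (`Folder`, `DCBDataset`); the descriptor names reuse prefixes that appear elsewhere in the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical metadata profile for one type of resource."""
    required: frozenset[str]   # descriptors that must be filled in
    allowed: frozenset[str]    # every descriptor this type may carry


# Illustrative profiles only.
PROFILES = {
    "Folder": Profile(
        required=frozenset({"dc:title"}),
        allowed=frozenset({"dc:title", "dc:description", "dc:creator"}),
    ),
    "DCBDataset": Profile(
        required=frozenset({"dc:title", "dcb:initialCrackLength"}),
        allowed=frozenset(
            {"dc:title", "dc:creator", "dcb:initialCrackLength", "dcb:specimenWidth"}
        ),
    ),
}


def validate(resource_type: str, metadata: dict[str, str]) -> tuple[set[str], set[str]]:
    """Return (missing required descriptors, descriptors not allowed for the type)."""
    profile = PROFILES[resource_type]
    present = set(metadata)
    missing = set(profile.required - present)
    not_allowed = present - profile.allowed
    return missing, not_allowed


if __name__ == "__main__":
    print(validate("DCBDataset", {"dc:title": "DCB Base Data"}))
```

A descriptor hierarchy (the second requirement) would let a profile accept a specialized descriptor wherever a more general one is allowed; that part is not modelled in this sketch.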

Page 14

Page 15

CKAN

• Open-source data publishing platform

• Deposit requires minimal metadata at first

• Flexible metadata model

• Open-Source

Page 16

Page 17

Page 18

[Figure: CKAN relational schema (source: CKAN). Annotations highlight an entity with variable, time-dependent attributes: its fixed attributes, the attribute name, the value (always varchar) and the timestamps.]
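The annotated schema is an entity-attribute-value (EAV) layout: fixed attributes live in the entity's own table, while variable ones become rows holding an attribute name, a text value and a timestamp. Below is a minimal SQLite sketch of that pattern with hypothetical table and column names (it does not reproduce CKAN's actual tables), showing the point-in-time query that timestamps require and the pitfall of storing every value as text.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dataset (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL             -- a fixed attribute
        );
        CREATE TABLE dataset_extra (
            dataset_id INTEGER NOT NULL REFERENCES dataset(id),
            attr_name TEXT NOT NULL,       -- attribute name
            attr_value TEXT,               -- value, always stored as text
            valid_from TEXT NOT NULL       -- ISO-8601 revision timestamp
        );
        """
    )
    conn.execute("INSERT INTO dataset VALUES (1, 'dcb-base-data')")
    conn.executemany(
        "INSERT INTO dataset_extra VALUES (?, ?, ?, ?)",
        [
            (1, "specimenWidth", "120", "2014-01-01T00:00:00"),
            (1, "initialCrackLength", "280", "2014-01-01T00:00:00"),
            (1, "specimenWidth", "125", "2014-03-01T00:00:00"),
        ],
    )
    return conn


def attributes_at(conn: sqlite3.Connection, dataset_id: int, when: str) -> dict[str, str]:
    """Value of every attribute as it stood at timestamp `when`."""
    rows = conn.execute(
        """
        SELECT e.attr_name, e.attr_value
        FROM dataset_extra AS e
        WHERE e.dataset_id = ?
          AND e.valid_from = (
              SELECT MAX(e2.valid_from) FROM dataset_extra AS e2
              WHERE e2.dataset_id = e.dataset_id
                AND e2.attr_name = e.attr_name
                AND e2.valid_from <= ?)
        """,
        (dataset_id, when),
    ).fetchall()
    return dict(rows)


def widths_above_as_text(conn: sqlite3.Connection, threshold: int) -> list[str]:
    """Pitfall: text comparison puts '120' before '99', so matches are missed."""
    rows = conn.execute(
        "SELECT attr_value FROM dataset_extra WHERE attr_name = 'specimenWidth' "
        "AND attr_value > ? ORDER BY valid_from",
        (str(threshold),),
    ).fetchall()
    return [v for (v,) in rows]


def widths_above_as_number(conn: sqlite3.Connection, threshold: int) -> list[str]:
    """Casting each value before comparing gives the intended numeric result."""
    rows = conn.execute(
        "SELECT attr_value FROM dataset_extra WHERE attr_name = 'specimenWidth' "
        "AND CAST(attr_value AS REAL) > ? ORDER BY valid_from",
        (threshold,),
    ).fetchall()
    return [v for (v,) in rows]
```

The correlated subquery is what "time-dependent attributes" costs in SQL: every read has to pick the latest row per attribute, and every typed comparison needs an explicit cast.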

Page 25

Invenio

• Software behind Zenodo, a data publishing portal

• Static metadata model

• Very complex relational schema generated by business logic code

• Tight coupling between DB and code

• Open-Source

Page 26

Page 27

541 tables

No foreign keys (FKs)

Page 28

Page 30

Ontologies

Semantic annotation for richer metadata

Page 31

A file described as a graph. The resource http://dendro.fe.up.pt/project/datanotes/data/base%20data.xls has these properties:

• nie:isLogicalPartOf http://dendro.fe.up.pt/project/datanotes/data

• rdf:type nie:File

• dc:title “Base data of the DCB experiments”

• nie:title base data.xls

• dcb:initialCrackLength (value not shown)
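The graph above is just a set of (subject, predicate, object) triples. A toy sketch of that idea in plain Python keeps the triples in a set and answers single triple patterns, with `None` acting as a wildcard much like a variable in a SPARQL pattern. A real triple store (the talk mentions Apache Jena) adds indexes, SPARQL and persistence. Prefixed names such as `nie:File` are kept as plain strings rather than expanded to full IRIs, and the dcb:initialCrackLength edge is omitted because its value is not shown.

```python
from typing import Optional

Triple = tuple[str, str, str]

FOLDER = "http://dendro.fe.up.pt/project/datanotes/data"
FILE = "http://dendro.fe.up.pt/project/datanotes/data/base%20data.xls"

# The file's description as (subject, predicate, object) triples.
GRAPH: set[Triple] = {
    (FILE, "nie:isLogicalPartOf", FOLDER),
    (FILE, "rdf:type", "nie:File"),
    (FILE, "dc:title", "Base data of the DCB experiments"),
    (FILE, "nie:title", "base data.xls"),
}


def match(graph: set[Triple], s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> list[Triple]:
    """Triples matching the pattern; None matches anything."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


if __name__ == "__main__":
    for triple in match(GRAPH, s=FILE):
        print(triple)
```

Describing the file with one more descriptor, say `dcb:specimenWidth`, is one more `GRAPH.add(...)` call; no table definition has to change.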

Page 38

Semantic MediaWiki

• Semantic extension of MediaWiki, the software behind Wikipedia

• Semantic links between pages

• Uses ontologies

• Strong emphasis on page versioning

• DB schema built around the time dimension
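Semantic MediaWiki writes semantic links inline in the wiki text, as annotations of the form `[[Property::Value]]` (optionally `[[Property::Value|shown text]]`). A rough sketch of how such annotations become triples about the page, using a simplified regular expression that ignores much of SMW's real syntax (templates, multiple values, escaping); the page name and properties are hypothetical.

```python
import re

# Simplified SMW-style annotation: [[Property::Value]] or [[Property::Value|text]].
ANNOTATION = re.compile(r"\[\[([^\[\]:|]+)::([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def annotations_to_triples(page: str, wikitext: str) -> list[tuple[str, str, str]]:
    """Turn each inline annotation into a (page, property, value) triple."""
    return [
        (page, prop.strip(), value.strip())
        for prop, value in ANNOTATION.findall(wikitext)
    ]


if __name__ == "__main__":
    text = (
        "Specimen type [[Has specimen type::DCB]], measured by "
        "[[Has author::Jane Doe|Jane]]. See also [[Main Page]]."
    )
    for triple in annotations_to_triples("Base data", text):
        print(triple)
```

Ordinary wiki links such as `[[Main Page]]` carry no `::` and are therefore not treated as semantic links.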

Page 39

Loading an ontology

Page 40

Describing a resource

Page 41

Semantic Forms

From DataNotes + UPBox http://purl.pt/24107/1/iPres2013_PDF/UPBox%20and%20DataNotes%20a%20collaborative%20data%20management%20environment%20for%20the%20long%20tail%20of%20research%20data.pdf

Page 42

Semantic Forms

From DataNotes + UPBox http://purl.pt/24107/1/iPres2013_PDF/UPBox%20and%20DataNotes%20a%20collaborative%20data%20management%20environment%20for%20the%20long%20tail%20of%20research%20data.pdf

Page 43

Semantic Forms

From DataNotes + UPBox http://purl.pt/24107/1/iPres2013_PDF/UPBox%20and%20DataNotes%20a%20collaborative%20data%20management%20environment%20for%20the%20long%20tail%20of%20research%20data.pdf

Page 47

Redundancy…

[Figure: a Relational Database (MySQL) and a Triple Store (Apache Jena), connected by Mapping Logic]
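The redundancy on this slide, with the same metadata held in a relational database and in a triple store and kept aligned by mapping logic, can be sketched as a small dual-write layer. A minimal illustration under stated assumptions: SQLite stands in for MySQL, a Python set stands in for Apache Jena, and the class, table and column names are hypothetical. It shows why the redundancy is costly: each fact is written twice, and nothing makes the two writes atomic together.

```python
import sqlite3


class DualStore:
    """Hypothetical mapping layer that writes every fact to two stores."""

    def __init__(self) -> None:
        # SQLite stands in for the relational database (MySQL on the slide).
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE property_value (page TEXT, property TEXT, value TEXT)"
        )
        # A set of triples stands in for the triple store (Apache Jena on the slide).
        self.triples: set[tuple[str, str, str]] = set()

    def save(self, page: str, prop: str, value: str) -> None:
        """Write the same fact twice; no transaction spans both stores."""
        with self.db:  # commits the relational write on success
            self.db.execute(
                "INSERT INTO property_value VALUES (?, ?, ?)", (page, prop, value)
            )
        # A failure at this point would leave the two stores disagreeing.
        self.triples.add((page, prop, value))

    def in_sync(self) -> bool:
        """True when both stores hold exactly the same facts."""
        rows = set(self.db.execute("SELECT page, property, value FROM property_value"))
        return rows == self.triples
```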

Page 48

[Table: CKAN, DSpace, Invenio and Semantic MediaWiki compared on time, flexible attributes, wide use and DB-code coupling]

Page 49

Issues review

• Entities with unknown attributes at time of modeling

• Time-variant attribute values

• Inheritance / sub-classing

• Hierarchies (parents of parents of parents…)

• Need for external documentation

Page 50

Dendro

A graph-based data management platform

Page 51

Graph databases

• Represent entities (Users, Products, Places…) as vertices (entity types are called classes)

• Connections between them are directed graph edges (edge types are called properties)

• The meaning of these connections is expressed in ontologies that can be shared and reused

Page 52

Getting all my Projects

• Will fetch all the projects created by the user

• Will also return their attributes (“database columns”)

• Different projects may have different attributes
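The query described here returns each of the user's projects together with whatever attributes it happens to have. A plain-Python sketch of the same result over a set of triples, assuming a hypothetical class name `ddr:Project` and hypothetical project and user identifiers; the slide's own query is not reproduced.

```python
from collections import defaultdict

Triple = tuple[str, str, str]


def projects_of(graph: set[Triple], user: str) -> dict[str, dict[str, list[str]]]:
    """All projects created by `user`, each with whatever attributes it has."""
    projects = sorted(
        s for (s, p, o) in graph
        if p == "rdf:type" and o == "ddr:Project" and (s, "dc:creator", user) in graph
    )
    result: dict[str, dict[str, list[str]]] = {}
    for project in projects:
        attrs = defaultdict(list)
        for (s, p, o) in sorted(graph):
            if s == project:
                attrs[p].append(o)  # one "column" per predicate actually present
        result[project] = dict(attrs)
    return result
```

No project needs the same set of "columns" as another: a predicate that a project lacks simply does not appear in its dictionary.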

Page 53

Inference

• Transitive Properties

• Subclasses

• Multiple Inheritance

• (e.g. a resource can be a Folder and a Dataset at the same time)
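The three kinds of inference listed can be illustrated with a naive forward-chaining sketch. It assumes a small hand-picked rule set (a tiny subset of RDFS/OWL semantics, not a full reasoner) and hypothetical class names `ddr:Folder`, `ddr:Dataset`, `ddr:Resource` and `ex:DCBDataset`. Treating `nie:isLogicalPartOf` as transitive is an assumption made only for this example.

```python
Triple = tuple[str, str, str]


def infer(graph: set[Triple], transitive: set[str]) -> set[Triple]:
    """Naive forward chaining until no new triples appear (a fixpoint).

    Rules, a small subset of RDFS/OWL semantics:
      1. rdfs:subClassOf is transitive.
      2. (x rdf:type A) and (A rdfs:subClassOf B) give (x rdf:type B).
      3. Every predicate listed in `transitive` is treated as transitive.
    """
    inferred = set(graph)
    transitive = transitive | {"rdfs:subClassOf"}
    while True:
        new = set()
        for (s, p, o) in inferred:
            for (s2, p2, o2) in inferred:
                if o != s2:
                    continue  # the triples must chain: object of one is subject of the other
                if p == p2 and p in transitive:
                    new.add((s, p, o2))           # rules 1 and 3
                elif p == "rdf:type" and p2 == "rdfs:subClassOf":
                    new.add((s, "rdf:type", o2))  # rule 2
        if new <= inferred:
            return inferred
        inferred |= new


if __name__ == "__main__":
    g = {
        ("ex:DCBDataset", "rdfs:subClassOf", "ddr:Dataset"),
        ("ex:r", "rdf:type", "ddr:Folder"),      # multiple inheritance:
        ("ex:r", "rdf:type", "ex:DCBDataset"),   # a Folder and a Dataset at once
    }
    print(sorted(t for t in infer(g, set()) if t[0] == "ex:r"))
```

The nested loop makes each pass quadratic in the number of triples, which is fine for illustration but is exactly the kind of work a triple store's reasoner is built to do efficiently.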

Page 54

Loading an ontology

• Load ontology straight from the web

• No platform-specific syntax (like in SMW)

Page 55

Nothing comes for free

• Aggregation operators are slow

• No ACID properties

• Transactions are not supported in standard SPARQL

• (“SPARQL 1.1 Query/Update Services should be atomic but that they are not required to be atomic.”)

• Graph DBMS Solutions are in early stages (many bugs, many “beta”s, many mailing lists…)

Page 56

Dendro

• Dropbox and File/Folder description platform

• Variable descriptions

• Time-dependent values

• Directory structures (hierarchy)

• Need for simple querying…

Page 57

[Figure: versioning model with the current dataset version Dn and its past revisions (Dn-1, linked by dc:isVersionOf). An added property instance, dcb:initialCrackLength 280mm, appears only in Dn; both versions carry dc:title “DCB Base Data” and dcb:specimenWidth 120. Change recording: a node C (ddr:pertainsTo) with ddr:changedDescriptor ddr:initialCrackLength and ddr:operation “add”. Other labels: nie:isLogicalPartOf Pn, dc:isReferencedBy Fn, dc:creator Un, dc:created and dc:modified 01/01/2014^^xsd:date (revision creation and changed modification timestamps).]
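The change-recording idea in this figure can be sketched over a plain set of triples: before a descriptor is added, the current state is copied into a past-revision node linked with dc:isVersionOf, and a change node records which descriptor changed and how. This is a minimal sketch under stated assumptions: the node naming scheme, which descriptors are copied, and the target of ddr:pertainsTo are guesses for illustration, not Dendro's actual implementation.

```python
from datetime import date

Triple = tuple[str, str, str]


def add_descriptor(graph: set[Triple], resource: str, descriptor: str, value: str,
                   user: str, today: date, revision_no: int) -> str:
    """Snapshot `resource` as a past revision, then add `descriptor` = `value`.

    Returns the identifier of the change record. The node naming scheme
    (<resource>/rev-N, <resource>/change-N) is hypothetical.
    """
    revision = f"{resource}/rev-{revision_no}"
    change = f"{resource}/change-{revision_no}"
    stamp = f'"{today.isoformat()}"^^xsd:date'

    # 1. Past revision: copy the current descriptors, skipping bookkeeping ones.
    current = [(p, o) for (s, p, o) in graph if s == resource]
    for p, o in current:
        if p not in ("dc:modified", "nie:isLogicalPartOf"):
            graph.add((revision, p, o))
    graph.add((revision, "dc:isVersionOf", resource))
    graph.add((revision, "dc:created", stamp))  # revision creation timestamp
    graph.add((revision, "dc:creator", user))

    # 2. Current version: add the new property instance, refresh dc:modified.
    for old in [t for t in graph if t[0] == resource and t[1] == "dc:modified"]:
        graph.discard(old)
    graph.add((resource, descriptor, value))
    graph.add((resource, "dc:modified", stamp))

    # 3. Change record (pointing it at the new revision is an assumption).
    graph.add((change, "ddr:pertainsTo", revision))
    graph.add((change, "ddr:changedDescriptor", descriptor))
    graph.add((change, "ddr:operation", "add"))
    return change
```

Because past revisions are ordinary nodes in the same graph, the history can be queried with the same triple patterns as the current state, without a separate revision table.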

Page 63

Demo

Dendro β

Page 64

Conclusions

• Recording rich metadata requires data model flexibility

• Unknown attributes, time-variant information or hierarchies can be hard to model in a relational database

• Several current solutions make compromises due to their relational database layer

Page 65

Conclusions (cont’d)

• Graph-based models are more flexible and easily extensible through ontology loading

• Ontologies are shareable on the web, and document the database “schema”

• Queries become simpler because the graph model can easily represent scenarios that are challenging for RDBs

• Dendro is a collaborative data management platform fully built on a graph model

Page 66

João Rocha da Silva is an Informatics Engineering PhD student at the Faculty of Engineering of the University of Porto. He specializes in research data management, applying the latest Semantic Web technologies to the proper preservation and discovery of research data assets.

He is also an experienced freelance iOS developer with several apps published on the App Store, and a self-taught DIY mechanic with a special interest in classic cars, particularly his 1987 Toyota Corolla GT Twin Cam, also known as Hachi-Roku or AE86.

João Rocha da Silva: Research Data Management and Semantic Web Researcher, Web & iPhone Developer

João Correia Lopes is an Assistant Professor in Informatics Engineering at Universidade do Porto and a researcher at INESC TEC. He graduated in Electrical Engineering from the University of Porto in 1984 and holds a PhD in Computing Science from Glasgow University (1997). His teaching includes undergraduate and graduate courses in databases and web applications, software engineering and object-oriented programming, markup languages and the semantic web. He has been involved in research projects in the areas of long-term preservation, service-oriented architectures and e-Science. Currently his main research interests are e-Science and the management of research data.

Cristina Ribeiro is an Assistant Professor in Informatics Engineering at Universidade do Porto and a researcher at INESC TEC. She graduated in Electrical Engineering and holds a Master's in Electrical and Computer Engineering and a Ph.D. in Informatics. Her teaching includes undergraduate and graduate courses in information retrieval, digital libraries, knowledge representation and markup languages. She has been involved in research projects in the areas of cultural heritage, multimedia databases and information retrieval. Currently her main research interests are information retrieval, digital preservation and the management of research data.

Cristina Ribeiro: Assistant Professor in Informatics Engineering at Universidade do Porto, Researcher at INESC TEC

João Correia Lopes: Assistant Professor in Informatics Engineering at Universidade do Porto, Researcher at INESC TEC

