
H2020 – EINFRA – 2015 – 1 Page 1 of 43

Design, implementation and deployment of Research Objects components for Earth Science Phase 2

Workpackage: 4 Research Objects in Earth Science

Task(s):
4.3 Research Objects search and recommendation in Earth Science communities
4.4 Preservation and curation of Research Objects in Earth Science: RO life cycle and quality management
4.5 Social aspects of Research Objects in Earth Science: traceability, citation, trust, credit and attribution

Author(s): Andrés García-Silva (ESI), Jose Manuel Gomez-Perez (ESI), Raul Palma (PSNC)

Reviewer(s): Ugo Di Giammatteo (ACS), Raul Palma (PSNC)

Approver(s): Jose Manuel Gomez-Perez (ESI), Cristiano Silvagni (ESA)

Authorizer: Mirko Albani (ESA)

Document Identifier: EVER-EST DEL WP4-D4.4

Dissemination Level: Public

Status: Approved by the EC

Version: 1.2

Date of Issue: 14 December 2018


Abstract

One of the main goals of the EVER-EST project is to implement the research object model, and an ecosystem of tools supporting this model, in earth sciences to foster the sharing and reuse of scientific results, and to ensure their preservation in the long term. The research object model supports knowledge sharing and reuse mainly using a common vocabulary described using a web standard data model specifically designed for data interchange, and a rich knowledge representation language. The model supports the definition of data access levels and copyright information.

This deliverable describes the components that complement the research object model, mainly by enhancing research object findability, traceability and quality monitoring. Findability is improved by the semantic enrichment process, which generates metadata describing the content of a research object. The metadata is used in two complementary retrieval mechanisms: a semantic search engine to find research objects, and a recommender system that helps users explore the research object collection, including the social dimensions of the research activity. Traceability is implemented with Digital Object Identifiers that are integrated in the research object lifecycle, with the additional benefit that authors can receive proper credit for their work. Finally, the quality of research objects is assessed through custom checklists designed with the help of the Virtual Research Communities. These checklists are run regularly, and tools are available to monitor how the assessment varies over time so that action can be taken on decaying research objects if necessary.


Document Log

Date | Author | Changes | Version | Status
14/02/2017 | Andrés García-Silva | Table of Content | 0.1 | Draft
17/02/2017 | Andrés García-Silva | Semantic Enrichment Processes | 0.2 | Draft
24/02/2017 | Andrés García-Silva | Semantic search engine and Recommender | 0.3 | Draft
01/03/2017 | Andrés García-Silva | Checklists, DOIs, Introduction | 0.4 | Draft
14/03/2017 | Raul Palma | Semantic annotations, search, and DOIs and FORK | 0.5 | Draft
17/03/2017 | Andrés García-Silva | CNR, SatCen, NHP scenarios | 0.6 | Draft
22/03/2017 | Andrés García-Silva | First draft finished | 0.7 | Draft
28/03/2017 | Jose Manuel Gomez-Perez | Incorporated INGV RO usage scenario and final check on draft | 0.8 | Draft
11/04/2017 | Jose Manuel Gomez-Perez | Complete revision after internal review | 1.0 | Draft to be approved by the EC
23/05/2017 | Andrés García-Silva | Chapters 1.0 and 3.4 update | 1.1 | Draft to be approved by the EC
14/12/2018 | Cristiano Silvagni | Document status | 1.2 | Approved by the EC


Table of Contents

1 Introduction
1.1 Relation to other work packages
1.2 Compliance to the Smart Objectives and Key Performance Indicators
1.3 Overview of the document
2 Use Scenarios of Research Object Components
2.1 Sea Monitoring Scenario
2.1.1 Current limitations
2.1.2 Use and impact of research object components to overcome the limitations
2.2 Natural Hazard Protection Scenario
2.3 Land Monitoring Scenario
2.4 Supersites Scenario
3 Components for Search and Recommendation of Research Objects
3.1 Semantic enrichment of research objects
3.1.1 COGITO
3.1.2 Research object annotations
3.1.3 The ontology for content description
3.1.4 Example of semantic enrichment using annotations
3.1.5 Semantic enrichment integration in ROHUB
3.1.6 Semantic annotations in ROHUB
3.2 Semantically-enhanced search of research objects
3.3 Recommender system
3.3.1 Collaboration Spheres
3.3.1.1 Authentication
3.3.1.2 Navigation panel
3.3.1.3 Spheres
3.3.2 Collaboration Spheres in ROHUB
3.4 Research Object Aggregation Tool
4 Components for Preservation and Curation of Research Objects
4.1 Checklist for bibliographic research objects
4.2 EVER-EST research object types taxonomy
4.3 Technological support in ROHUB
5 Components Supporting the Social Aspects of Research Objects
5.1 ROHUB support of the research object lifecycle management, including assignation of DOIs

List of Tables

Table 1. Checklist for bibliographic research objects

List of Figures

Figure 1. Work Package Dependencies
Figure 2. Example of semantic enrichment
Figure 3. Annotation Data Model [1]
Figure 4. Preview of contentdesc ontology documentation
Figure 5. Main concepts of the ContentDesc ontology
Figure 6. Semantic search integration in ROHUB based on an asynchronous web service architecture
Figure 7. Research object overview
Figure 8. Advanced annotations view panel
Figure 9. Keyword-based search in main ROHUB page (enhanced with automatic annotations)
Figure 10. Faceted search in ROHUB (enhanced with automatic annotations)
Figure 11. General overview of the Collaboration Spheres
Figure 12. Research object summarization card
Figure 13. User summarization card
Figure 14. Sphere components
Figure 15. ROHUB header
Figure 16. Aggregation tool integrated in the semantic search engine
Figure 17. Research object aggregation tool
Figure 18. Research Object type taxonomy for earth observation
Figure 19. Research object quality tab in ROHUB
Figure 20. Research object overview tab (partial view) in ROHUB
Figure 21. Research object overview tab (partial view) in ROHUB
Figure 22. Research object lifecycle tab (under development) in ROHUB


Definitions and Acronyms

Acronym Description

DOI Digital Object Identifier

FAIR Findable, Accessible, Interoperable and Reusable

HIM Hazard Impact Models

IPR Intellectual Property Rights

IR Information Retrieval

NLP Natural Language Processing

RO Research Object

RDF Resource Description Framework

RODL Research Object Digital Library

ROEVO Research Object Evolution Ontology

RO MODEL Research Object Core Ontology

SKOS Simple Knowledge Organization System

OWL Web Ontology Language

VRC Virtual Research Community

VRE Virtual Research Environment

VSM Volcano Source Model

W3C World Wide Web Consortium

WFDESC Workflow Description Ontology

WFPROV Workflow Execution Provenance Ontology

WP Work Package

Reference Documents

Document ID Document Title

[1] D4.3 Design, implementation and deployment of Research Objects components for Earth Science Phase 1. Andrés García-Silva, Jose Manuel Gomez-Perez, Raul Palma.

[2] D4.1 Workflows and Research Objects in Earth Science - Concepts and Definitions. Jose Manuel Gomez-Perez, Panos Alexopoulos, Nuria Garcia, Raul Palma.

[3] D4.2 Workflows and Research Objects models in Earth Science. Jose Manuel Gomez-Perez, Raul Palma, Nuria García.

[4] Rafael González-Cabero, Raúl Palma (PSNC), Jose Gómez-Pérez, Aleix Garrido. 2013. D3.2v2 Design, implementation and deployment of Workflow Integrity and Authenticity Maintenance components - Phase II, Wf4Ever project.


[5] Esteban García-Cuesta, Graham Klyne, Aleix Garrido, Jose Manuel Gómez-Pérez, Jun Zhao. 2013. D4.2: Design, implementation and deployment of Workflow Integrity and Authenticity Maintenance components – Phase II


1 Introduction

The research object model and related components are the main tools supporting the goals of the EVER-EST project regarding the sharing and reuse of research outputs and their long-term preservation. The research object model relies on solid foundations that ease the sharing and reuse of knowledge: it uses a common vocabulary that is exposed using the W3C Resource Description Framework (RDF) and the Web Ontology Language (OWL), the former as a data model specifically designed for data interchange on the web, and the latter as a rich knowledge representation language. In practice this means that research objects can be easily consumed by software agents, since data is exposed in a standard language (RDF) and the semantics of the data is explicitly defined in a vocabulary written in a standard knowledge representation language (OWL).
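To make the idea of exposing a research object in RDF more concrete, the following minimal sketch serialises a small description as N-Triples using only the Python standard library. The example.org identifiers, the title and the helper functions are hypothetical. The namespace URIs for the research object vocabulary, OAI-ORE and Dublin Core terms are assumptions that should be checked against the model specification before use.

```python
# Assumed namespace URIs (not taken from this deliverable).
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RO = "http://purl.org/wf4ever/ro#"
ORE = "http://www.openarchives.org/ore/terms/"
DCT = "http://purl.org/dc/terms/"


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text):
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def describe_research_object(ro_uri, title, resources):
    """Return N-Triples lines for a research object and the resources it aggregates."""
    lines = [
        f"{iri(ro_uri)} {iri(RDF_TYPE)} {iri(RO + 'ResearchObject')} .",
        f"{iri(ro_uri)} {iri(DCT + 'title')} {literal(title)} .",
    ]
    for resource in resources:
        lines.append(f"{iri(ro_uri)} {iri(ORE + 'aggregates')} {iri(resource)} .")
    return lines


if __name__ == "__main__":
    # Hypothetical research object with two aggregated resources.
    triples = describe_research_object(
        "http://example.org/ro/habitat-model",
        "Habitat suitability model",
        [
            "http://example.org/ro/habitat-model/bathymetry.csv",
            "http://example.org/ro/habitat-model/workflow.xml",
        ],
    )
    print("\n".join(triples))
```

Because every statement is a plain subject, predicate and object line, a software agent can consume the description without knowing anything about the tool that produced it, which is the interoperability benefit described above.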

The research object model is complemented with a suite of tools that boost the sharing, reuse and preservation of research objects by enhancing their findability, enabling quality assessments, and using unique permanent identifiers when research objects are released. These tools include a semantic enrichment process, a semantic search engine, a recommender system, checklists specially adapted to Earth Science, and Digital Object Identifiers (DOIs). This deliverable describes the final, integrated versions of these tools, which were first described in deliverable D4.3. Where required, reference is made to D4.3 to avoid duplicating information. In addition, the web site http://everest.expertsystemlab.com/ was created to publish, describe and link these research object components.

The semantic enrichment process is in charge of generating new metadata from the content of research objects. This metadata comprises the main concepts found in the text-bearing resources of the research object, the main knowledge areas in which these concepts are most frequently used, the main expressions found in the text (known in computational linguistics as noun phrases), and named entities, which are further classified into people, organizations and places. The core of the semantic enrichment process is Expert System's Cogito software. Cogito uses a proprietary semantic network in which words sharing the same meaning are grouped into concepts, and concepts are related to one another by linguistic relations such as hypernymy and hyponymy, among many others. The semantics of the generated metadata is therefore explicit, since the concepts are grounded in the semantic network.

Information retrieval processes, including search engines and recommender systems, can benefit from working with concepts instead of character strings representing words, mainly by providing a more complete and accurate set of results and by enabling exploration of the research object collection through facets wherever the semantic metadata is available. The ROHUB search engine has been extended to index the research object metadata generated by the semantic enrichment process. Thus, the new search engine allows users to search for research objects in which a concept is mentioned regardless of the word used to refer to it, or to pose advanced searches combining more than one type of metadata, for instance searching for research objects that mention both a given location and a given concept. In addition, an aggregation tool is integrated in the search engine to ease the generation of compound research objects.
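The difference between matching words and matching concepts can be illustrated with a small sketch. It is not the ROHUB search engine: the lexicon, the concept identifiers, the `IndexedRO` record and the `search` function are all hypothetical, and a real deployment would obtain concepts from the semantic enrichment process instead of a hand-written dictionary.

```python
from dataclasses import dataclass, field

# Hypothetical lexicon: several surface words map to the same concept identifier.
LEXICON = {
    "earthquake": "concept:seismic_event",
    "quake": "concept:seismic_event",
    "tremor": "concept:seismic_event",
    "landslide": "concept:landslide",
}


@dataclass
class IndexedRO:
    """Metadata kept in the index for one research object."""
    identifier: str
    concepts: set = field(default_factory=set)
    places: set = field(default_factory=set)


def search(index, word=None, place=None):
    """Return identifiers of research objects that satisfy every given criterion."""
    concept = None
    if word is not None:
        concept = LEXICON.get(word.lower())
        if concept is None:
            return []  # the word is not grounded to any known concept
    results = []
    for ro in index:
        if concept is not None and concept not in ro.concepts:
            continue
        if place is not None and place not in ro.places:
            continue
        results.append(ro.identifier)
    return results


INDEX = [
    IndexedRO("ro-1", {"concept:seismic_event"}, {"Italy"}),
    IndexedRO("ro-2", {"concept:seismic_event"}, {"Iceland"}),
    IndexedRO("ro-3", {"concept:landslide"}, {"Italy"}),
]

if __name__ == "__main__":
    print(search(INDEX, word="quake"))                   # concept-only search
    print(search(INDEX, word="tremor", place="Italy"))   # concept combined with a location
```

In this sketch a query for "quake" and a query for "earthquake" return the same research objects, and adding a place narrows the result in the way the combined location and concept search is described above.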

Similarly, a recommender system has been designed that leverages the metadata generated by the enrichment process to find research objects whose content matches the user's interests. The recommender system also uses the metadata to build a user profile, obtained by analysing all the metadata of the user's research objects. The user interface of the recommender system is the Collaboration Spheres web application, which integrates the social dimension of the research process and eases the definition of the recommendation context by adding to the user's interests metadata drawn from specific research objects or other researchers. Collaboration Spheres is fully integrated with EVER-EST authentication and can be launched from ROHUB.
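The profile and context ideas can be sketched as follows. This is an illustration under stated assumptions, not the Collaboration Spheres implementation: the text does not say which similarity measure the recommender uses, so cosine similarity over concept counts is chosen here purely for the example, and all names and data are hypothetical.

```python
import math
from collections import Counter


def build_profile(research_objects):
    """Aggregate concept frequencies over the concept lists of a user's research objects."""
    profile = Counter()
    for concepts in research_objects:
        profile.update(concepts)
    return profile


def cosine(a, b):
    """Cosine similarity between two concept-count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(context, candidates, top_n=3):
    """Rank candidate research objects by similarity to the recommendation context."""
    scored = []
    for ro_id, concepts in candidates.items():
        score = cosine(context, Counter(concepts))
        if score > 0:
            scored.append((score, ro_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [ro_id for _, ro_id in scored[:top_n]]


# Hypothetical data: the user's own research objects and a colleague's interests.
USER_ROS = [["volcano", "deformation"], ["volcano", "insar"]]
COLLEAGUE = Counter({"landslide": 3})
CANDIDATES = {
    "ro-a": ["volcano", "insar"],
    "ro-b": ["coral", "habitat"],
    "ro-c": ["landslide"],
}

if __name__ == "__main__":
    profile = build_profile(USER_ROS)
    print(recommend(profile, CANDIDATES))              # context is the user profile only
    print(recommend(profile + COLLEAGUE, CANDIDATES))  # context extended with a colleague
```

Adding the colleague's concepts to the context changes which research objects are ranked first, which mirrors how the recommendation context is described as the user's interests plus metadata from selected research objects or researchers.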

Tracking regular quality assessments of research objects is also important for their preservation. It makes it possible to identify decay in the quality of a research object over the long term, so that the preservation manager can take action to restore the quality to an acceptable level. Deliverable D4.3 described a basic checklist and presented three specific checklists for research objects mainly describing workflows, data and research products. These checklists were designed jointly with the virtual research communities and served as the basis for defining a taxonomy of research object types. In this deliverable, a new checklist type has been designed and implemented for research objects centred on bibliographic information, and the taxonomy of research object types has been extended to include the bibliographic research object type.
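A checklist evaluation and a simple decay test could be sketched as below. The requirements, the dictionary layout of a research object, the scoring as a percentage and the 10-point threshold are all hypothetical choices made for illustration. They are not the checklists designed with the virtual research communities.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Requirement:
    """One checklist item: a description and a test applied to a research object."""
    description: str
    check: Callable[[dict], bool]


# Hypothetical checklist with three requirements.
BASIC_CHECKLIST = [
    Requirement("has a title", lambda ro: bool(ro.get("title"))),
    Requirement("has a description", lambda ro: bool(ro.get("description"))),
    Requirement(
        "all aggregated resources are reachable",
        lambda ro: all(res["reachable"] for res in ro.get("resources", [])),
    ),
]


def assess(ro, checklist):
    """Return the percentage of checklist requirements the research object satisfies."""
    passed = sum(1 for requirement in checklist if requirement.check(ro))
    return 100.0 * passed / len(checklist)


def has_decayed(history, threshold=10.0):
    """True when the latest score is more than `threshold` points below the best earlier score."""
    if len(history) < 2:
        return False
    return max(history[:-1]) - history[-1] > threshold


if __name__ == "__main__":
    ro = {
        "title": "Flood impact model",
        "description": "Workflow and data for a storm case study",
        "resources": [{"uri": "http://example.org/data.csv", "reachable": True}],
    }
    history = [assess(ro, BASIC_CHECKLIST)]
    ro["resources"][0]["reachable"] = False  # a linked dataset later becomes unavailable
    history.append(assess(ro, BASIC_CHECKLIST))
    print(history, has_decayed(history))
```

Running the same checklist at regular intervals and keeping the score history is what allows a drop to be noticed and acted on.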

Furthermore, the ROHUB platform has been extended to support the assignment of DOIs to research objects. DOIs improve the visibility of research objects, since they are indexed by major publishers, and ensure proper credit to authors, since their work is uniquely identified. D4.3 presented the research object lifecycle, including a scenario in which research objects could receive a DOI when released or archived. This deliverable focuses on the technological support for DOI assignment.
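The scenario of assigning a DOI on release or archival can be pictured as a small state machine. This sketch is an assumption-laden illustration, not the ROHUB implementation: the state names, the allowed transitions and the `mint_placeholder_doi` helper are hypothetical, and the identifier it produces is a fake placeholder. A real system would register the DOI with a registration agency.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional

# Hypothetical lifecycle: a live research object can be released or archived,
# and a released research object can later be archived.
ALLOWED = {
    "live": {"released", "archived"},
    "released": {"archived"},
    "archived": set(),
}

_sequence = count(1)


def mint_placeholder_doi():
    """Return a fake identifier that only stands in for a registered DOI."""
    return f"10.99999/example.{next(_sequence)}"


@dataclass
class ResearchObject:
    title: str
    state: str = "live"
    doi: Optional[str] = None

    def transition(self, new_state):
        """Move to a new lifecycle state, assigning a DOI the first time one is needed."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"cannot go from {self.state} to {new_state}")
        self.state = new_state
        if self.doi is None:
            self.doi = mint_placeholder_doi()


if __name__ == "__main__":
    ro = ResearchObject("Habitat suitability model")
    ro.transition("released")
    print(ro.state, ro.doi)
    ro.transition("archived")  # the identifier assigned at release is kept
    print(ro.state, ro.doi)
```

Keeping the same identifier across later transitions reflects the purpose stated above: a released research object stays uniquely identifiable so that it can be cited and credited.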

1.1 Relation to other work packages

The main requirements for the research object components presented in this deliverable were elicited from the Virtual Research Environment (VRE) use cases defined in D3.1. These use cases represent the different needs that the Virtual Research Communities have regarding the VRE and the use of research objects as a way to preserve, share and reuse their scientific results. Some of the components presented in this deliverable are used in the VRE, so there is a direct relation with WP5, where the VRE is being developed (see Figure 1). The components are integrated via application programming interfaces (APIs) that make their functionality available to client software. More technical details of these components are provided in D5.8.

Figure 1. Work Package Dependencies


1.2 Compliance to the Smart Objectives and Key Performance Indicators

The research object components were developed as part of the planned work to achieve Objective 3 of the EVER-EST project, which is the implementation and validation of the use of research objects in Earth Science. Of particular relevance are smart objective 3.2, which pursues the development of tools for research object preservation and names quality decay as an important concern, and smart objective 3.3, which aims at stimulating the sharing and reuse of research objects.

SM_OB#3.2: Development of means and technological support for research object preservation, including decay diagnosis and prevention both at the method and implementation levels.

Measured by: Implementation of core research object management services, including storage, retrieval, lifecycle management, quality assessment and preservation. Ratio of stable research objects vs. decayed ones over time.

Achievable: Availability of technological support for the management of research objects. This technology will be leveraged and customized to the case of research objects in ES.

Relevant: Providing the appropriate interfaces for managing, sharing and preserving experiments as research objects is crucial for the adoption of this concept by the community, and to foster the reuse of experimental results. Providing measures for assessing the quality of research objects will further support the publication of high-quality scientific results.

Timely: Technological support to be available at the end of the first year, ready for integration in the VRE. A second iteration will be released in M24.

Regarding objective 3.2, ROHUB, its APIs and the PSNC infrastructure hosting them are the main services supporting the research object lifecycle and long-term preservation. This deliverable describes the checklists developed jointly with the VRCs to assess the quality of research objects, and the tools available to apply the checklists, visualize the results, and keep track of the quality assessment over time.

SM_OB#3.3: Stimulating sharing and reuse of research objects in Earth Science. Keeping track of the impact for prioritizing assignment of resources and observation time by data providers.

Measured by: Number of research objects that have been found and reused, based on the recommendation functionalities provided by the system. Number of citations and acknowledgement of specific research objects as opposed to traditional publications in pdf format.

Achievable: Visual metaphors for the recommendation of research objects based on user preferences, profiles, and similarity that are available are adapted to meet the needs of the Earth Science community.

Relevant: Earth scientists will be enabled to find, reuse, and share relevant pieces of research investigations represented as research objects. This will increase the pace of scientific development in the area and minimize time and effort for subsequent work developed on top of existing one. The analysis of the impact generated will allow prioritizing observation time and resources for generation and maintenance of observation data.

Timely: Technological support to be available at the end of the first year, ready for integration in the VRE. A second iteration will be released in M24.

Regarding objective 3.3, this deliverable describes the tools that promote the sharing and reuse of research objects by working on their findability and traceability. Findability is addressed by the semantic enrichment process, the semantic search engine and the recommender system. Traceability is achieved by using DOIs to identify released research objects. DOIs also make it possible to credit authors properly when their research objects are reused in other research works.

1.3 Overview of the document

This document is structured as follows. Section 2 presents the scenarios in which the VRCs show how the research object components help them carry out their research tasks. Section 3 describes the research object search and recommender components, which are centred on the semantic enrichment of research objects. The description of the enrichment process includes the vocabulary developed to annotate research objects with the generated metadata, the process designed to integrate the enrichment in ROHUB, and the visualization of these annotations. Section 4 presents the design and implementation of a new checklist for research objects containing mainly bibliographic information. Finally, Section 5 describes the ROHUB technological support for assigning DOIs to research objects according to their stage in the life cycle.


2 Use Scenarios of Research Object Components

The VRCs have identified different scenarios in which the use of the research object components contributes to the preservation, sharing and reuse of research outputs. These scenarios could be a good starting point for VRCs that are not currently part of EVER-EST, helping them understand how other colleagues are using the technology in support of their research practice.

2.1 Sea Monitoring Scenario

A researcher needs to define the habitat extent of the Cold Water Coral in the Bari Canyon and to provide this information to assess the good environmental status related to descriptor D1 (Biodiversity, Indicator Habitat extent) within the Marine Strategy Framework Directive for the Italian waters. To this end the researcher needs a habitat suitability model for Cold Water Corals. The researcher needs to search for high-resolution bathymetric data and Cold Water Coral occurrence data, and to run a good model to obtain a reliable map of habitat suitability for Cold Water Corals. The researcher then needs to release the results to colleagues from different institutions working on the Marine Strategy Framework Directive, to share the model with them, to reuse the model in different locations, and to re-run the model after one year using new data from the same location. For this scenario it is very important to share data and results within the community, to reuse the models coming from different scientists working on the same topic, to preserve the results, and to publish methodologies and final maps.

2.1.1 Current limitations

Currently there is no single place where a scientist can find publications on this specific topic, workflows describing the models, links to the data to be used, and results. There are no specific repositories that are used to preserve and reuse all this information. Generally, there is no information about the quality of the models and methodologies described in a paper. Within the Marine Strategy Framework Directive there is a serious lack of communication, and all the relevant information is dispersed across different repositories.

2.1.2 Use and impact of research object components to overcome the limitations

The research object components allow scientists to encapsulate data, running workflows, results and documentation, and to effectively preserve and reuse all this information. Research objects are useful for sharing the complete scientific life cycle within the community and, in this specific case, for reusing the models in different locations and with different datasets. For monitoring purposes, the research object model makes it possible to re-run exactly the same model in the same location at different times, so that differences in habitat extent can be evaluated using the same methodology. The components used in this scenario are:

● Semantic Search: To reuse an existing research object about habitat suitability models, the researcher will pose a query with appropriate concepts (e.g. ecology, habitat suitability, habitat extent) in order to easily find the most suitable ones. Semantic search retrieves concepts from all documents in the research object, and it is really effective for finding the research objects containing the data and information needed for the work. Without semantic search a free-text search could be performed, but it would be difficult to find the effective concepts.

● Checklists: To obtain information about research object quality, and to select the research objects that effectively work. It is important to reuse research objects with a running workflow and real links to data. The checklist is a good tool to evaluate a research object without losing time verifying its content manually.

● DOIs: After reusing a workflow with different input data and modified model parameters, a new research object will be created with the researcher's data and the most suitable model parameters. This new research object will be released with a DOI, which allows it to be properly cited by other scientists. The DOI is a good way to encourage scientists to produce research objects with new data and new information. Without this possibility a scientist would not be encouraged to share their work and make the models and workflows available for reuse.

● Collaboration Spheres: Collaboration Spheres offers a friendly way to explore the RO collection around a specific theme or from a specific author with a free search. It is very useful for finding all ROs around a specific main topic and choosing the most suitable ones.

The benefits of using research objects lie in being able to share, reuse, preserve and release results quickly. If all components are ready, a few hours are enough to prepare a research object and publish it, while preparing the running workflow is more time consuming. Collaboration Spheres is a good option to find research objects about a main theme. Nevertheless, the application is not very precise in selecting the main topic from the author. Currently there are not enough research objects to establish how many of them are relevant. At present, the main recommendations are close to the chosen context, while the other recommendations in Collaboration Spheres are not relevant.

2.2 Natural Hazard Protection Scenario

A member of the NHP Hazard Impact Modelling group may wish to re-run an existing modelling workflow for a new location or event. NHP partners develop Hazard Impact Models (HIM) for significant storm events to test the modelling process. Models seek to predict the impact of hazard events. Models are either developed using case study events (events that have already happened) to help test the success of a modelling process, or use forecast data that can then be tested against real events when or if they happen. The benefit of reuse and sharing is that existing models that correlate closely with known events can be applied to future events and, eventually, in an operational environment.

2.2.1 Current limitations


Sharing information requires face-to-face meetings and regular communication via email, particularly to share scripts, data and algorithms. In a community that is spread across different locations and has different priorities, it is challenging to organize the level of interaction necessary to share and reuse effectively.

Currently, there is no way for anyone other than the model developer to search for existing models. Any search would require interaction with the model developer, and delays might occur as a result of individuals working at different locations and for different organisations.

2.2.2 Use and impact of research object components to overcome the limitations



Providing a space in which NHP partners can search and reuse research objects with the confidence that they access quality algorithms and data that has been tested and verified will reduce the time it takes to share the information. It will enable NHP partners to browse different scenarios and run tests on their own data far more efficiently. To this end the components used in this scenario are:

● Semantic search – useful searches would include:
○ Date of event that model is based on (whether the event is a past event or a forecast event).
○ Event name/storm name – if one exists.
○ Hazard type.
○ Location – where was the impact greatest, e.g. where did the flood occur and what assets did it impact, or where did the landslide occur and what was the impact, e.g. road closure?
○ Who led or was involved in the research object development?

● Checklists
○ Will help a researcher understand the maturity of a model – has it reached the end of the workflow?
○ A researcher will know if the research object can be reused straight away or if it needs further discussion with the originator.
○ Will indicate the presence of metadata and therefore if there is additional useful information attached, which in turn will support its reuse.
○ Has the research object been verified – is it fit for reuse?
○ Does the research object have an executable workflow attached? If not, it cannot be reused.

● Research object release (DOI)
○ Publication of research objects will allow the collated information on the model development and/or case studies to be referenced. A well-defined research object structure means that this can be done in a consistent way.
○ Daily release of research objects relating to daily hazard assessments will create a repository of hazard advice information that can be easily reviewed and referenced.

The use of each component will enable quick search on a variety of search criteria reducing the need for face to face/email interactions. It will enable NHP to work towards running models in an operational setting, something that is currently a long way off. It will improve consistency of model development by enabling reuse of workflows – this will reduce opportunity for operator error when setting parameters, variables etc. in the modelling workflow. The use of research objects for sharing and reuse of information will reduce the expense of travel time and staff resource involved in enabling sharing and reuse of workflows, models and algorithms. Release of research objects provides a single point of access to all of the information relating to the research object, reducing the time requirement associated with trawling across networks or contacts for information.

2.3 Land Monitoring Scenario

Based on the requirements collected to define its user scenario, the Land Monitoring community represented by SatCen expressed its interest for research objects applied to image processing (e.g. Change Detection) workflows on specific areas, defined by the users, within a pre-defined period. Such research objects, in a working scenario, will allow sharing, preserving and reusing the information contained in them. The envisioned use cases are:


● Share: from the Land Monitoring VRC portal, a user wants to apply a Change Detection algorithm on a selected pair of Sentinel-1 images and a specific Area of Interest. At the end of the process, the user wants to save the workflow in a research object (workflow-centric research object) and to share the information of its process. Through the button “Create experiment” on the VRC portal, the user creates the research object. The list of information required to create the research object is described in the Checklist section.

● Preserve: from the Land Monitoring VRC portal, a user wants to know if there is any information related to changes on a specific Area of Interest. If a research object has been previously created (as a workflow-centric research object), the user, after a semantic search, downloads the research object with the related information contained in it.

● Reuse: from the Land Monitoring VRC portal, a user wants to know if there is any information related to changes on a specific Area of Interest. If a research object has been previously created (as a workflow-centric research object), the user, after a semantic search, reuses the workflow on a different time range, updating the information on that Area of Interest.


Currently, the results of the image processing workflows are delivered as output maps, without ancillary information and metadata (which can be related to the original data, to the applied processes and/or to the generated results).

The use of workflow-centric research objects in a working scenario will tackle the identified limitations, making it possible to share the whole process and related metadata, to preserve the associated information of the workflow and, possibly, to reuse the workflow on a different time frame.

● Checklists: Once the user has pushed the button “Create experiment” on the Land Monitoring VRC portal, the following metadata are required to create the workflow research object:
○ Title, to be filled.
○ Description, to be filled (a name from the Area of Interest should be inserted to facilitate its search).
○ Processing Date, automatically inserted.
○ Confidentiality (Public/Restricted), check box or dropdown menu.
○ Owner/Creator, automatically filled from the user profile.
○ Data name (e.g. Sentinel-1 product ID), automatically filled from the metadata of original images.
○ Analysis period, automatically filled with the data from the metadata of original images.
○ Output, georeferenced results image attached (e.g. PNG format).

● Semantic Search: The Semantic Search allows finding a research object through keywords and facets. The name and the description of the RO (defined when the RO is created) should guarantee the research object discovery. For the Land Monitoring user scenario, keywords such as the following are of relevance:
○ Earth Observation, Remote Sensing, Satellite and Security would define the Domain.
○ Land, Monitoring, Product, Processing, SAR, Sentinel-1, Change Detection and GRD would define the Concepts and Expressions.
○ Username of the RO creator would define the People.
○ Location of the study area would define the Places.
○ Organization of the RO creator would define the Organizations.

● Collaboration Spheres: The Collaboration Spheres returns a list of correlated research objects and helps to visualize research objects, based on their similarity. For instance, a research object containing concepts like “Land”, “SAR” and “Change Detection” could help the users to find similar research objects created on the same topics, to preserve the contained information in a workflow and to reuse them.
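The metadata listed under the Checklists item above can be read as a completeness check that runs when the research object is created. The following is a minimal sketch of such a check, not the portal's actual implementation: the dictionary keys (`title`, `processing_date`, and so on) and the function name are hypothetical, chosen only to mirror the labels in the list.

```python
# Hypothetical keys mirroring the metadata labels in the checklist; the
# source only gives the human-readable names, not a schema.
REQUIRED_FIELDS = (
    "title",
    "description",
    "processing_date",
    "confidentiality",
    "owner",
    "data_name",
    "analysis_period",
    "output",
)
ALLOWED_CONFIDENTIALITY = {"Public", "Restricted"}


def check_experiment_metadata(metadata):
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    for name in REQUIRED_FIELDS:
        value = metadata.get(name)
        if value is None or str(value).strip() == "":
            problems.append(f"missing field: {name}")
    level = metadata.get("confidentiality")
    if level and level not in ALLOWED_CONFIDENTIALITY:
        problems.append(f"invalid confidentiality: {level}")
    return problems
```

Returning a list of problems rather than a boolean keeps the result usable both for blocking creation and for showing the user which fields still need attention.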

The current image processing procedures (e.g. Change Detection with Sentinel data) are based on the execution of workflows: the ancillary information is neither shared nor preserved, and the same processing chain cannot be reproduced following the same criteria unless all the metadata are noted manually. The use of ROs (and the related components) will make it possible to share, preserve and reuse image processing workflows and all the information associated with them (e.g. input images, output map and metadata) in a single structure, improving the present practices for executing image processing chains.

2.4 Supersites Scenario

A new eruptive cycle is taking place at Mt Etna Supersite. Satellite data (InSAR) are constantly processed as soon as new images are available. A researcher wants to model the magma source by means of the VSM (Volcano Source Model) tool, in order to define the location and volume of the magma reaching the surface. The researcher needs to search the available documentation about the current eruption, e.g., daily and weekly reports containing the summary of monitoring data and their analysis. The researcher may want to go back to examine the modeling parameters/methods/results for previous eruptions. The researcher uses the available documentation for past eruptions and the present eruption data, to develop a modeling scenario, then he/she runs the modeling process. He/she wants to share the modeling workflow with other colleagues for cross-checking of the procedure and to facilitate model updates during the progress of the eruption. He/she wants to be attributed the authorship of the model results even before publications, since the results are shared with the scientific community working for the emergency. The scientific community constantly analyzes and compares the various results and generates reports using a collaborative consensus process. The consensus products (e.g. maps showing where magma is accumulating and its depth) are operationally shared with end-users (e.g. civil protection) and made public.


At present there is no unique virtual or physical place where all (or most of) the information (reports, data, results, publications) about a single event on each Supersite (as an eruption) can be found. Reports, publications and data may be accessed in online repositories but in a fragmented way, often not according to FAIR principles, and with no guarantee of completeness (e.g. some information may be lacking or it is provided only in printed, non-reusable format).

Collaboration among the investigators is hindered by the difficult access to the information, data, and results and this may reduce the effectiveness of the entire process, affecting the quality and the timeliness of important information products for the emergency management.

The practical difficulties in providing access to information can also be compounded by the reluctance of researchers to share their data or results before scientific publication, unless a mechanism for proper IPR protection is in place.

In the Supersites community there are no specific repositories where all the information embedded in a scientific workflow (e.g. including not only the input data and output results, but also the modeling parameters, the results and their quality/meaning, etc.) can be stored, preserved and made available for reuse in a digital environment.


The RO paradigm and its versatility to encapsulate data, collect documentation, and store workflows and results may apply to all the steps described in the above scenario. ROs can be used to collect, and make rapidly searchable and available, all the documents and data concerning the current and past eruptions, which are useful to guide the data processing and analysis, and the interpretation of the ongoing eruption. The RO recommendation engine is an important advantage during the investigation process, since it can effectively support the scientist in finding information which may not be otherwise evident.

Workflow ROs (e.g. the VSM RO instance) can be used to document the data analysis process of single scientists or teams within the community. By sharing these ROs and re-using them in a proper environment, the scientists can reproduce the same results, modify some parameters, add further knowledge, and thus eventually improve the investigation result.

Given the capacity to provide proper authorship attribution (and also licensing) to the original content of these ROs, the researchers are more willing to share their research results in digital form, which enhances re-use and citation, but also collaboration and cross fertilization.

The components used in this scenario are:

● Semantic Search: The semantic search is useful to find an existing RO related to the person and/or the topic of interest, even when the links are not evident. For the Supersites scenario the researcher could use the following semantic terms: Mt Etna, 2017, eruption, any other specific past eruption, 'source modelling', InSAR, the names of colleagues working in the same field, etc. The researcher can easily find and access the information for re-use.

● Checklists: Checklists are fundamental to allow long-term preservation and good quality of ROs, and so build the trust of the scientific community in their use. An RO whose checklist is not verified may lose most of its value, and is like a webpage with broken links, implying lack of completeness and turning away potential users. For example, a workflow RO with model results should clearly refer to the data sources and the analysis code, in order to permit easy re-use.

● DOIs: The authorship attribution for data, model results, or any kind of scientific effort/product is of primary importance in Earth Science and in particular for the Supersites community, where open access is mandated by the GEO framework. All ROs containing original work will obtain a DOI, which allows the author(s) to be cited and their work properly acknowledged. This stimulates the sharing of results even before the final publication. For instance, during the Etna eruption the VSM modeling workflow can be composed of several snapshot ROs, each documenting an important step forward in the model development (e.g. when new data become available). Although the scientist may want to investigate all possible datasets before scientific publication, each of these snapshots may provide important information for decision makers, and should be released. The DOI attribution to ROs supports this behavior.

● Collaboration Spheres: The Collaboration Spheres component allows researchers to explore the existing RO space to rapidly and easily find interesting information (documentation, data, results, workflows, etc.). Through an easy-to-use graphical interface the researcher sets the main topic (or person) of interest (e.g. a volcano source modeling RO) and the system recommends all ROs which are at least in part related to the topic, to the person, or both. As the size of the RO collection increases, the Collaboration Spheres component may become a very useful tool to select the most likely interesting RO for a given scientific investigation.

The use of ROs has many potential benefits for the Supersites community. How rapidly this new method for information (and knowledge) exchange will be accepted has to be verified in practice, since it will require a change in the working habits of scientists.


3 Components for Search and Recommendation of Research Objects

The two main mechanisms to retrieve and explore the research object collection are the search engine and the recommender system. Both systems leverage the research object metadata that is supplied by authors at creation time, such as title, description, and user annotations. However, not all the research objects have descriptive titles, descriptions, or annotations about the content of the research object, and some of them lack this metadata completely. This lack of descriptive information hampers the retrieval process, since user query terms are matched against scarce or non-existent metadata, and hence relevant research objects might not be included in the search or recommender results. To cope with this lack of descriptive metadata, or low description coverage of the research object content, a semantic enrichment of the research object content was proposed in D4.3 (section 3). The goal of the semantic enrichment is to automatically elicit semantic metadata that can be used in the search and recommender processes as a complement to the user-generated metadata. The retrieval mechanisms rely on the metadata to match the user query terms in the search system or the terms in the user context in the recommender system.

3.1 Semantic enrichment of research objects

The goal of the semantic enrichment process is to generate structured metadata from the research object textual content. For example, the research object “LM Calibration” is a good candidate to show the benefits of the semantic enrichment, since it is poorly described regarding its content (see Figure 2, left hand side) and therefore its chances of being found and reused by other researchers are low. The semantic enrichment process is able to generate from the aggregated resources a rich metadata set (see Figure 2, right hand side) that helps to understand the research object content. If this metadata is exposed to search engines and recommender systems, the research object findability is clearly increased. In section 3.1.1 each metadata category generated by the semantic enrichment is explained in detail.

Although the semantic enrichment core process was presented in D4.3 (section 3.1.3.1), it is described here briefly to make the section self-contained. In addition, the process description is extended with new features not mentioned in D4.3, such as the annotation of research objects with the generated metadata, the vocabulary developed to model the new annotations, and the integration of the semantic enrichment process in the ROHUB platform.

The enrichment process starts by browsing the research object content and selecting the aggregated resources from which text is extracted, according to the resource type and the resource file type. Resource types of interest are Document (wf4ever:Document), BibliographicResource (dcterms:BibliographicResource), Conclusions (roterms:Conclusions), Hypothesis (roterms:Hypothesis), ResearchQuestion (roterms:ResearchQuestion), and Paper (roterms:Paper). The files associated with these resources must be of one of the following types: Word documents, PDF documents, text files, or PowerPoint presentations. All the resources that pass these filters are processed. Once the resource files are filtered, their textual content is extracted with the help of open source tools such as Apache PDFBox¹ to process PDF documents and Apache POI² to process Word documents and PowerPoint presentations. All the pieces of text extracted from these resources are fed into COGITO to generate the metadata representing the research object content.
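The two filters described above (resource type, then file type) can be sketched as follows. This is an illustrative sketch, not the enrichment service's code: the resource dictionaries with `types` and `path` keys are a hypothetical representation, and the set of file extensions is an assumption, since the text names the kinds of files but not their extensions.

```python
from pathlib import PurePosixPath

# Resource types named in the text, written as prefixed names.
TEXT_RESOURCE_TYPES = {
    "wf4ever:Document",
    "dcterms:BibliographicResource",
    "roterms:Conclusions",
    "roterms:Hypothesis",
    "roterms:ResearchQuestion",
    "roterms:Paper",
}

# Assumed extensions for Word, PDF, text and PowerPoint files.
TEXT_FILE_EXTENSIONS = {".doc", ".docx", ".pdf", ".txt", ".ppt", ".pptx"}


def select_text_resources(resources):
    """Keep resources whose type and file extension both pass the filters."""
    selected = []
    for resource in resources:
        has_type = bool(TEXT_RESOURCE_TYPES & set(resource["types"]))
        extension = PurePosixPath(resource["path"]).suffix.lower()
        if has_type and extension in TEXT_FILE_EXTENSIONS:
            selected.append(resource)
    return selected
```

Text extraction itself (PDFBox and POI in the source) would then run only on the selected resources.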

1 https://pdfbox.apache.org/
2 https://poi.apache.org/

http://purl.org/wf4ever/wf4ever#Document

http://purl.org/dc/terms/BibliographicResource

http://purl.org/wf4ever/roterms#Conclusions

http://purl.org/wf4ever/roterms#Hypothesis

http://purl.org/wf4ever/roterms#ResearchQuestion

http://purl.org/wf4ever/roterms#Paper



Figure 2. Example of semantic enrichment

3.1.1 COGITO

COGITO is Expert System’s patented semantic platform for text processing. The main components of COGITO are the sensigrafo and the disambiguator. The sensigrafo is a semantic network where knowledge is represented as a graph of concepts and the relationships between them. In the sensigrafo a concept (also called a syncon) is a group of words (or synonyms) that are used with the same meaning, and the relations between concepts include relations such as supernomen for more general terms or subnomen for more specific ones. Words in the sensigrafo are actually lemmas. A lemma is the root or canonical form of a set of words. The standard English sensigrafo contains about 350,000 syncons, more than 350,000 lemmas, rules for inflections, and more than 80 relation types that yield about 2.8 million links between concepts.

The disambiguator, on the other hand, is a multi-level linguistic engine able to disambiguate the meaning of a word by recognizing the context in which that word occurs. The disambiguator uses different levels of analysis: lexical, grammatical, syntactical and semantic. In the lexical analysis the text is split into tokens. In the grammatical analysis every token or group of tokens is assigned a part-of-speech tag such as noun, proper noun, verb, adjective, article and so on. In the syntactical analysis the tokens are arranged in a syntactical structure that mirrors the way in which words are linked to each other to form phrases (e.g., noun phrases or verb phrases), phrases are linked to each other to form clauses, and clauses are linked to each other to form sentences. The main output of the syntactical analysis is the logical role assigned to each phrase (subject, object, verb, complement, etc.). Finally, in the semantic analysis tokens are associated with meaning as defined by concepts in the sensigrafo. The most appropriate concept for a given token is selected according to the grammatical and syntactical characteristics of the token, and its relation with the syntactical elements surrounding it (the context).


In addition, COGITO has a Named Entity Recognition module that aims at spotting names referenced by contiguous spans of tokens. These names are categorized as names of People, Organizations, and Locations. COGITO is also able to recognize time expressions, quantities, monetary values, and percentages.

The text extracted from the research object resources is processed by COGITO which generates the following metadata:

● Main Concepts: Most frequent sensigrafo concepts mentioned in the text.
● Main Domains: Fields of knowledge in which the main concepts are most commonly used.
● Main Lemmas: Most frequent lemmas found in the text.
● Main Compound Terms: Most relevant phrases or collocations found in the text.
● Named entities: All the named entities found in the text, classified into People, Organizations and Places.
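The metadata categories listed above can be pictured as a simple record, with the "most frequent" categories obtained by ranking. The sketch below is only an illustration of that shape: the class and field names are hypothetical, and the plain frequency count is a naive stand-in for COGITO's own selection, which relies on its linguistic analysis and is not described in enough detail here to reproduce.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class EnrichmentMetadata:
    """Hypothetical container mirroring the metadata categories in the list."""
    main_concepts: list = field(default_factory=list)
    main_domains: list = field(default_factory=list)
    main_lemmas: list = field(default_factory=list)
    main_compound_terms: list = field(default_factory=list)
    people: list = field(default_factory=list)
    organizations: list = field(default_factory=list)
    places: list = field(default_factory=list)


def most_frequent(items, limit=2):
    """Rank items by frequency; ties keep first-seen order."""
    return [item for item, _ in Counter(items).most_common(limit)]
```

For example, `EnrichmentMetadata(main_lemmas=most_frequent(lemmas))` would fill one category from a list of lemmas produced by an upstream analysis step.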

3.1.2 Research object annotations

The structured metadata generated by COGITO is added to the research object using annotations. Recall that the research object model itself strongly depends on annotations to provide information about the aggregated documents. These annotations are defined according to the Annotation Ontology³, which is the reference model to annotate resources in the research object model.

Figure 3. Annotation Data Model [1]

The annotation ontology establishes that an annotation has a target and a body (see Figure 3). The target is the research object and the body contains the actual annotations, i.e., the structured metadata. ROHUB, or more precisely the RODL API, simplifies the addition of annotations to research objects by requesting only the annotation body and the research object to be used as target. The annotation body may contain plain metadata in the form of text strings, or semantic annotations following a shared vocabulary. For the semantic enrichment annotations the latter option was preferred, since it increases the reusability and interoperability of the annotations. Therefore an ontology for content description was developed, including classes for each of the distinct types of metadata generated by COGITO.

3.1.3 The ontology for content description

The main classes of the Ontology for Content Description (ContentDesc) correspond to the types of semantic metadata generated from each research object, which are organized together as a SKOS⁴ knowledge organization system. This ontology reuses well-known vocabularies, which eases the consumption of the annotations by software agents outside of the ROHUB boundaries. The vocabulary is published at https://w3id.org/contentdesc and it provides content negotiation for the most common RDF serializations (application/rdf+xml, text/turtle, and text/n3) and an HTML visualization. A preview of the HTML documentation of the ontology is shown in Figure 4.

3 https://code.google.com/archive/p/annotation-ontology/
4 https://www.w3.org/TR/skos-reference/
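Content negotiation means that a client chooses the serialization of the vocabulary through the HTTP `Accept` header. The following sketch uses only the Python standard library; the function names are hypothetical, and actually fetching the document requires network access and depends on the server still honouring the media types named in the text.

```python
from urllib.request import Request, urlopen

VOCABULARY_URL = "https://w3id.org/contentdesc"


def build_vocabulary_request(media_type="text/turtle"):
    """Prepare a GET request asking for a specific serialization."""
    return Request(VOCABULARY_URL, headers={"Accept": media_type})


def fetch_vocabulary(media_type="text/turtle", timeout=10):
    """Send the request and return the response body as text (needs network)."""
    request = build_vocabulary_request(media_type)
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_vocabulary("application/rdf+xml")` would ask for the RDF/XML serialization, while a browser, which sends `Accept: text/html`, would be served the HTML documentation.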

The main classes in the vocabulary, depicted in Figure 5, are Concept, Domain, Expression (which corresponds to compound terms in COGITO terminology), and Named Entities. Named-entity specializations include Person, Organization and Place. These classes are modelled as skos:Concept⁵ and are organized in a skos:ConceptScheme⁶ where Domain, Expressions, and Named Entities are top concepts of the scheme. The classes Person and Organization specialize the classes foaf:Person⁷ and foaf:Organization⁸ respectively, while the class Place specializes the class dcterms:Location⁹. The skos:prefLabel property can be used to define the label of instances of Domain, Concept, Expression, and Place, while the foaf:name property can be used to define the label of instances of Person and Organization.

Figure 4. Preview of contentdesc ontology documentation

5 https://www.w3.org/TR/skos-reference/#concepts

6 https://www.w3.org/TR/skos-reference/#schemes

7 http://xmlns.com/foaf/spec/#term_Person

8 http://xmlns.com/foaf/spec/#term_Organization

9 http://dublincore.org/documents/dcmi-terms/#terms-Location



Figure 5. Main concepts of the ContentDesc ontology

3.1.4 Example of semantic enrichment using annotations

The research object entitled “Land Monitoring Change Detecting Step”¹⁰ is used to exemplify the semantic enrichment process. This research object includes a text document describing the hypotheses and conclusions, as well as a PDF document. After processing the research object textual content, the semantic metadata is added as a new annotation to the research object. The following Turtle code excerpt shows the new annotation “f334” that is added to the research object.

@base <http://sandbox.rohub.org/rodl/ROs/> .
@prefix ore: <http://www.openarchives.org/ore/terms/> .
@prefix ao: <http://purl.org/ao/> .
@prefix ro: <http://purl.org/wf4ever/ro#> .

# The research object aggregates the new annotation.
<LandMonitoring_Change_Detecting>
    a ro:ResearchObject ;
    ore:aggregates <LandMonitoring_Change_Detecting/.ro/annotations/f334> .

# The annotation links its body (the metadata file) to its target (the research object).
<LandMonitoring_Change_Detecting/.ro/annotations/f334>
    a ao:Annotation ;
    ao:body <LandMonitoring_Change_Detecting/f334.rdf> ;
    ao:target <LandMonitoring_Change_Detecting> .

10 http://sandbox.rohub.org/rodl/ROs/LandMonitoring_Change_Detecting/


The semantic metadata is saved within the body of the annotation. An excerpt of the annotation body file is shown below. Two instances of each semantic metadata type are included. The metadata states explicitly that the research object content mainly refers to the concepts “monitoring” and “Segmentation and Reassembly”, which fit in the “geology” and “graphic” domains, and that two of the most frequent expressions are “exploitation of image archive” and “image processing algorithm”.

@base <http://sandbox.rohub.org/rodl/ROs/LandMonitoring_Change_Detecting> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dc: <http://purl.org/dc/terms/> .

# Subjects attached to the research object (IRI abbreviated as in the original excerpt).
<.../ROs/LandMonitoring_Change_Detecting>
    dc:subject <subject/-1302006390>, <subject/280343272>,
               <subject/-734754489>, <subject/-1557562560>,
               <subject/1852089416>, <subject/-79018874> .

# Concepts
<subject/-1557562560> a <https://w3id.org/contentdesc#Concept> ;
    skos:prefLabel "Segmentation and Reassembly" .

<subject/1852089416> a <https://w3id.org/contentdesc#Concept> ;
    skos:prefLabel "monitoring" .

# Domains
<subject/-79018874> a <https://w3id.org/contentdesc#Domain> ;
    skos:prefLabel "geology" .

<subject/280343272> a <https://w3id.org/contentdesc#Domain> ;
    skos:prefLabel "graphic" .

# Expressions
<subject/-734754489> a <https://w3id.org/contentdesc#Expression> ;
    skos:prefLabel "image processing algorithm" .

<subject/-1302006390> a <https://w3id.org/contentdesc#Expression> ;
    skos:prefLabel "exploitation of image archive" .
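An annotation body of this shape can be produced mechanically from the enrichment results. The sketch below shows one possible way to do so with plain string formatting; it is not the service's actual serializer. The function names, the `(subject_id, class_name, label)` tuple layout, and the use of absolute subject IRIs of the form `<research object IRI>/subject/<id>` are assumptions made for illustration. The choice of label property per class follows the ontology description in section 3.1.3.

```python
CONTENTDESC = "https://w3id.org/contentdesc#"

# skos:prefLabel for Domain, Concept, Expression and Place;
# foaf:name for Person and Organization.
LABEL_PROPERTY = {
    "Concept": "skos:prefLabel",
    "Domain": "skos:prefLabel",
    "Expression": "skos:prefLabel",
    "Place": "skos:prefLabel",
    "Person": "foaf:name",
    "Organization": "foaf:name",
}


def escape_literal(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def annotation_body(ro_iri, entries):
    """Serialize (subject_id, class_name, label) entries as Turtle text."""
    if not entries:
        raise ValueError("at least one metadata entry is required")
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        "@prefix foaf: <http://xmlns.com/foaf/0.1/> .",
        "@prefix dc: <http://purl.org/dc/terms/> .",
        "",
    ]
    # Assumed IRI pattern for the generated subjects.
    subjects = [f"<{ro_iri}/subject/{sid}>" for sid, _, _ in entries]
    lines.append(f"<{ro_iri}> dc:subject " + ", ".join(subjects) + " .")
    for subject, (_, class_name, label) in zip(subjects, entries):
        prop = LABEL_PROPERTY[class_name]
        lines.append(f"{subject} a <{CONTENTDESC}{class_name}> ;")
        lines.append(f'    {prop} "{escape_literal(label)}" .')
    return "\n".join(lines)
```

In a production setting an RDF library would normally be preferable to string formatting, since it handles escaping and serialization details; the string version is used here only to keep the sketch dependency-free.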

3.1.5 Semantic enrichment integration in ROHUB

The semantic enrichment process is seamlessly integrated in ROHUB so that users or the platform can trigger the enrichment process for research objects. The integration of the semantic enrichment process was designed according to the asynchronous callback invocation model. This model is used when the processing may take a long time, which is the case for the semantic enrichment given its complexity, and the client side (a ROHUB user, or the platform itself) does not want to be blocked until a response is sent back. The general idea is that the client system sends, along with the enrichment request, a callback method that the server side calls to send the response to the client system.

The sequence diagram in Figure 6 shows the enrichment process. It begins when the user requests in ROHUB the semantic annotation of a research object. This request is handled by the Research Object Digital Library (RODL), which transforms it into a POST request to the RO SemanticEnricher Web Service (WS). The SemanticEnricher WS enqueues the enrichment request in a Request Queue and returns the HTTP status code 202, indicating that the enrichment request has been accepted. The Semantic Enrichment process continuously monitors the Request Queue and processes the enrichment requests, generating the semantic metadata for the research object and saving it in the Response Queue. The Response Handler process monitors the Response Queue and sends the semantic metadata to ROHUB using the callback method specified in the original enrichment request. This architecture is robust, since message delivery is guaranteed, and it scales easily, since queues support concurrent access and therefore the queue workers (readers and writers) can be parallelized.

Figure 6. Semantic search integration in ROHUB based on an asynchronous web service architecture
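The flow in Figure 6 (accept the request immediately, process it later, deliver the result through a callback) can be sketched with two in-memory queues and two worker threads. This is a simplified illustration under stated assumptions, not the deployed architecture: the real system uses a web service and message queues, the `enrich` function is a placeholder for the actual enrichment, all names are hypothetical, and in-memory queues do not provide the delivery guarantees of a persistent message broker.

```python
import queue
import threading

STOP = object()  # sentinel used to shut the workers down


def enrich(ro_id):
    """Placeholder for the real enrichment; returns fake metadata."""
    return {"ro_id": ro_id, "concepts": ["placeholder"]}


def submit(request_queue, ro_id, callback):
    """Enqueue a request and answer immediately, like an HTTP 202 response."""
    request_queue.put((ro_id, callback))
    return 202


def enrichment_worker(request_queue, response_queue):
    """Consume requests, run the enrichment, and enqueue the results."""
    while True:
        item = request_queue.get()
        if item is STOP:
            response_queue.put(STOP)
            break
        ro_id, callback = item
        response_queue.put((enrich(ro_id), callback))


def response_handler(response_queue):
    """Deliver each result through the callback supplied with its request."""
    while True:
        item = response_queue.get()
        if item is STOP:
            break
        metadata, callback = item
        callback(metadata)


def run_demo(ro_ids):
    """Run one worker and one handler; return (status codes, delivered results)."""
    request_q, response_q, delivered = queue.Queue(), queue.Queue(), []
    threads = [
        threading.Thread(target=enrichment_worker, args=(request_q, response_q)),
        threading.Thread(target=response_handler, args=(response_q,)),
    ]
    for thread in threads:
        thread.start()
    statuses = [submit(request_q, ro_id, delivered.append) for ro_id in ro_ids]
    request_q.put(STOP)
    for thread in threads:
        thread.join()
    return statuses, delivered
```

Because the client only waits for `submit` to return, it is never blocked by the enrichment itself; adding more `enrichment_worker` threads reading from the same queue is how the parallelization mentioned in the text would look in this sketch.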


3.1.6 Semantic annotations in ROHUB

The semantic enrichment service is invoked by ROHUB automatically, but it can also be triggered on demand. In the former case, ROHUB periodically (once per day, at 2 am) executes a process that identifies the research objects created since the last execution and, for each of them, triggers a call to the semantic enrichment service. The request payload includes the ID of the research object and the callback URI, among others (as depicted in Figure 6). The callback URI is then used by the semantic enrichment service to return the results to ROHUB after finishing the enrichment process.
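The daily selection step can be sketched as a filter on the creation timestamp followed by the construction of one request payload per research object. The payload keys (`ro_id`, `callback_uri`) and the dictionary layout of a research object are hypothetical; the text only states that the payload carries the research object ID and the callback URI, among other values.

```python
from datetime import datetime


def build_enrichment_requests(research_objects, last_run, callback_uri):
    """Return one request payload per research object created after last_run."""
    payloads = []
    for ro in research_objects:
        if ro["created"] > last_run:
            payloads.append({"ro_id": ro["id"], "callback_uri": callback_uri})
    return payloads


# Example: only the research object created after the previous run is selected.
previous_run = datetime(2018, 12, 13, 2, 0)
catalogue = [
    {"id": "ro-old", "created": datetime(2018, 12, 12, 15, 30)},
    {"id": "ro-new", "created": datetime(2018, 12, 13, 9, 45)},
]
requests_to_send = build_enrichment_requests(
    catalogue, previous_run, "https://example.org/callback"
)
```

Storing the time of the last successful execution, rather than assuming a fixed 24-hour window, keeps the selection correct if a scheduled run is skipped or delayed.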

Additionally, ROHUB enables end-users to trigger the semantic enrichment service directly from the ROHUB portal. In the research object overview page, there is a button called “Update content annotations” (see Figure 7). Pressing this button will trigger the execution of the semantic enrichment service for the selected research object. This functionality is quite useful when the user is working with an existing research object and would like to see immediately the new annotations found by the semantic enrichment service after making changes, without needing to wait until the next time the automatic process executes.

Regardless of how the semantic enrichment service was invoked, the results of the service are sent to ROHUB in the form of an RDF graph. When the ROHUB callback process receives this graph, it creates and adds an annotation to the research object, in which the graph becomes the annotation body. Additionally, the index server (Solr) is updated with the research object annotations, so that they can be used for searching (i.e., using the keyword-based search and faceted search components). If the research object is updated, the existing annotation body is replaced by the new RDF graph (and Solr is updated).

In the current version of ROHUB portal, the annotations from the semantic enrichment service can be displayed by clicking the “show annotations” button from the overview tab (see Figure 8). After pressing this button, ROHUB displays all the research object annotations in the advanced annotation view panel in the form property-value, including those from the semantic enrichment service (that uses Dublin Core “subject” property). In the next release these annotations will be more visible to the user.


Figure 7. Research object overview


Figure 8. Advanced annotations view panel

3.2 Semantically-enhanced search of research objects

The previous version of the ROHUB search engine was based on keywords and facets that were generated from the user-generated metadata and from statistics about the research object content, including the number of aggregated documents, annotations, comments and citations. Despite their popularity, keyword-based search engines may suffer from low precision and recall, mainly due to their lack of semantics. That is, keywords are treated as simple strings of characters that have no associated meaning. Therefore these search engines, based solely on keywords, cannot distinguish between the different meanings of an ambiguous word, nor recognize when different words refer to the same meaning (synonyms). A more in-depth discussion of the drawbacks of keyword-based search systems was presented in D4.3, section 3.1.3.
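The synonym problem can be illustrated with a toy comparison between literal keyword matching and matching on concept identifiers. The word-to-concept table below is hand-written and hypothetical; in the actual system this role is played by the semantic network and the disambiguator. The sketch covers only synonymy, not the disambiguation of ambiguous words, which needs context.

```python
# Toy synonym table: each word maps to a concept identifier (hypothetical).
WORD_TO_CONCEPT = {
    "flood": "concept:flood",
    "inundation": "concept:flood",
    "landslide": "concept:landslide",
    "landslip": "concept:landslide",
}


def keyword_match(query, text):
    """Literal matching: true only if a query word occurs in the text."""
    words = set(text.lower().split())
    return any(term in words for term in query.lower().split())


def concept_match(query, text):
    """Match on concept identifiers, so synonyms of a query word also match."""
    def concepts(s):
        return {WORD_TO_CONCEPT[w] for w in s.lower().split() if w in WORD_TO_CONCEPT}
    return bool(concepts(query) & concepts(text))
```

With the query "flood" and the description "inundation map of the river basin", `keyword_match` finds nothing while `concept_match` succeeds, which is the recall gain that semantic metadata is meant to provide.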

Facets provide a step forward, allowing the user to filter the search by selecting specific values for one or more properties (the facets) related to the research object (e.g., creator, creation date, number of resources). In this line, as part of the extensions of the RO model in EVER-EST, new properties relevant to the research object and/or its aggregated resources have been identified, extending the set of facets that the user can choose from. Some of these properties, such as research area, type of research object and state of the research object, take values linked to structured knowledge in the form of a reference vocabulary (or ontology). This associates some meaning with the property values and enables semantic inferences (e.g., a research object with research area astronomy is also about space science).

A further step is provided by the semantic enrichment service presented in Section 3.1.5, which can parse the research object content and extract named entities and concepts from it, generating automatic annotations whose values are associated with a semantic vocabulary. As a result, the discovery of research objects is greatly improved, as there is no need to rely only on annotations (metadata) provided by the end-user to find relevant research objects.

As described in the previous section, the new ROHUB has been integrated with the semantic enrichment service, allowing for an enhanced internal search engine (based on solr) and the corresponding interfaces: keyword-based search (see Figure 9) and faceted search (see Figure 10). In particular, both search interfaces now rely on the annotations provided by the enrichment service in addition to the user-generated annotations. The structured metadata added by the semantic enrichment service has been renamed in the facets as follows: Domain to Discipline, Concept to Topic, Expression to Frequent expression, and Person to Person name.

Figure 9. Keyword-based search in main ROHUB page (enhanced with automatic annotations)

Figure 10. Faceted search in ROHUB (enhanced with automatic annotations)

3.3 Recommender system

The recommender system suggests research objects that might be of interest to researchers according to the context that defines their research interests. The recommender approach is mainly content-based and takes into account the social dimension of the researcher. That is, the content of research objects, represented by the structured metadata generated by the semantic enrichment process, is leveraged when selecting the research objects included in the recommendation. The social dimension is exploited when defining the recommendation content and includes co-authors and researchers that share common interests.

The recommender system uses two complementary approaches to process research objects in terms of their content: explicit semantics and word embeddings. The explicit semantics approach refers to the fact that the structured metadata is grounded in concepts of the sensigrafo, and therefore the metadata semantics are explicitly defined in the semantic network. That is, the meaning of each term in the metadata is unambiguously defined by a concept in the semantic network, and therefore all the linguistic information present in the sensigrafo is available to the recommender system, e.g., synonyms, and broader and narrower terms. The recommender benefits from this semantic information by comparing two research objects in terms of the concepts mentioned in them instead of using only plain words.

However, sometimes a term does not have a corresponding concept in the semantic network, for instance when it is a new term emerging in a very dynamic environment, or when the sensigrafo does not cover in detail all the terminology of a very specific field. It is in this case that word embeddings are useful, since they allow representing the meaning of words without a semantic network. Word embeddings aim at generating numerical representations for words in a multidimensional space, in the hope that words with a similar meaning are located in the same region of that space. These numerical representations take the form of vectors that are learned automatically by a neural network from a corpus, which in this case is the research object collection. Using word embeddings, the recommender system can compare two research objects by using the cosine function as an indicator of similarity. For a more complete description of the word embeddings the reader is referred to D4.3.
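The cosine comparison described above can be illustrated with a minimal Python sketch. The three-dimensional vectors are made-up toy values, and representing a research object by the average of its word vectors is an assumption made here for illustration only; the composition actually used by the recommender is described in D4.3 and is not reproduced here.

```python
import math

# Toy embeddings (invented values); real vectors are learned from the RO collection.
EMBEDDINGS = {
    "eruption": [0.9, 0.1, 0.0],
    "lava": [0.8, 0.2, 0.1],
    "plankton": [0.0, 0.1, 0.9],
}

def mean_vector(vectors):
    """Component-wise average of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]

def ro_vector(words, embeddings=EMBEDDINGS):
    """Assumed RO representation: average of the vectors of its known words."""
    known = [embeddings[w] for w in words if w in embeddings]
    return mean_vector(known) if known else None

def cosine(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

With these toy values, a research object mentioning "eruption" and "lava" scores much closer to one mentioning only "lava" than to one mentioning only "plankton".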

3.3.1 Collaboration Spheres

The Collaboration Spheres (CS) web application, deployed at http://everest.expertsystemlab.com/spheres, is the user interface of the recommender system. The user interface, shown in Figure 11, consists of a navigation panel on the left-hand side, the spheres on the right-hand side, the authentication box in the upper right corner and the help option just below this box. Broadly, the navigation panel is where the user searches for the research objects or users to be included in the recommendation context. The spheres component serves as a container for the recommendation context and the recommender results. The authentication box displays information about the logged-in user, and the help button triggers a demo of Collaboration Spheres.

3.3.1.1 Authentication

The CS are fully integrated with the EVER-EST authentication service. The first time the user accesses the web application, the user is prompted to sign in with a login and password or to sign up as a new EVER-EST user. Once the user has successfully signed in, he is asked to grant the Collaboration Spheres access to his profile data, and then the web application is displayed. Note that if the user has signed in to another EVER-EST application, such as the VRC portals, he will not be required to sign in again when accessing the CS, since the authentication service is aware that the user has an active session. The authentication box shows which user is currently signed in and allows signing in as a different user.

3.3.1.2 Navigation panel

The main purpose of the navigation panel is to make available to the user the research objects and the users that he can use as the context of the recommendation. Both users and research objects are divided into a set of categories based on the social dimension of the research work. The left column contains the research objects split into three categories:

● My research objects: User’s research objects.
● Collaborations: Research objects of other authors who have co-authored a research object with the user.
● All Research Objects: All available research objects in the collection.

The right-hand column contains the users that have created at least one research object. These users are also grouped into three categories:

● Collaborators: User’s co-authors.
● Related: Users that have research objects that are similar to the user’s research objects.
● All scientist: Users registered in the platform.

Figure 11. General overview of the Collaboration Spheres

Figure 12. Research object summarization card.

Note that the related group of users is based on the notion of similar research objects, and this similarity is calculated from the structured metadata added by the semantic enrichment process. Both columns have a search functionality that allows filtering research objects by their title and users by their name.

When a user clicks a research object in the navigation panel, a research object summarization card is displayed (see Figure 12). The upper part of this card displays the title of the research object (or, if it does not exist, the research object URL), the creation date and the author name. The research object name is a link to the research object information in ROHUB. In addition, the card includes some of the metadata generated by the semantic enrichment process: Main topics refers to the most important concepts found in the research object content, and Areas of knowledge are the domains in which these concepts are more frequently used. The research object description is displayed if it is available.

Figure 13. User summarization card.

There is also a researcher card (see Figure 13) that is displayed when the user clicks on another user. This card displays the user name or, if it is not available, the user URL. In addition, the card provides information regarding the user’s research work: Main Topics are the most frequent concepts found in the user’s research objects, Areas of knowledge are the domains in which these concepts are frequently used, and Research objects provides a partial list of the user’s research objects. Once the user has selected the research object or the user that he wants to include in the recommendation context, he can simply drag it to the context sphere, which is explained next.

3.3.1.3 Spheres

The sphere component, depicted in Figure 14, consists of four concentric spheres where the context of the recommendation is defined and the results of the recommender are displayed. The smallest sphere represents the user that is currently signed in to the system. The user’s research interests are the first to be included in the recommender context. These research interests are derived from the analysis of all the user’s research objects: all the concepts found in the research objects are aggregated, and the most frequent concepts are kept as an indicator of his interests.

Figure 14. Sphere components

The second sphere, from the inside out, is where the context is modified. The user can drag and drop up to three research objects, users, or a combination of both from the navigation panel. Each time a research object or a user is dragged to this sphere or deleted from it, the recommender results are updated instantly. The recommender context is extended by adding the most frequent concepts of each research object in this sphere and, for each user in this sphere, the most frequent concepts elicited from the aggregation of all that user’s research objects.

The third and fourth spheres display the recommender results. These are research objects that are semantically related to the concepts included in the context. The recommender system assigns each research object a score that indicates how strong this semantic relation is: the higher the score, the more semantically related the research object is to the context. The third sphere displays the most related research objects and the fourth the rest.
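The way the context is assembled and the results are distributed over the outer spheres can be sketched as follows. This is a hypothetical Python illustration, not the Collaboration Spheres code: the scoring function (fraction of context concepts mentioned by a research object), the number of concepts kept (`k`) and the `threshold` separating the third sphere from the fourth are all assumptions, since the source does not specify them.

```python
from collections import Counter

MAX_CONTEXT_ITEMS = 3  # the second sphere accepts up to three items

def top_concepts(research_objects, k=5):
    """Most frequent concepts over a list of research objects (lists of concepts)."""
    counts = Counter(c for ro in research_objects for c in ro)
    return [concept for concept, _ in counts.most_common(k)]

def build_context(user_ros, context_items, k=5):
    """Context = user's top concepts plus the top concepts of each dropped item.

    Each context item is a list of research objects: a single-element list for
    a dragged research object, or all research objects of a dragged user.
    """
    if len(context_items) > MAX_CONTEXT_ITEMS:
        raise ValueError("the context sphere holds at most three items")
    context = set(top_concepts(user_ros, k))
    for item_ros in context_items:
        context.update(top_concepts(item_ros, k))
    return context

def score(ro_concepts, context):
    """Assumed relatedness score: share of context concepts found in the RO."""
    if not context:
        return 0.0
    return len(set(ro_concepts) & context) / len(context)

def assign_spheres(candidates, context, threshold=0.5):
    """Split related research objects into third (most related) and fourth sphere."""
    ranked = sorted(
        ((score(concepts, context), name) for name, concepts in candidates.items()),
        reverse=True,
    )
    third = [name for s, name in ranked if s >= threshold]
    fourth = [name for s, name in ranked if 0 < s < threshold]
    return third, fourth
```

Research objects with a score of zero are treated here as unrelated and left out of both spheres, which is also an assumption of the sketch.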

3.3.2 Collaboration Spheres in ROHUB

In the first release of ROHUB, the Collaboration Spheres can be launched from the header menu in the discover tab (see Figure 15). This link opens the Collaboration Spheres prototype page. Hence, at the moment they are only loosely integrated in ROHUB. Note that when the Collaboration Spheres are launched, the user must first log in with his EVER-EST credentials. The single sign-on deployed in EVER-EST enables a seamless transition from ROHUB to the Collaboration Spheres if the user is already authenticated in ROHUB as an EVER-EST user. For the next release, the aim is to have them more tightly integrated within ROHUB, e.g., by integrating the web components of the Collaboration Spheres.

Figure 15. ROHUB header

3.4 Research Object Aggregation Tool

The motivation for building a research object aggregation tool comes from the VRCs, which wanted to group bibliographic resources according to event dates or topics. An example is a research object grouping all the bibliographic research objects produced in 2015 referring to the earthquakes in Mount Etna. The aggregation tool allows researchers to create a new research object with the content of other existing research objects, or to establish relations with these research objects without copying their content.

Figure 16. Aggregation tool integrated in the semantic search engine

The tool was designed to benefit from the filtering capabilities of the semantic search engine, and hence it is integrated in the search engine user interface. Figure 16 depicts the new button added to the search engine, “CREATE AGGREGATED RO”. In the typical usage scenario, a user first poses a query in the search engine whose results include, as a subset, the research objects he wants to aggregate, and then starts the aggregation tool.

The aggregation tool asks for a title and a description for the new research object, and offers the possibility to filter the research objects returned by the user search. The tool also asks users whether they want to copy the content of the research objects into the new one or to establish a relation with these research objects without copying the content. Figure 17 shows the user interface presented to the user after clicking the “CREATE AGGREGATED RO” button in the search engine.

Figure 17. Research object aggregation tool

4 Components for Preservation and Curation of Research Objects

Checklists are the main tool to assess the quality of research objects throughout their life cycle. These checklists are made up of statements that specify the required features a research object must contain. They serve as the basis to calculate quantitative metrics, such as completeness and stability, that are relevant to monitor the research object quality over time periods spanning days, months or years. Thus checklists help to ensure the quality of research objects when they are preserved in the long term.

In EVER-EST the virtual research communities have been involved in the design of new checklists that are adapted to the needs of earth science disciplines. D4.3 described the design of four different checklists for research objects that were potentially identified with a DOI. These checklists correspond to distinct types of research objects that were identified according to their expected content: Basic, Workflow, Data Product and Research Product. The checklists are already available in ROHUB so that users can check the quality of their research objects against the required features specified in each checklist.

4.1 Checklist for bibliographic research objects

In addition to the basic, workflow, data and research product checklists, some VRCs identified a Bibliographic checklist to assess the quality of research objects containing mainly bibliographic information. This was the case for CNR-ISMAR and INGV. The research object containing bibliographic information was described by INGV as “a research object containing references to a specific topic, e.g., a geographical case study, a method, a project. It contains no workflows. It may contain references to published papers or any material having a DOI, or anyhow indexed. It may contain only search queries of interest, to be updated periodically to monitor new publications.” The purpose of this kind of research object is to help users exchange information with other scientists, or to support the systematic collection of publications about a specific topic, e.g., the 2013 eruption of Mt Etna or the 2011 to 2013 unrest at Campi Flegrei. In terms of the research object model, a bibliographic research object should contain at least one resource of type “Bibliographic Resource”.

The proposed features to check for bibliographic research objects are summarized in Table 1. In this table, a * indicates features that are typically managed automatically by the RO management service, such as the ROHUB backend service, and therefore should not be tested in a checklist. A ^ indicates features out of the scope of the canonical checklist, i.e., features that are too specific to one of the VRCs, not generic enough, or not possible to check with a checklist.

The checklist requires the existence of a title, a description, a creator, a copyright holder, a purpose, and an access level, and of course it demands a resource of type Bibliographic Resource. The affiliation feature is not checked, since in D4.3 it was considered out of the scope of the canonical checklists. Similarly, the creation and modification dates, the research object type, the status and the source research object were discarded, since the research object library is in charge of managing that metadata.
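As an illustration of how such a checklist can be evaluated, the following Python sketch tests the required features listed above against a research object represented as a plain dictionary. The dictionary representation, the function name and the completeness formula (fraction of satisfied requirements) are assumptions made for this example; the sketch does not reproduce the checklist format published in the repository nor the metric definitions given in D4.3.

```python
# Required features of the bibliographic checklist, as named in Table 1.
REQUIRED_FEATURES = [
    "hasTitle",
    "hasDescription",
    "hasCreator",
    "hasCopyrightHolder",
    "hasPurpose",
    "hasAccessLevel",
    "hasBibliographicResource",
]

def evaluate_checklist(ro, required=REQUIRED_FEATURES):
    """Return per-feature pass/fail results and an assumed completeness score.

    `ro` is a hypothetical dict mapping feature names to their values;
    a missing or empty value counts as a failed requirement.
    """
    results = {feature: bool(ro.get(feature)) for feature in required}
    completeness = sum(results.values()) / len(required)
    return results, completeness
```

For example, a research object that provides every feature except a purpose would fail only the `hasPurpose` test and obtain a completeness of 6/7 under this assumed formula.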

The associated checklist has been created and made available at: https://github.com/wf4ever/ro/tree/earth-science/checklists

Table 1. Checklist for bibliographic research objects

CNR-ISMAR INGV Feature to Check
Title Title hasTitle
Description Description hasDescription
Owner/Creator Authors hasCreator
^Affiliations
* Created on
* Last modified on
* Research object type
* Status * Status
* Source Research Object
Copyright Owner hasCopyrightHolder
Purpose/Use of Data hasPurpose
Papers, reports, any type of publication hasBibliographicResource
Queries on search engines hasBibliographicResource
Access level hasAccessLevel
Starting date hasGeneratedAtTime
End date hasInvalidatedAtTme

4.2 EVER-EST research object types taxonomy

The previous version of this deliverable, D4.3, presented an initial taxonomy arranging the different types of research objects that were elicited from the checklist analysis. In this version the taxonomy has been extended to include the bibliographic research object type (see Figure 18). As a reminder, the taxonomy has the characteristic that a research object of a particular type can be validated against the checklist specified for its type or against any of the checklists of its parent types. The bibliographic research object was added under the basic research object, at the same level as the process and data research objects. According to the analysis carried out, the bibliographic research object does not share any required feature with the process and data research objects.
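The validation rule stated above (own checklist or any parent checklist) can be sketched in Python as a walk up the taxonomy. Only the relations mentioned in the text are encoded; the rest of the taxonomy in Figure 18 is deliberately omitted, and the lowercase type labels and the function name are hypothetical.

```python
# Partial taxonomy, child type -> parent type, limited to what the text states.
PARENT_TYPE = {
    "bibliographic": "basic",
    "process": "basic",
    "data": "basic",
}

def applicable_checklists(ro_type, parent_type=PARENT_TYPE):
    """Checklists a research object of `ro_type` can be validated against:
    the one for its own type, followed by those of its ancestor types."""
    chain = [ro_type]
    while chain[-1] in parent_type:
        chain.append(parent_type[chain[-1]])
    return chain
```

Under this sketch a bibliographic research object can be validated against the bibliographic and the basic checklists, but not against the process or data ones.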

Figure 18. Research Object type taxonomy for earth observation

4.3 Technological support in ROHUB

As mentioned in D4.3, ROHUB provides tools to evaluate the research object quality: in the quality tab, the user manually selects the checklist against which the research object is to be tested. ROHUB then displays a bulleted list containing the predicates tested in the checklist. Each predicate is marked with a green tick if it passed the test or with a red x if it failed. In the new ROHUB portal developed in EVER-EST, the checklist associated with the research object type (if specified) is recommended to the user, but he is able to choose any other if needed (see Figure 19).

Additionally, the platform collects daily statistics about the checklist runs and provides a tool called “Research Object Monitor” (accessible from the quality tab as depicted in Figure 19) to visualize the evolution of the quality assessments through the research object lifetime. The reader is referred to D4.3 for a more in-depth description of these tools.

Figure 19. Research object quality tab in ROHUB

The research object default visualization, in the overview tab (see Figure 20), includes a quality indicator, in the form of a completeness bar, that displays the compliance of the research object against the recommended checklist (or basic if the research object type is not specified).

Figure 20. Research object overview tab (partial view) in ROHUB

5 Components Supporting the Social Aspects of Research Objects

Based on the requirements of the Virtual Research Communities, the research object lifecycle described in D4.3 has been extended with an additional state for a forked research object. The forked research object represents a new line of work that can be started at any point in time (e.g., to explore different hypotheses, to test different configurations, or to build on and extend an existing work), either while the research object is alive, by forking the live research object, or once the research object has reached a milestone/stable state, by forking any of the research object snapshots or archives. Note that a forked research object is also alive, i.e., it will be subject to updates and modifications, and therefore it is defined as a subclass of the live research object.

Moreover it is important to note that whenever a snapshot, archive or fork is generated, the evolution component should maintain the appropriate evolution information. In the next section, we describe how ROHUB handles it.

5.1 ROHUB support of the research object lifecycle management, including assignation of DOIs

ROHUB implements a rich API providing different operations for the management of the research object lifecycle (please refer to D5.4 for more technical details). Internally, ROHUB generates all the necessary annotations in the research objects to keep track of their evolution and to provide vital information about attribution, credit and citability. In particular, the following operations are carried out automatically by ROHUB:

● When a snapshot is generated, the following annotations are generated:
    ○ Time when the snapshot was generated.
    ○ Agent (person) that generated the snapshot.
    ○ A reference to the RO from which this is a snapshot.
    ○ If there is at least one previous snapshot:
        ■ A link to the previous snapshot (previous revision).
        ■ A comparison with respect to the previous snapshot is carried out to identify and represent explicitly all the changes (i.e., additions/removals/modifications) from this snapshot with respect to the previous one.

● When an archive is generated, the following annotations are generated:
    ○ Time when the archive was generated.
    ○ Agent (person) that generated the archive.
    ○ A reference to the RO from which this is an archive.
    ○ If there is at least one previous snapshot:
        ■ A link to the last snapshot (previous revision).
        ■ A comparison with respect to the last snapshot is carried out to identify and represent explicitly all the changes (i.e., additions/removals/modifications) from this archive with respect to the last snapshot.

● When a fork is generated, the following annotations are generated:
    ○ Time when the fork was generated.
    ○ Agent (person) that generated the fork.
    ○ A reference to the RO from which this is a fork (citation).
    ○ A reference in the source RO to the new fork (cited by).
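The snapshot case in the list above can be sketched in Python as follows. This is a hypothetical illustration rather than ROHUB code: the dictionary keys, the function names and the representation of a research object's resources as a mapping from path to content hash are all assumptions, and the vocabulary actually used for the annotations is not reproduced.

```python
from datetime import datetime, timezone

def diff_resources(previous, current):
    """Compare two {resource path: content hash} mappings."""
    shared = set(previous) & set(current)
    return {
        "additions": sorted(set(current) - set(previous)),
        "removals": sorted(set(previous) - set(current)),
        "modifications": sorted(p for p in shared if previous[p] != current[p]),
    }

def snapshot_annotations(source_ro, agent, resources, previous_snapshot=None, now=None):
    """Build the evolution annotations for a new snapshot (assumed key names)."""
    annotations = {
        "generated_at": (now or datetime.now(timezone.utc)).isoformat(),
        "generated_by": agent,
        "snapshot_of": source_ro,
    }
    # Revision link and change set only exist when a previous snapshot does.
    if previous_snapshot is not None:
        annotations["previous_revision"] = previous_snapshot["uri"]
        annotations["changes"] = diff_resources(
            previous_snapshot["resources"], resources
        )
    return annotations
```

The archive case would follow the same pattern, comparing against the last snapshot, while a fork only needs the time, agent and the two cross-references between the source research object and the fork.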

Additionally, if a DOI is requested for the snapshot/archive, ROHUB verifies that the research object fulfils the minimum requirements to get a DOI (i.e., have a title, description, etc.). If the evaluation passes, ROHUB contacts the DOI provider to create a new DOI and creates an annotation in the research object with this metadata information; otherwise, the operation is cancelled.
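The DOI request flow just described can be sketched in Python as a simple gate. The sketch is hypothetical: the source names only the title and description among the minimum requirements, so the field list below is knowingly incomplete, and `mint_doi` is a stand-in callable for the DOI provider rather than a real client API.

```python
# Only the requirements named in the text; the complete list is not given there.
DOI_MINIMUM_FIELDS = ("title", "description")

def request_doi(ro, mint_doi):
    """Request a DOI for a snapshot/archive represented as a dict.

    Returns (doi, missing_fields). If any minimum requirement is missing,
    the operation is cancelled and the provider is never contacted.
    """
    missing = [field for field in DOI_MINIMUM_FIELDS if not ro.get(field)]
    if missing:
        return None, missing
    doi = mint_doi(ro)  # hypothetical call to the DOI provider
    ro.setdefault("annotations", []).append({"doi": doi})
    return doi, []
```

Checking the requirements before contacting the provider mirrors the order given in the text and avoids minting identifiers for incomplete research objects.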

The ROHUB portal exposes the evolution functionalities to end-users, although currently with some limitations. In particular, in the current RO overview tab there is a button called “evolution” that allows users to create snapshots, archives or forks (see Figure 21). Note that in the overview tab the user can also see details about the research object evolution, e.g., the number of snapshots/archives/forks associated with the current research object.

Figure 21. Research object overview tab (partial view) in ROHUB

Currently, however, when the user generates a snapshot/archive/fork, the name is automatically assigned, and there is no possibility to customize the new research object (e.g., to do some curation tasks like adding/removing annotations, or removing resources the user does not want to be part of the new research object). In the next release of the ROHUB portal, it is envisioned to provide users with a wizard that will allow them to make this kind of customization before generating the snapshot/archive/fork.

Additionally, if the user goes to the lifecycle tab, he will be able to see all the snapshots/archives/forks. At the moment of writing, this tab is still under development in the new ROHUB portal toward a view similar to the one depicted in Figure 22.

Figure 22. Research object lifecycle tab (under development) in ROHUB
