Integrating CERIF entities in a multidisciplinary e-infrastructure for environmental research data
Enrico Boldrini (a), Daniela Luzi (b), Stefano Nativi (a), Fabrizio Pecoraro (b)
(a) Institute of Atmospheric Pollution Research, National Research Council (CNR-IIA), Sesto Fiorentino, Italy
(b) Institute for Research on Population and Social Policies, National Research Council (CNR-IRPPS), Rome, Italy
CRIS2014 - Rome, 13-15 May 2014
Index
• Aims
• Background
• Two-way crosswalk
  • From ISO 19115 INSPIRE profile to CERIF
  • From CERIF to ISO 19115
• Proposal of CERIF extension
• Proposal of a CERIF profile in ISO 19115
• Implementation in a brokering framework
• Discussion
Aim
Two-way crosswalk: proposal of different solutions to integrate research context information with environmental datasets
• From ISO to CERIF: providing a CERIF guideline for the description of datasets according to the INSPIRE profile of ISO 19115
• From CERIF to ISO: proposing an ISO profile for contextual research information on the basis of CERIF concepts
Extension of the brokering approach used in environmental e-infrastructures with contextual research information based on CERIF
ISO 19115 Geographical Information metadata
ISO 19115
• Part of the geographical information suite of standards (19100 series)
• Description of geographic information and services: identification, extent, quality, spatial and temporal schema, spatial reference and distribution of digital geographic data
• More than 400 metadata elements
• Provision of rules for valid metadata extensions
INSPIRE profile: INSPIRE Metadata Implementing Rules
• EU Directive to implement ISO 19115 to create a European Union spatial data infrastructure
• Core set of mandatory and optional metadata and related constraints
CERIF
Comprehensive conceptual model of research information and related processes, suitable for different purposes: management, scientific exchange, evaluation …
E-R based, flexible model built on:
• Base entities
• Semantic layer
• Multiple relationships
Constantly maintained by the euroCRIS community
CERIF version 1.6
Challenges
[Figure: CERIF entities, including Citation, CV, Prize, Qualification, ExpertiseAndSkills, Equipment, Facility, Funding, Service, ElectronicAddress, PostalAddress, Country, Currency, Language, Event, Metrics, Indicator, Measurement]
Different domains, scopes, structures, semantics
Mapping from INSPIRE ISO 19115 profile to CERIF
• Straightforward: INSPIRE elements have semantically correspondent elements in the CERIF data model
• Inferential mapping: both INSPIRE and CERIF can refer to a data dictionary/vocabulary that contains semantically shared terms
• Convention: CERIF metadata elements can be adapted to express some mandatory INSPIRE elements by convention among the parties exposing their metadata
Straightforward mapping
• Semantically correspondent notation with CERIF entity cfResProd and some related elements
• Automatic discovery and interpretation of datasets exposed in RISs using the CERIF model

| INSPIRE element | INSPIRE section | ISO 19115 path | ISO card. | CERIF path | CERIF card. |
|---|---|---|---|---|---|
| Dataset title | B1.1 | MD_Metadata > MD_DataIdentification.citation > CI_Citation.title | [1..1] | cfResProd > cfResProdName | [1..*] |
| Geographic Bounding Box | B4.1 | MD_Metadata > MD_DataIdentification.extent > EX_Extent > EX_GeographicExtent > EX_GeographicBoundingBox | [1..*] | cfResProd > cfResProd_GeoBBox > cfGeoBBox | [0..*] |
| Abstract describing the dataset | B1.2 | MD_Metadata > MD_DataIdentification.abstract | [1..1] | cfResProd > cfResProdDescr | [1..*] |
| Dataset keyword | B3 | MD_Metadata > MD_DataIdentification.descriptiveKeywords > MD_Keywords | [1..*] | cfResProd > cfResProdKeyw | [1..*] |
| Unique resource identifier | B1.5 | MD_Metadata > MD_DataIdentification.citation > CI_Citation.identifier | [1..*] | cfResProd > cfResProdID | [1..1] |
| Resource type | B1.3 | MD_Metadata.hierarchyLevel | [1..*] | [fixed by the scope to dataset] | |
| Metadata character set | - | MD_Metadata.characterSet | [1..1] | [fixed to UTF-8] | |
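
The straightforward rows can be read as a one-to-one lookup from an ISO 19115 path to a CERIF path. The following is a minimal sketch of that idea, not the GI-cat implementation: it assumes metadata records are flat dictionaries keyed by path strings, and the names `STRAIGHTFORWARD_MAP` and `to_cerif` are hypothetical.

```python
# Illustrative only: ISO 19115 path -> CERIF path, copied from rows of the
# straightforward mapping. Paths are treated as opaque strings (an assumption).
STRAIGHTFORWARD_MAP = {
    "MD_Metadata > MD_DataIdentification.citation > CI_Citation.title":
        "cfResProd > cfResProdName",
    "MD_Metadata > MD_DataIdentification.abstract":
        "cfResProd > cfResProdDescr",
    "MD_Metadata > MD_DataIdentification.descriptiveKeywords > MD_Keywords":
        "cfResProd > cfResProdKeyw",
    "MD_Metadata > MD_DataIdentification.citation > CI_Citation.identifier":
        "cfResProd > cfResProdID",
}


def to_cerif(iso_record):
    """Re-key a flat {ISO path: value} record by CERIF path.

    ISO paths without a straightforward counterpart are returned separately,
    so that they can be handled by inference or by convention.
    """
    mapped, unmapped = {}, {}
    for iso_path, value in iso_record.items():
        cerif_path = STRAIGHTFORWARD_MAP.get(iso_path)
        if cerif_path is None:
            unmapped[iso_path] = value
        else:
            mapped[cerif_path] = value
    return mapped, unmapped
```

Returning the unmapped elements instead of dropping them keeps the three mapping types separable: whatever is left over is a candidate for inferential or conventional mapping.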
Inferential mapping
Information can be inferred using:
• CERIF semantic layer (cfClassId …) and link entities
• ISO CodeList dictionary
Important to express roles and topics unambiguously

| INSPIRE mandatory element | INSPIRE section | ISO 19115 path | ISO card. | CERIF path | CERIF card. | CERIF role specification |
|---|---|---|---|---|---|---|
| Dataset responsible party | B9 | MD_Metadata > MD_DataIdentification.pointOfContact > CI_ResponsibleParty | [1..*] | cfResProd > cfOrgUnit_ResProd > cfOrgUnit > cfOrgUnitName AND cfOrgUnit_EAddr [AND cfResProd > cfPers_ResProd > cfPers > cfPersName] | [1..*] | cfClassId ∈ CI_RoleCode (e.g. "custodian"); cfClassSchemeId="CI_RoleCode" |
| Metadata point of contact | B10.1 | MD_Metadata.contact > CI_ResponsibleParty | [1..*] | cfResProd > cfOrgUnit_ResProd > cfOrgUnit > cfOrgUnitName AND cfOrgUnit_EAddr [AND cfResProd > cfPers_ResProd > cfPers > cfPersName] | [1..*] | cfClassId ∈ CI_RoleCode (e.g. "pointOfContact"); cfClassSchemeId="CI_RoleCode" |
| Dataset topic category | B2.1 | MD_Metadata > MD_DataIdentification.topicCategory | [1..*] | cfResProd_Class | [1..*] | cfClassId ∈ MD_TopicCategoryCode (e.g. biota); cfClassSchemeId="MD_TopicCategoryCode" |
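
The role specification can be read as a membership test of a term against an ISO code list. The sketch below illustrates this under stated assumptions: the code lists are held as Python sets, only a deliberately partial subset of terms is listed, and the `classify` helper is hypothetical rather than part of CERIF or GI-cat.

```python
# Partial, illustrative subsets of two ISO 19115 code lists (not exhaustive).
CODE_LISTS = {
    "CI_RoleCode": {"custodian", "pointOfContact", "author", "publisher"},
    "MD_TopicCategoryCode": {"biota", "oceans", "environment"},
}


def classify(scheme_id, class_id):
    """Return a CERIF-style classification pair.

    Raises KeyError for an unknown scheme and ValueError when the term does
    not belong to the scheme, so roles and topics stay unambiguous.
    """
    allowed = CODE_LISTS.get(scheme_id)
    if allowed is None:
        raise KeyError(f"unknown classification scheme: {scheme_id}")
    if class_id not in allowed:
        raise ValueError(f"{class_id!r} is not a term of {scheme_id}")
    return {"cfClassId": class_id, "cfClassSchemeId": scheme_id}
```

For example, `classify("CI_RoleCode", "custodian")` yields the pair used for a dataset responsible party, while `classify("CI_RoleCode", "biota")` is rejected because the term belongs to a different scheme.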
Mapping by convention
Mapping of information on:
• dataset quality and lineage
• temporal reference
• language

| INSPIRE mandatory element | INSPIRE section | ISO 19115 path | ISO card. | CERIF path | CERIF card. | CERIF role specification |
|---|---|---|---|---|---|---|
| Conformity | B7 | MD_Metadata > DQ_DataQuality.report | [1..*] | cfResProd > cfResProd_Meas > cfMeas > cfMeasName AND cfValJudgeText | [1..*] | cfMeasName='conformity' |
| Lineage | B6.1 | MD_Metadata > DQ_DataQuality.lineage > LI_Lineage | [1..1] | union(cfResProd > cfResProd_Meas > cfMeas > cfMeasDescr) | [1..*] | cfMeasName='lineage' |
| Dataset reference date | B5 | MD_Metadata > MD_DataIdentification.citation > CI_Citation.date | [1..*] | cfResProd > cfOrgUnit_ResProd > cfOrgUnit > cfStartDate/cfEndDate | | cfClassId='author institution' |
| Metadata date stamp | B10.2 | MD_Metadata.dateStamp | [1..1] | cfResProd > cfOrgUnit_ResProd > cfOrgUnit > cfStartDate/cfEndDate | [1..*] | cfClassId='publisher institution' |
| Metadata language | B10.3 | MD_Metadata.language | [1..1] | cfResProd > cfResProdName > cfLangCode | [1..*] | |
| Temporal extent | B5.1 | MD_Metadata > MD_DataIdentification.extent > EX_Extent > EX_TemporalExtent | [1..*] | From minimum to maximum (cfResProd > cfResProd_Meas > cfMeas > cfDateTime) | [1..*] | |
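
Two of the conventions are aggregations over the measurements linked to a result product: the temporal extent runs from the minimum to the maximum cfDateTime, and the lineage is the union of the cfMeasDescr texts. A minimal sketch follows. It assumes each measurement is a plain dictionary keyed by CERIF attribute names, and it interprets "union" as concatenation in input order; both are assumptions, and the function names are hypothetical.

```python
from datetime import date


def temporal_extent(measurements):
    """Return (earliest, latest) cfDateTime, or None if no measurement has a date."""
    dates = [m["cfDateTime"] for m in measurements
             if m.get("cfDateTime") is not None]
    if not dates:
        return None
    return min(dates), max(dates)


def lineage(measurements):
    """Join the cfMeasDescr texts of measurements named 'lineage', in input order."""
    parts = [m["cfMeasDescr"] for m in measurements
             if m.get("cfMeasName") == "lineage" and m.get("cfMeasDescr")]
    return "\n".join(parts)


# Made-up sample data, for illustration only.
sample = [
    {"cfMeasName": "lineage", "cfMeasDescr": "Digitised from survey sheets.",
     "cfDateTime": date(2010, 5, 1)},
    {"cfMeasName": "conformity", "cfValJudgeText": "conformant",
     "cfDateTime": date(2012, 1, 15)},
    {"cfMeasName": "lineage", "cfMeasDescr": "Reprojected.",
     "cfDateTime": date(2011, 3, 9)},
]
extent = temporal_extent(sample)
lineage_text = lineage(sample)
```

Because these rules are agreed conventions rather than semantic equivalences, both parties exposing metadata need to apply the same aggregation for the round trip to be consistent.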
A proposal of CERIF extension
New entities related to research products, expressing:
• conditions of access and use
• limitations on public access
• dataset language
• dataset character codes
• plus optional ISO information related to the metadata used
Mapping from CERIF to ISO 19115 profile
Proposal of extensions according to ISO methodology, based on CERIF:
• project entity
• publications linked to datasets
• plus expansion of ISO concepts, providing more information on Organisations and Persons
GI-cat discovery broker
• GI-cat broker technology powers different projects and initiatives:
  • Italian Antarctic Data Center (IADC)
  • Italian Special project NextData
  • CNR GIIDA
  • ISPRA catalog of catalogs
  • AfroMaison
  • Global Earth Observation System of Systems (GEOSS)
  • …
GI-cat enables scientific data search across different, heterogeneous data sources. Results are profiled according to the desired model.
Implementation results - GI-cat extensions for CERIF
• Brokers CERIF datasets published according to the CERIF XML Schema
• Exposes the brokered resources, returning documents that conform to the CERIF XML Schema
[Diagram label: CERIF Docs]
Test case #1 Publishing CERIF products for INSPIRE
Aim: CERIF result products are made available according to INSPIRE
CERIF documents stored in an XML repository are brokered by GI-cat and republished according to ISO 19115 through the CSW/ISO discovery interface, as required by INSPIRE.
[Diagram labels: CERIF Docs, ISO profiler, CSW/ISO]
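
From the client side, a record republished through a CSW/ISO interface can be requested with a standard CSW 2.0.2 key-value request. The sketch below only builds such a URL and does not contact any server. The endpoint address and the function name are hypothetical placeholders, and the slides do not state which requests were used in the test.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: replace with the address of an actual CSW/ISO service.
CSW_ENDPOINT = "https://example.org/csw"


def get_record_by_id_url(record_id, endpoint=CSW_ENDPOINT):
    """Build a CSW 2.0.2 GetRecordById URL asking for a full ISO (gmd) record."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecordById",
        "id": record_id,
        # Namespace of the ISO 19139 encoding of ISO 19115 metadata.
        "outputSchema": "http://www.isotc211.org/2005/gmd",
        "elementSetName": "full",
    }
    return f"{endpoint}?{urlencode(params)}"


url = get_record_by_id_url("dataset-42")
```

Setting `outputSchema` to the gmd namespace is what asks the service for ISO metadata rather than the default CSW record format.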
Test case #2 Porting INSPIRE information to CERIF
Aim: INSPIRE datasets are discovered and returned according to the CERIF XML Schema
INSPIRE datasets stored in a CSW/ISO catalog can be discovered and converted to CERIF XML documents.
The CERIF profiler enables discovery through an OpenSearch interface.
[Diagram labels: INSPIRE Catalog, CSW/ISO, CSW/ISO accessor]
Testing results
Summarising some results … (1)
Data elements mapped:
• 16/20 INSPIRE mandatory elements
  • 7 straightforward
  • 3 inferential
  • 6 by convention
• 6/8 optional elements
Discovery of primary data elements based on CERIF Result Product
The CERIF semantic layer facilitates a flexible application of the model in heterogeneous environments, BUT it needs specific constraints and rules to establish consistent semantic integration
Summarising some results … (2)
• Proposal to introduce a CERIF profile to extend ISO concepts with contextual research information:
  • Projects
  • Datasets associated with publications
Implementation and successful testing of GI-cat allow, without additional implementation effort:
• Integration of INSPIRE datasets in RISs
• Integration of RISs with environmental dataset systems
Future work: service discovery, extending the mapping to ISO 19119
Discussion
Proposal of different solutions to be submitted to the euroCRIS community
Some further suggestions:
• Introduction of a specific entity to unambiguously identify datasets as research products
• Establishment of a set of rules/procedures to create valid CERIF metadata extensions
Thank you!