Towards semantically interoperable metadata repositories: The Asset Description Metadata Schema


Computers in Industry 64 (2013) 10–18


Gofran Shukair a, Nikolaos Loutas a,d,*, Vassilios Peristeras b, Sebastian Sklarß c

a Digital Enterprise Research Institute, NUI Galway, Ireland
b EC, DG for Informatics, Interoperability Solutions for European Public Administrations, Belgium
c ]init[ AG für Digitale Kommunikation, Berlin, Germany
d Information Systems Lab, University of Macedonia, Thessaloniki, Greece

Article info

Article history: Received 5 September 2012; accepted 13 September 2012; available online 3 November 2012.

Keywords: Semantic interoperability; Asset; Metadata repository; e-Government; ADMS

Abstract

The divergent interpretations of data, the lack of common metadata and the absence of universal reference data hinder governments from seamless data exchange, information systems integration and the delivery of cross-border public services. To overcome this, governments develop e-Government metadata repositories to store reusable data models, schemata, taxonomies and codelists. We use the term semantic interoperability asset to refer to these types of resources. These repositories, however, differ in their scope, target group, implementation technologies and end-user interfaces. Although the semantic content they include can often be reused, even beyond the domain it was originally designed for, their physical isolation and the heterogeneity of the assets' descriptions hamper the reusability of common concepts and cross-repository search. To deal with these challenges, this paper introduces the Asset Description Metadata Schema, an initiative of the ISA programme of the European Commission, which aims to deliver a common metamodel for semantic interoperability assets.

© 2012 Elsevier B.V. All rights reserved.


1. Introduction

Nowadays in the European Union, there is an increasing demand for cross-border and cross-sector delivery of electronic public services, which emphasizes the need for semantic interoperability between public administrations Europe-wide. Semantic interoperability between public administrations at the pan-European level requires sharing common metadata models, which are in turn the fundamental building blocks for information systems interoperability and integration. By definition, metadata refers to "data about data" that identifies, describes or facilitates the retrieval, usage and management of digital resources. Metadata exists at multiple levels within an information system, describing different aspects of the resources inside it, such as structural metadata and administrative metadata, e.g., creation date and access mechanism [1].

Metadata schemas are sets of elements designed for a specific purpose, such as describing a particular type of information resource. The definition or meaning of the elements themselves adds a new type of metadata known as the semantics of the schema, which ensures that the precise meaning of information is understood and preserved throughout exchanges between parties, e.g., different public services collaborating in the provision of a cross-border e-Government service.

* Corresponding author at: PwC EU Services, Woluwedal 18, 1932 Zaventem, Belgium. Tel.: +32 4 91965851.

http://dx.doi.org/10.1016/j.compind.2012.09.003

Henceforth, we refer to these models using the term semantic interoperability asset. A semantic interoperability asset is a collection of reference metadata elements whose sharing among governments would contribute to increased interoperability across organisational and geographic boundaries. Examples of semantic interoperability assets include codelists, taxonomies, XML schemas, ontologies, UML diagrams and reference collections of data.

Reusing semantic interoperability assets not only saves modelling and development time and effort but also helps build interoperable systems [3]. For instance, reusing SEMIC.EU's Core Person Specification,¹ which is the result of the collaborative work of approximately 100 experts, to model the person concept in different information systems across Europe will automatically enable their semantic interconnection.

In this vein, governments as well as the European Commission are sharing their metadata on the Web to encourage its reuse and consequently facilitate interoperability. This has led to a new kind of repository focusing primarily on semantic interoperability assets, such as Digitaliser.dk in Denmark, the ESD toolkit standards lists in the UK and the European Union repository SEMIC.EU.

¹ See http://www.semic.eu/semic/view/snav/Conformance/specification.xhtml.

Fig. 1. Research methodology.

However, the potential of metadata repositories is currently not fully exploited, as these repositories did not foresee exchanging their data with other systems. Many legal, organizational, technical and semantic barriers exist. Examples of these barriers include the lack of licensing and quality assessment information for their content, the use of different formats and terminology, as well as their physical isolation [2].

ISA,² the European Commission programme on Interoperability Solutions for European Public Administrations, is leading a working group effort to define a standard model to describe semantic interoperability assets. Representatives from more than 10 EU Member States, standardization bodies and top-league research institutes participate in the working group. The work presented in this paper is part of this initiative. It is in fact one of the inputs given to the working group in order to initiate the standardization process.

Summarising, the main research challenge pursued in this paper is to facilitate the sharing and reuse of semantic interoperability assets by overcoming the heterogeneous, unaligned asset metadata and the physical isolation of the hosting repositories.

The remainder of the paper is organized as follows: Section 2 describes our research methodology. Section 3 presents an extensive study of related metadata description models. The analysis of existing repositories is described in Section 4. In Section 5, we introduce ADMS, the Asset Description Metadata Schema, as an RDF model to represent semantic interoperability assets. A feasibility study and an evaluation of the model are presented in Section 6. Finally, Section 7 concludes the paper and discusses our future research direction.

2. Research methodology

The work in this paper has been based on the Design Science Research Methodology (DSRM) [5]. DSRM incorporates principles, practices, and procedures required to carry out research in information systems and meets three objectives: it is consistent with prior literature; it provides a nominal process model for doing DS research; and it provides a mental model for presenting and evaluating DS research in information systems. Selecting DSRM ensures that there exist clear links and a smooth transition between the design and development of our model and its application and evaluation in the case study.

DSRM includes six activities: problem identification and motivation, definition of the objectives for a solution, design and development, demonstration, evaluation, and communication. We define these activities in the context of our work as follows (see also Fig. 1):

² See http://ec.europa.eu/isa/index_en.htm.

i. Problem identification and motivation. Our research problem is driven by the following challenge: semantic metadata models, the actual means of semantic interoperability between e-Government systems, are locked in heterogeneous, distributed and isolated metadata repositories that differ in metadata models and technologies.

ii. Objectives of the solution. The objectives of this work are summarized in the following:
   a. To enable the integration and semantic interconnection between distributed e-Government metadata repositories; and
   b. To facilitate the discovery, access and reuse of semantic interoperability assets.
   The objectives will be realized through:
   a. A mutual agreement on the meaning of concepts describing an asset;
   b. A new metadata exchange model to identify types of the reusable resources, facilitate their discovery and ensure minimum consistency of metadata across e-Government repositories; and
   c. The definition of a conceptual architecture for the federation of these repositories.

iii. Design and development. We first analyse a selection of e-Government repositories and models to determine the characteristics of the proposed exchange model (Section 4), and then specify an RDF vocabulary to enable their semantic interconnection. This vocabulary is termed Asset Description Metadata Schema (ADMS) (Section 5). The adoption of RDF and the realization of the Linked Data principles [4] allow publishing metadata in a uniform and machine-readable manner and create semantic links between them on top of the existing Web infrastructure.

iv. Demonstration. We introduce a federation architecture and implement the federation portal prototype to demonstrate the usage of ADMS as a common metadata model.

v. Evaluation. We run an evaluation exercise and collect the feedback of 20 developers who need e-Government metadata in their daily tasks. Both Demonstration and Evaluation steps are explained in Section 6.

3. Related work

Many efforts are directed towards enabling interoperable information systems through consistency and uniformity in the way that information is described, stored and retrieved, especially in complex organizations such as governments. However, only a few exploit metadata to describe semantic interoperability assets in a way that they can be easily identified and located. These assets are the actual artifacts needed within and across public sector information systems to support interoperability.

Fig. 2. Metadata description models/schemas.

⁴ See http://logd.tw.rpi.edu/page/international_dataset_catalog_search.
⁵ See http://data.gov.uk/data.
⁶ See http://thedatahub.org/.

In this section we study the metadata description schemata in the literature and we identify the following categories (see Fig. 2): ISO-based models, DC-based models and RDF-based models.

ISO-based models employ the ISO/IEC 11179 13 standard for the specification and standardization of data elements. According to ISO/IEC 11179, a Metadata Registry (MDR) is a database of metadata that supports the functionality of registration. The core function of metadata schema registries is to collect, store and provide reference descriptions of metadata schemata. DESIRE [6] and CORES [7] are two notable examples of ISO-based models.

DESIRE (Development of a European Service for Information on Research and Education) provides an MDR implementation that supports the registration of elements from multiple namespaces.

CORES builds on the work of the DESIRE and SCHEMAS³ projects to provide an ISO-based model specifically targeting European projects. CORES indexes standard metadata element sets and application profiles that use these standards.

Both CORES and DESIRE provide a sufficient description of the element set or schemata to enable mapping and cross-walking between metadata elements. However, their models are not extensible and do not cover details of different types of metadata resource such as taxonomies and ontologies.

DC-based models reuse and extend the Dublin Core (DC) element set [8,9]. DC is one of the most influential, domain-independent initiatives in the area of digital resource metadata description. A number of different communities, e.g., digital libraries and e-Government, used DC as a basis to define metadata standards suitable for their specific needs. Many countries worldwide have defined metadata frameworks based on the DC model as part of their national e-Government strategies. Prominent examples include e-GMS (UK) [10], AGLS (Australia) [11], NZGLS (New Zealand) [12] and CLF (Canada) [13].

The UK e-GMS standard lays down the elements, refinements and encoding schemes to be used by government officers when creating metadata for their information resources or when designing search interfaces for information systems.

The Australian Government Locator Service (AGLS) extends DC with elements describing the function and the availability of government information records.

The Canadian CLF adds the audience and keywords elements to DC. DC-based government models are designed mainly to describe documents and public sites.

³ See http://www.schemas-forum.org/.

Nevertheless, all these efforts include only generic properties that are not sufficient for completely fulfilling the needs of the diverse audience of government semantic interoperability assets, e.g., developers interested in ontologies or codelists, or project managers interested in UML diagrams or reference datasets.

RDF-based models use the Resource Description Framework (RDF) as a data model. Semantic Web and Linked Data technologies have been applied to many e-Government catalogues and repositories to achieve machine-readable representations of their content metadata using RDF. The adoption of such technologies has several benefits, such as decentralised publishing and Web accessibility.

DCAT [14] is an RDF-based vocabulary to describe government catalogues and datasets. DCAT deals with general government datasets, which can be a CSV file or a geographic shape file. It defines a dataset as a collection of data, published or curated by a single source and available for access or download in one or more formats. The Open Government Dataset Catalog (IOGDC),⁴ data.gov.uk⁵ and the Data Hub⁶ use the DCAT vocabulary as their data model. While DCAT represents generic datasets, our work here focuses more specifically on reusable datasets as well as reusable metadata models and standards.
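To make the DCAT notion of a dataset with one or more downloadable formats more concrete, the following is a minimal sketch that emits N-Triples for one dataset and its distributions. It uses only the Python standard library; the `example.org` URIs, the title and the helper names (`iri`, `literal`, `describe_dataset`) are illustrative assumptions and do not come from the paper or from any repository it discusses.

```python
# Minimal sketch: describe a dataset and its distributions with DCAT terms.
# All example.org URIs and helper names below are hypothetical.

DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value):
    """Render a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def describe_dataset(dataset_uri, title, distributions):
    """Return N-Triples lines for one dataset.

    distributions is a list of (distribution_uri, access_url) pairs,
    one pair per available format of the dataset.
    """
    triples = [
        (iri(dataset_uri), iri(RDF_TYPE), iri(DCAT + "Dataset")),
        (iri(dataset_uri), iri(DCT + "title"), literal(title)),
    ]
    for dist_uri, access_url in distributions:
        triples.append((iri(dataset_uri), iri(DCAT + "distribution"), iri(dist_uri)))
        triples.append((iri(dist_uri), iri(RDF_TYPE), iri(DCAT + "Distribution")))
        triples.append((iri(dist_uri), iri(DCAT + "accessURL"), iri(access_url)))
    return [f"{s} {p} {o} ." for s, p, o in triples]


lines = describe_dataset(
    "http://example.org/dataset/country-codes",
    "Country codes",
    [("http://example.org/dataset/country-codes/csv",
      "http://example.org/files/country-codes.csv")],
)
for line in lines:
    print(line)
```

Each additional format of the same dataset would simply add one more `(distribution_uri, access_url)` pair, which mirrors the separation between a dataset and its distributions.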

Another RDF-based model, which is used by the ESD⁷ Toolkit, is based on the SKOS vocabulary [15]. ESD uses RDF and Linked Data principles to publish the standards lists and their associated metadata. In the ESD toolkit, lists are modelled as SKOS Concept Schemes while individual items are modelled as SKOS concepts.
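The list-to-scheme modelling just described can be sketched as follows. This is an illustrative approximation only, not the ESD toolkit's actual implementation: the function name `list_to_skos`, the `example.org` URIs and the sample labels are assumptions, while the SKOS terms themselves (`skos:ConceptScheme`, `skos:Concept`, `skos:inScheme`, `skos:prefLabel`) are standard.

```python
# Minimal sketch: model a controlled list as a SKOS concept scheme.
# The example.org URIs and the sample items are hypothetical.

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def list_to_skos(scheme_uri, items):
    """Return (subject, predicate, object) triples for a list and its items.

    items maps each item URI to its human-readable label.
    """
    triples = [(scheme_uri, RDF_TYPE, SKOS + "ConceptScheme")]
    for item_uri, label in items.items():
        triples.append((item_uri, RDF_TYPE, SKOS + "Concept"))
        triples.append((item_uri, SKOS + "inScheme", scheme_uri))
        triples.append((item_uri, SKOS + "prefLabel", label))
    return triples


service_list = list_to_skos(
    "http://example.org/lists/services",
    {
        "http://example.org/lists/services/1": "Waste collection",
        "http://example.org/lists/services/2": "Library lending",
    },
)
for triple in service_list:
    print(triple)
```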

Other RDF-based models focus on describing specific types of datasets or semantic metadata resources. For example, VOiD [16] describes RDF datasets and their relationships, and the RDF Data Cube vocabulary [17] represents statistical datasets. The Vocabulary of a Friend (VOAF) [18] is an initiative to describe the relationships between Linked Open Data⁸ vocabularies. VOAF provides a few interesting metadata elements, such as vocabulary space, and a number of classes and properties.

Another example of RDF-based models describing a particular type of dataset is the Ontology Metadata Vocabulary (OMV) [19]. OMV is an effort towards a standardized vocabulary to annotate ontologies with metadata that helps to improve the accessibility and reuse of ontologies.

Finally, the Knowledge Organization Systems Ontology (KOSO) aims to provide shared definitions for the different types of Knowledge Organization System (KOS) and to enable the detailed description of individual KOS, in order to provide structured access to the available resources and thus allow retrieving them for reuse and modification [20].

Most of the models discussed so far are designed to improve the discoverability and facilitate the accessibility of digital resources, i.e., element sets, documents or datasets, by exploiting metadata. However, only a few of them explicitly define properties for indexing and classifying these resources in a way that allows the diverse audience of government data to identify subsets of these online resources in a plethora of digital collections. The Asset Description Metadata Schema, detailed in Section 5, consolidates the aforementioned efforts by introducing a higher metadata level concerning the types of semantic interoperability assets. ADMS uses terms like domain, geographical coverage, asset type and interoperability level to enrich asset descriptions.

Fig. 3 illustrates the relation between ADMS and the RDF-based metadata description models explained above. ADMS describes any dataset, KOS, ontology or vocabulary reusable by public administrations to boost interoperability. The circles express the containment relationship between the different sets of resources; the set of semantic assets is a combination of the four sets, i.e., datasets, KOS, ontologies and vocabularies.

⁷ See http://standards.esd.org.uk/.
⁸ See http://richard.cyganiak.de/2007/10/lod/.

Fig. 3. Sets diagram: ADMS relation to RDF-based models in the e-Government data space.

Table 1. Sample data.
Repository | #Assets | Method of data collection
SEMIC.EU | 516 | Screen scraping
Digitaliser | 2000 | REST API
ESD toolkit service lists | 34 | SPARQL endpoint
XRepository | 5 | Data dump

4. Analysis of e-Government metadata repositories

This section describes the design and development step of the DSRM methodology. Our main objective here is to define the Asset Description Metadata Schema (ADMS) as a common description schema for semantic interoperability assets, thus enabling the federation of national e-Government asset repositories.

These repositories contain assets that can be reused to assist interoperability, but the range of these assets is broad: from a list of reference values, e.g., country codes, to a conceptual model or an ontology.

We follow a bottom-up conceptualization process. We study and analyse the metamodels of the following four e-Government metadata repositories and use them to derive the initial version of ADMS:

• Digitaliser.dk⁹ is an e-Government metadata repository from Denmark. It hosts e-Government-related documents, software, technical specifications and standards, and XML schemata.
• XRepository¹⁰ is an e-Government metadata repository from Germany. It stores a set of XML-based technical standards for electronic data exchange.
• ESD Standard Service Lists¹¹ from the UK serves as an MDR where "controlled lists, cross-references and data interchange standards for use by the public sector" are stored and maintained.
• SEMIC.EU¹² is an initiative led by the European Commission through the ISA programme to foster the reuse of syntactic and semantic assets across Europe [20].

All four repositories contain reusable assets, but differ in scope, target group, technologies used, context, e.g., different countries, and administrative levels, e.g., national and pan-European. Although their metamodels have been created for the same purpose, i.e., for describing national metadata resources for reuse, they vary in the terminology used and the level of granularity, i.e., the extent to which the assets are divided and described. For example, while SEMIC.EU describes a model as an atomic unit, the ESD Toolkit additionally describes the individual elements within the model. In order to align the four different metamodels, we first collected and studied sample data from each repository and then mapped their metamodels against an initial common model. The following sections elaborate on these steps.

⁹ See http://www.digitaliser.dk.
¹⁰ See https://www.xrepository.deutschland-online.de/xrepository/.
¹¹ See http://standards.esd.org.uk/identified.
¹² See http://www.semic.eu/.

4.1. Data collection

We collected sample data from each repository using different technologies. Each repository has its own access mechanism that we had to cope with in order to acquire its metadata. More specifically, Digitaliser.dk provides a REST API to access its data and metadata, and XRepository provided a sample CSV dataset. SEMIC.EU was screen-scraped, and the ESD toolkit asset metadata were collected via its SPARQL endpoint.

Table 1 shows the number of assets collected from each repository. We stored this data in CSV files. Google Refine¹³ was used to clean the data by removing repository-specific information. The multi-facet functionality of Google Refine proved to be very helpful in examining the data.
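The cleaning and faceting step was carried out interactively in Google Refine; the sketch below only approximates the same idea in plain Python for readers who want to reproduce it programmatically. The column names (`title`, `status`, `internal_id`), the sample rows and the choice of which columns count as repository-specific are all made up for illustration.

```python
# Minimal sketch: drop repository-specific columns from collected CSV data
# and count the distinct values of one field (a simple "facet").
# Column names and sample rows are hypothetical.
import csv
import io
from collections import Counter

SAMPLE_CSV = """title,status,internal_id
Asset A,Released,repo-17
Asset B,Draft,repo-42
Asset C,Released,repo-99
"""

# Columns assumed to be meaningful only inside the source repository.
REPOSITORY_SPECIFIC = {"internal_id"}


def load_clean_rows(text):
    """Parse CSV text and remove repository-specific columns."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {key: value.strip() for key, value in row.items()
         if key not in REPOSITORY_SPECIFIC}
        for row in reader
    ]


def facet(rows, field):
    """Count how often each value of the given field occurs."""
    return Counter(row[field] for row in rows)


rows = load_clean_rows(SAMPLE_CSV)
status_facet = facet(rows, "status")
print(status_facet.most_common())
```

A facet with only a handful of distinct values suggests a controlled list, whereas a facet with nearly one value per row suggests free-text input.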

4.2. Metamodel mapping

We started our analysis exercise with an initial common metamodel to use as a point of reference when aligning the metamodels of the four repositories. SEMIC.EU's metamodel played this role, as it was originally designed to capture useful information that would facilitate the search for assets available in different EU countries. We extended this initial draft using DCAT by reusing the classes dcat:Dataset, dcat:Distribution and dcat:Catalog and the properties dcat:accessURL, dcat:dataQuality, dcat:size and dcat:keywords. The initial metamodel includes two main concepts: Asset and Release. The Asset is a collection of files that represent artifacts published by a government body. Each Asset has one or more Releases, which represent the published versions of the Asset (see Fig. 3).
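The two main concepts of the initial metamodel, and the one-to-many relation between them, can be pictured with a small data-structure sketch. The attribute names (`title`, `description`, `name`, `access_url`, `file_format`) and the sample values are assumptions chosen for readability; they are not the normative fields of the metamodel.

```python
# Minimal sketch: an Asset with one or more Releases.
# Attribute names and sample values are illustrative assumptions.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Release:
    """One published version of an asset."""
    name: str
    access_url: str
    file_format: str


@dataclass
class Asset:
    """A collection of files published by a government body."""
    title: str
    description: str
    releases: List[Release] = field(default_factory=list)

    def add_release(self, release: Release) -> None:
        """Attach a further published version to this asset."""
        self.releases.append(release)


asset = Asset(title="Country codes", description="Reference list of country codes")
asset.add_release(Release("1.0", "http://example.org/country-codes/1.0.zip", "zip"))
asset.add_release(Release("1.1", "http://example.org/country-codes/1.1.zip", "zip"))
print(asset.title, [release.name for release in asset.releases])
```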

Most of the participating repositories, with the exception of ESD, did not have their metadata model openly available for accurate analysis. We had to use the collected data and each repository's website to assemble approximate metadata models of these repositories, in order to map them to our initial model and, for more accuracy, to the DCAT and VOAF vocabularies. Mappings were created at design time. The mapping/alignment process is iterative and includes the following steps:

1. Identify the core concepts (classes).
2. Identify the attributes of the core concepts.
3. Check the consistency and the availability of attributes values in the sample data.
4. Update the draft of the common metamodel.

Identify the core concepts: Table 2 illustrates the core concepts of each repository's metamodel (including DCAT and VOAF) and their correspondence to the common metamodel. We used this table as the basis for mapping the metadata fields in the next step.

¹³ See http://code.google.com/p/google-refine/.

Table 2. Core concepts.
Repository | Asset | Release
SEMIC.EU | Asset | Release
Digitaliser.dk | Resource | Artifact
XRepository | Content (Inhalte) | Content (Inhalte)
ESD | Service list | Different file formats
DCAT | Dataset | Distribution
VOAF | Vocabulary | RDF file

Identify the attributes of the core concepts: Table 3 (Asset Metadata) and Table 4 (Release Metadata) show the mappings of the attributes of the core concepts of the four repositories' metamodels, and of DCAT and VOAF, to those of the initial metamodel.
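A field mapping of this kind can be expressed as a simple lookup from repository-specific field names to common field names. The sketch below uses a few rows of Table 3 for three of the repositories (Title, Description, Owner and Keywords); the lowercase target names, the function `to_common` and the sample record are assumptions made for illustration, and the paper does not describe its mappings as code.

```python
# Minimal sketch: translate repository-specific field names to common ones.
# Source field names follow Table 3; target names and the sample record
# are hypothetical.

FIELD_MAPPINGS = {
    "SEMIC.EU": {
        "Title": "title",
        "Description": "description",
        "Owner": "owner",
        "Keywords": "keywords",
    },
    "Digitaliser": {
        "Title": "title",
        "Body text": "description",
        "Owner": "owner",
        "Tags": "keywords",
    },
    "XRepository": {
        "Name": "title",
        "Description": "description",
        "Client": "owner",
        "Keywords": "keywords",
    },
}


def to_common(repository, record):
    """Split a record into mapped (common) fields and unmapped leftovers."""
    mapping = FIELD_MAPPINGS[repository]
    common, unmapped = {}, {}
    for name, value in record.items():
        if name in mapping:
            common[mapping[name]] = value
        else:
            unmapped[name] = value
    return common, unmapped


common, unmapped = to_common(
    "Digitaliser",
    {"Title": "Example schema", "Body text": "An XML schema", "Tags": "xml", "Views": 12},
)
print(common)
print(unmapped)
```

Keeping the unmapped leftovers visible is a deliberate choice here: fields that no repository can map are candidates for the repository-specific information that is removed before alignment.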

Studying Table 3, we observe that title, description, owner,release and link to website are the common fields between the sixmetamodels. Moreover, the ID field is common between four of

Table 3Asset Metadata fields.

Asset Metadata SEMIC.EU Digitaliser XReposito

Created Initial publication Created –

Description Description Body text Descriptio

Domain/scope Domain Resource category Category

File format File format – File forma

License License – License

Link to website URL URL URL

Owner Owner Owner Client

Publication Status State Status Status of

Publisher Provided by Contact info Publisher

Quality level – – –

Related asset Related asset – Related a

Related project Related project – –

Release Release Related artefact Download

Repository origin – – –

Represented country Country – –

Keywords Keywords Tags Keywords

Title Title Title Name

Unique ID Unique ID Unique ID ID

UpdateDate Last change – UpdateDa

URI URI URI

User GROUPS – – –

Table 4Release Metadata fields.

Release Metadata SEMIC.EU Digitaliser

URI – URI

Publication status State

Access URL Access URL Access URL

Documentation language Language –

Type Release CONTENTS Content

release date Publication date –

Release name Release number Title

Release notes Release notes –

File format <zip> –

Related release Related versions Outgoing relation referenc

Release size File size –

Table 5Consistency of metadata fields (+, = and � represent high, medium and low consistenc

License Country Publication Status

Digtaliser.dk x x +

SEMIC.EU = + +

ESD x + x

them, i.e., SEMIC.EU, Digitaliser, XRepository and ESD andexpressed as URI in DCAT and VOAF. Table 4 shows that access

URL is common between five of the metamodels and is easilyinferred for VOAF as the access URL for a vocabulary is normally itsURI. The type field is also common in all of them, but differenttaxonomies of types are available each time. Furthermore version

information is not covered by all the metamodels. Commonmetadata fields from both tables are included in the first version ofADMS.

Check the consistency and the availability of attributes values:

Tables 5 and 6 summarize the results of the consistency and theavailability checks of the most important metadata fields’ values insample data from Digitaliser.dk, ESD and SEMIC.EU repositories.We exclude XRepository due to the small size of the datasetprovided.

In this context, consistency measures whether the values of acertain metadata field are coming from a controlled vocabulary orthey are unique per asset/release instance. Where the symbols +, =and – represent high, medium and low consistency respectively.Availability on the other hand is the percentage of non-blank

ry ESD DCAT VOAF

Created –

n Description – Description

Domain Theme Vocabulary space

t – <RDF>

License – –

URL URL URL

Owner – Creator

content – –

– Publisher

– –

sset – reliesOn, usedBy, extends, etc.

– –

file Formats Distribution Download file

Catalog –

– –

Keywords –

Title – Name

ID –

te Modified –

URI URI

Group –

Xrepository ESD DCAT VOAF

– URI URI URI

Status of content – – –

Download URL Access URL –

language – – –

Type Type (sub-classes) (RDF)

– – – –

Version name Version name – –

Release notes – – –

File format Format – <RDF>

e – – – –

– – – –

y).

Keywords LicenceURI File format Domain

� � + +

� = + +

x x + x

Page 6: Towards semantically interoperable metadata repositories: The Asset Description Metadata Schema

Table 6. Availability percentage of metadata fields.

                Description  Domain  License  Country  Related asset  File format  Related project
Digitaliser.dk  81           99      0        0        99             100          0
SEMIC.EU        100          100     80       100      42             95           100
ESD             100          x       x        71       x              100          x

Fig. 4. General ADMS overview (http://joinup.ec.europa.eu/asset/adms/release/06).

Fig. 5. Architecture of the Federated Portal of Repositories on Semantic Interoperability Assets.

G. Shukair et al. / Computers in Industry 64 (2013) 10–18 15

values in relation to the overall number of values available for a certain metadata field; for example, 50% availability means that exactly half of the values are blank.
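The availability measure defined above can be illustrated with a minimal Python sketch. The function names, the treatment of whitespace-only strings as blank, and the sample licence values are assumptions made for illustration; they are not taken from the repository analysis. The second function reports the share of filled values that come from a given controlled list, which is only one possible reading of the consistency check and does not reproduce the +, = and – scale.

```python
from typing import Iterable, Optional, Set


def availability(values: Iterable[Optional[str]]) -> float:
    """Percentage of non-blank values for one metadata field."""
    values = list(values)
    if not values:
        return 0.0
    # None and whitespace-only strings both count as blank (assumption).
    non_blank = sum(1 for v in values if v is not None and v.strip())
    return 100.0 * non_blank / len(values)


def controlled_share(values: Iterable[Optional[str]], vocabulary: Set[str]) -> float:
    """Percentage of non-blank values that belong to a controlled vocabulary."""
    filled = [v.strip() for v in values if v is not None and v.strip()]
    if not filled:
        return 0.0
    return 100.0 * sum(1 for v in filled if v in vocabulary) / len(filled)


# Hypothetical sample: licence values of four asset records.
licences = ["CC-BY", "", None, "EUPL"]
print(availability(licences))                          # 50.0
print(controlled_share(licences, {"CC-BY", "EUPL"}))   # 100.0
```

With two of the four values blank, the sketch reports 50% availability, matching the example given in the text.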

Table 5 shows that keyword values are inconsistent in both SEMIC.EU and Digitaliser.dk, as they are usually entered using free-text input. On the other hand, status, country and domain values come from controlled lists of values, but the terminology differs between repositories.

Table 6 shows that description and file format have the highest percentage of non-blank values. While 80% of assets in SEMIC.EU have an associated licence, none has a licence assigned in Digitaliser.dk. Fields with a high availability percentage are included in the first version of ADMS.

Update the draft of the common metamodel: based on the results and the knowledge gained from the mapping process, we enhanced the draft of the common metamodel. We thus formulated the first stable version of ADMS, described in detail in the following section.

5. The Asset Description Metadata Schema

ADMS is designed as a common metamodel for e-Government semantic interoperability assets. ADMS enables the identification of types of reusable resources, facilitates their discovery and ensures minimum consistency of metadata across online e-Government metadata repositories. The definition of ADMS capitalizes heavily on the analysis of Section 4 and reuses, wherever possible, classes from existing vocabularies.

ADMS distinguishes between the conceptual Asset and its releases. This separation is based on the following observation: a semantic interoperability asset (adms:Asset) is normally developed in the first place for a specific purpose; it is then delivered in different machine- and/or human-readable formats (adms:Release). Furthermore, it may go through different quality-assessment and review stages to become reusable, and this results in different releases of it. Fig. 4 illustrates the main concepts and properties of ADMS.

Definition. adms:Asset is any technical, legal or organisational resource that can be reused within and outside public administrations and government bodies to facilitate interoperability.

Definition. adms:Release is the physical embodiment of an asset; it is typically a downloadable computer file (but in principle it could also be a paper document) that implements the intellectual content of an Asset.

The definition of the adms:Asset concept allows writing more expressive RDF statements to describe a particular semantic interoperability Asset. Technically, it is defined as a subclass of dcat:Dataset and extends it by defining new properties to capture context-specific information, and by reusing properties from DC, e.g., dc:publisher and dc:spatial.

Capturing contextual information is essential in facilitating search and querying functionalities. ADMS captures contextual information using the following classes and properties:

• adms:Domain describes the main topic of the asset. It is assigned to an Asset using the adms:domain property. Each Asset may cover one or more domains. adms:Domain is a subclass of skos:Concept.

• adms:relatedProject is a property that links each Asset with one or more projects in the context of which the asset was developed, used or is somehow related.

• adms:Country represents countries involved in the development and/or countries in the scope of an asset. It is assigned to an asset using the dc:spatial property.

• adms:tags is a property for tagging the assets with expressive keywords.
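As an illustration of the contextual properties listed above, the following Python sketch builds the corresponding RDF statements for one asset and prints them as N-Triples. It is a minimal sketch under stated assumptions: the ADMS namespace URI is a placeholder (the actual namespace is not given here), dc:spatial is assumed to refer to the DC Terms namespace, and the asset, domain and tag values are invented examples.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

ADMS = "http://example.org/adms#"      # placeholder namespace (assumption)
DCT = "http://purl.org/dc/terms/"      # DC Terms, assumed for dc:spatial
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Triple = Tuple[str, str, str]


def literal(text: str) -> str:
    """Quote a plain string literal, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


@dataclass
class Asset:
    uri: str
    domains: List[str] = field(default_factory=list)    # URIs of adms:Domain concepts
    countries: List[str] = field(default_factory=list)  # URIs of adms:Country resources
    projects: List[str] = field(default_factory=list)   # URIs of related projects
    tags: List[str] = field(default_factory=list)       # free-text keywords

    def triples(self) -> List[Triple]:
        s = f"<{self.uri}>"
        out = [(s, f"<{RDF_TYPE}>", f"<{ADMS}Asset>")]
        out += [(s, f"<{ADMS}domain>", f"<{d}>") for d in self.domains]
        out += [(s, f"<{DCT}spatial>", f"<{c}>") for c in self.countries]
        out += [(s, f"<{ADMS}relatedProject>", f"<{p}>") for p in self.projects]
        out += [(s, f"<{ADMS}tags>", literal(t)) for t in self.tags]
        return out


def to_ntriples(triples: List[Triple]) -> str:
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Hypothetical asset used only for illustration.
asset = Asset(
    uri="http://example.org/asset/core-person",
    domains=["http://example.org/domain/justice"],
    countries=["http://dbpedia.org/resource/Denmark"],
    tags=["person", "core vocabulary"],
)
print(to_ntriples(asset.triples()))
```

Keeping domains and countries as URIs while leaving tags as free-text literals mirrors the distinction in the text between controlled values and keywords.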

An Asset goes through many successive stages of quality assessment, development and feedback in order to mature and become a good candidate for reuse. This is expressed in ADMS using different instances of adms:Release. adms:hasRelease is a


Fig. 6. Snapshots of the Federated Portal of Repositories on Semantic Interoperability Assets, accessed at http://vmudi205.deri.ie/elda/index.html.

14 See http://www.geonames.org/.
15 See http://dbpedia.org/About.
16 See https://github.com/gofranshukair/Digitaliser-Google-Refine-extension.


property linking each Asset to its latest release. The adms:type property records the type of each release. The dcat:accessURL property gives the direct location of a specific Release on the Web. Each adms:Release instance is annotated using DC annotation properties, such as dc:description and dc:issued. The adms:relatedRelease property connects the latest release with its previous ones. Given the initial status of this model, some elements are not yet covered by existing codelists, which needs to be addressed in future versions of ADMS:

1. adms:ArtefactType: the type of artefact included in a release. Examples might be UML Specification, Ontology, Taxonomy, Codelist, etc.

2. adms:Status: the current status of an asset or release. Examples might be under development, published, or withdrawn.

3. adms:Domain: domain(s) which are covered by the asset. Examples might be Justice, Internal Security, Economy, Trade, Social Affairs, Education, etc.

4. adms:QualityLevel: the quality level achieved by the asset. Examples might be draft, registered, final, etc.
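The relation between an Asset, its latest release and the earlier releases can be sketched in Python as follows. This is an illustrative data model only, not part of ADMS: the class layout and the publish and history helpers are hypothetical, and the Status values simply reuse the examples named in the list above, since the actual codelist was still open.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Status(Enum):
    # Example values from the text; not an agreed codelist.
    UNDER_DEVELOPMENT = "under development"
    PUBLISHED = "published"
    WITHDRAWN = "withdrawn"


@dataclass
class Release:
    uri: str
    access_url: str        # dcat:accessURL
    artefact_type: str     # adms:type, e.g. "Ontology"
    status: Status         # adms:Status
    issued: str            # dc:issued, as an ISO date string
    related: List["Release"] = field(default_factory=list)  # adms:relatedRelease


@dataclass
class Asset:
    uri: str
    latest: Optional[Release] = None   # adms:hasRelease points to the latest release

    def publish(self, release: Release) -> None:
        """Make `release` the latest one and link it to all previous releases."""
        if self.latest is not None:
            release.related = [self.latest] + self.latest.related
        self.latest = release

    def history(self) -> List[Release]:
        """Latest release first, followed by the earlier ones."""
        return [] if self.latest is None else [self.latest] + self.latest.related


# Hypothetical usage: two releases of the same asset.
asset = Asset("http://example.org/asset/core-person")
asset.publish(Release("http://example.org/release/1", "http://example.org/files/v1.owl",
                      "Ontology", Status.PUBLISHED, "2011-06-01"))
asset.publish(Release("http://example.org/release/2", "http://example.org/files/v2.owl",
                      "Ontology", Status.UNDER_DEVELOPMENT, "2012-01-15"))
print([r.uri for r in asset.history()])
```

Modelling the status as an enumeration makes the dependence on a controlled list explicit: a value outside the list cannot be assigned without first extending the codelist.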

6. Use case: facilitating cross-repository search

Using ADMS as a common metadata-interchange format, problems such as repository integration and interconnection can be addressed. This section introduces a federation of metadata repositories and evaluates its main functionalities, focusing on cross-repository search. The main idea behind the federation of metadata repositories is to make different sources of semantic assets available through a single access point. ADMS ensures a minimum degree of semantic interoperability, through a common model, which helps the integration of metadata from different repositories. More specifically, descriptions of semantic assets in ADMS (not the assets themselves) can be published in the federation, including a link to where a particular asset can be found. The federation will help publishers to stimulate the reuse of their assets and reach a wider audience.

This federation is based on the architecture illustrated in Fig. 5. It contains the following components:

• The Publishing API is used by metadata providers to describe their assets and publicly expose their metadata in RDF format following the ADMS schema and in accordance with the Linked Data guidelines.

• The Federated Repository enables RDF data storage.

• The SPARQL Endpoint enables querying the metadata in the Federated Repository using the SPARQL query language.

• The Querying API enables developers to reuse the metadata in many different applications, e.g., Linked Data mash-ups.

• The Faceted browsing interface enables users to browse, search and filter semantic interoperability assets, thus providing access to the metadata of the semantic assets and allowing users to locate them in the distributed (source) repositories.
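To make the role of the SPARQL Endpoint more concrete, the sketch below assembles a cross-repository search query from optional keyword, country and domain filters. It is a hedged illustration rather than the portal's actual query: the ADMS namespace is a placeholder, dc:spatial is assumed to be the DC Terms property, the function and parameter names are hypothetical, and the string functions used in the filter assume a SPARQL 1.1 endpoint.

```python
from typing import Optional

PREFIXES = """\
PREFIX adms: <http://example.org/adms#>
PREFIX dct:  <http://purl.org/dc/terms/>
"""  # the adms: namespace is a placeholder (assumption)


def _literal(text: str) -> str:
    """Quote a string as a SPARQL literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def _iri(uri: str) -> str:
    """Wrap a URI in angle brackets, rejecting characters not allowed in IRIs."""
    if any(ch in uri for ch in '<>"{}|\\^` '):
        raise ValueError(f"not a valid IRI: {uri!r}")
    return f"<{uri}>"


def build_search_query(keyword: Optional[str] = None,
                       country_uri: Optional[str] = None,
                       domain_uri: Optional[str] = None,
                       limit: int = 20) -> str:
    """Build a SELECT query returning assets that match all given filters."""
    patterns = ["?asset a adms:Asset ."]
    if keyword:
        patterns.append("?asset adms:tags ?tag .")
        patterns.append(
            f"FILTER(CONTAINS(LCASE(STR(?tag)), {_literal(keyword.lower())}))")
    if country_uri:
        patterns.append(f"?asset dct:spatial {_iri(country_uri)} .")
    if domain_uri:
        patterns.append(f"?asset adms:domain {_iri(domain_uri)} .")
    body = "\n  ".join(patterns)
    return f"{PREFIXES}SELECT DISTINCT ?asset WHERE {{\n  {body}\n}} LIMIT {int(limit)}"


print(build_search_query(keyword="Person",
                         country_uri="http://dbpedia.org/resource/Denmark"))
```

Because every repository publishes its metadata against the same schema, a single query of this form can cover all federated sources without repository-specific branches.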

We developed the Federated Portal of Repositories on Semantic Interoperability Assets, which realizes the architecture of Fig. 5 as part of the demonstration step of the DSRM methodology. Brief implementation details per component follow.

The four repositories described in Section 4 participated in the federation. We used the sample data and the mappings collected as part of the repository analysis process. The data was stored in CSV files, and the RDF extension of Google Refine was used to implement the mappings and convert the data to RDF. The reconciliation service of the extension was used to replace known entity values, such as countries and languages, with URIs from GeoNames¹⁴ and DBpedia¹⁵, and the resulting RDF data was exported in ADMS format [21].
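The conversion step can be approximated by the following Python sketch, which reads CSV rows, replaces known country names with URIs and emits N-Triples. It is not the Google Refine RDF extension and does not call its reconciliation service; the lookup table, the CSV column names (uri, title, country), the placeholder ADMS namespace and the use of DC Terms properties are all assumptions made for illustration.

```python
import csv
import io
from typing import Dict, List, Tuple

ADMS = "http://example.org/adms#"      # placeholder namespace (assumption)
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Illustrative stand-in for reconciliation: known country names -> DBpedia URIs.
COUNTRY_URIS: Dict[str, str] = {
    "denmark": "http://dbpedia.org/resource/Denmark",
    "germany": "http://dbpedia.org/resource/Germany",
}


def literal(text: str) -> str:
    """Quote a plain string literal, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def row_to_triples(row: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Map one CSV record to triples; unknown countries stay as literals."""
    subject = f"<{row['uri']}>"
    triples = [(subject, f"<{RDF_TYPE}>", f"<{ADMS}Asset>")]
    title = (row.get("title") or "").strip()
    if title:
        triples.append((subject, f"<{DCT}title>", literal(title)))
    country = (row.get("country") or "").strip()
    if country:
        uri = COUNTRY_URIS.get(country.lower())
        obj = f"<{uri}>" if uri else literal(country)
        triples.append((subject, f"<{DCT}spatial>", obj))
    return triples


def csv_to_ntriples(text: str) -> str:
    reader = csv.DictReader(io.StringIO(text))
    return "\n".join(f"{s} {p} {o} ."
                     for row in reader for s, p, o in row_to_triples(row))


# Hypothetical sample data with one reconcilable and one unknown country.
sample = (
    "uri,title,country\n"
    "http://example.org/asset/1,Core Person,Denmark\n"
    "http://example.org/asset/2,,Atlantis\n"
)
print(csv_to_ntriples(sample))
```

Values that cannot be matched are kept as plain literals rather than dropped, so no source metadata is lost when a name is missing from the lookup table.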

Moreover, we implemented a Digitaliser-specific Google Refine extension¹⁶ to enable the repository owner to publish their Asset Metadata in the ADMS format with minimum intervention and effort from users. Such an extension will encourage repository owners who want to adopt ADMS to make their metadata accessible through REST APIs.


Once we had all Asset Metadata represented in RDF, we aggregated them in an RDF Sesame triple store, i.e., the federated repository component of the architecture. We utilized the Linked Data API¹⁷ to allow users to access this federated repository, search for certain assets using the metadata field filters and download and/or access their query results in machine-readable formats. We thus put in place the querying API, which allows users to exploit the content negotiation service and make use of the URI patterns provided by the Linked Data API as a way to access the federation metadata from third-party applications.
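From the point of view of a third-party application, content negotiation amounts to requesting the same resource with a different Accept header. The sketch below prepares such a request with the Python standard library without sending it. The base URL and the filter parameter names are hypothetical and do not describe the portal's actual URI patterns; only the media types are standard.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint of a federation querying API (assumption).
BASE_URL = "http://example.org/federation/assets"

MEDIA_TYPES = {
    "json": "application/json",
    "rdfxml": "application/rdf+xml",
    "turtle": "text/turtle",
}


def build_request(filters: dict, fmt: str = "json") -> Request:
    """Prepare (but do not send) a GET request with the matching Accept header."""
    query = urlencode(sorted(filters.items()))
    url = f"{BASE_URL}?{query}" if query else BASE_URL
    return Request(url, headers={"Accept": MEDIA_TYPES[fmt]})


req = build_request({"country": "Denmark", "domain": "Justice"}, fmt="turtle")
print(req.full_url)               # http://example.org/federation/assets?country=Denmark&domain=Justice
print(req.get_header("Accept"))   # text/turtle
```

Passing the prepared request to urllib.request.urlopen would perform the call; it is left out here because the endpoint is fictitious.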

Fig. 6 shows two screenshots of the federation portal, where users can search for semantic interoperability assets in the four repositories (left) and download releases directly through one interface (right).

6.1. Evaluation

The last step of the DSRM methodology is the evaluation of the research. We conducted a primary evaluation by introducing the Federated Portal of Repositories on Semantic Interoperability Assets to 20 developers, asking them to use it and collecting their feedback. The evaluation included instructions on how to use the portal and a 14-question survey. The survey included 10 statements with a 5-point agreement scale from 1 (Strongly Disagree) to 5 (Strongly Agree).¹⁸

From a usability point of view, 80% of the evaluators agreed that the user interface is easy to use and that they were able to find assets easily. Keyword (100%), Country (75%) and Domain (45%) were the most used fields in the search form when looking for Assets.

While 90% of the evaluators agreed that they would use the federated portal to search for Assets using specific criteria, such as country name or domain, only 45% of them agreed to use the federation API for accessing the metadata.

Furthermore, 85% of them agreed that aligning and reusing semantic assets at a European level is essential for providing EU-wide cross-border public services. Approximately 90% of the evaluators would use the portal in the future when searching for semantic interoperability assets.

7. Conclusions and future work

Governments are exposing their semantic interoperability assets, the actual drivers for interoperability, in online metadata repositories to encourage reuse of these assets. These repositories use different terminology, models and interfaces and are therefore not interoperable among themselves. This paper introduced the Asset Description Metadata Schema (ADMS), an RDF vocabulary to be used as a common representation and exchange format for e-Government repositories. We also presented an ADMS-driven architecture for a federated metadata repository that allows asset discovery, access and retrieval in a uniform way. Preliminary evaluation of the federation portal shows a strong demand for such a vocabulary and its potential uses.

The adoption of such a metamodel is primarily a social process. Hence, this first attempt aims to raise awareness and engage the community in the specific problem. Providing a first approach or proposal for triggering discussion is always a prerequisite to attract and create real possibilities for future adoption and take-up. A working group was set up by the ISA Programme, where representatives from more than 10 EU Member States, standards bodies, research and academia, as well as experts, are discussing and planning the evolution of ADMS.

17 See http://code.google.com/p/linked-data-api/.
18 Questionnaire available at: http://tinyurl.com/cjbfs56.

Our future research focuses more on the standardisation of ADMS and the development of a set of services to support ADMS developments across Europe. To this end, we are working on the specification of a unified API to publish and access ADMS from different sources to the federation, following Linked Data publishing guidelines [22]. Adopting the API will eliminate the manual effort required in the publishing component of the architecture and seamlessly integrate the repositories into the Web of Data. Finally, a large-scale deployment of the federation of repositories described in Section 6 is planned.

Acknowledgements

The authors thank the ADMS working group for their contribution to this work. This work was funded in part by Science Foundation Ireland under Grant No. SFI/08/CE/I1380 (Lion-2).

References

[1] N. Press, Understanding Metadata, National Information Standards Organization Press, 2004.

[2] ISA Deliverable, Report on existing Semantic Asset repositories, Available from: http://joinup.ec.europa.eu/sites/default/files/ISA%20Deliverable_Report%20on%20existing%20Semantic%20Asset%20Repositories.pdf.

[3] M. D’Aquin, H. Lewen, Cupboard – a place to expose your ontologies to applications and the community, The Semantic Web: Research and Applications (2009) 913–918.

[4] T. Berners-Lee, Linked Data (online) last update 2009-06-18, Available from: http://www.w3.org/DesignIssues/LinkedData.html.

[5] K. Peffers, T. Tuunanen, M.A. Rothenberger, S. Chatterjee, A design science research methodology for information systems research, Journal of Management Information Systems 24 (2007) 45–77.

[6] R. Heery, T. Gardner, M. Day, DESIRE metadata registry framework, Retrieved July, 2000.

[7] R. Heery, P. Johnston, C. Fulop, A. Micsik, Metadata schema registries in the partially Semantic Web: the CORES experience, in: Proceedings of the 2003 International Conference on Dublin Core and Metadata Applications: Supporting Communities of Discourse and Practice—Metadata Research & Applications, Dublin Core Metadata Initiative, 2003, pp. 1–8.

[8] E. Tambouris, N. Manouselis, C. Costopoulou, Metadata for digital collections of e-government resources, The Electronic Library 25 (2007) 176–192.

[9] A. Alasem, An overview of e-Government metadata standards and initiatives based on Dublin Core, Initiatives 7 (2009) 1–10.

[10] Y. Charalabidis, F. Lampathaki, Metadata sets for e-Government resources: the extended e-Government metadata schema (eGMS+), Electronic Government (2009) 341–352.

[11] N. Archives, AGLS metadata standard. Part 1 – reference description, Business (2010).

[12] NGNZ, NZGLS Metadata Element Set (online), Available from: http://www.e.govt.nz/standards/nzgls/standard, 2005.

[13] E.G. Park, M. Lamontagne, A. Perez, I. Melikhova, G. Bartlett, Running ahead toward interoperable e-government: the government of Canada metadata framework, International Journal of Information Management 29 (2009) 145–150.

[14] F. Maali, R. Cyganiak, V. Peristeras, Enabling interoperability of government data catalogues, Electronic Government (2010) 339–350.

[15] A. Miles, B. Matthews, M. Wilson, D. Brickley, SKOS Core: Simple knowledge organisation for the web, DCMI 5 (2005) 1–9.

[16] K. Alexander, M. Hausenblas, J. Zhao, R. Cyganiak, Describing linked datasets on the design and usage of voiD, the ‘‘vocabulary of interlinked datasets’’, in: Conjunction with 18th International World Wide Web Conference, 2009.

[17] R. Cyganiak, D. Reynolds, J. Tennison, The RDF Data Cube vocabulary (online), last update 2010-07-14 (cit. 2010-10-01), Available from: http://publishing-statistical-data.googlecode.com/svn/trunk/specs/src/main/html/cube.html.

[18] B. Vatant, Vocabulary of a Friend (VOAF) (online) last update 2011-03-28, Available from: http://labs.mondeca.com/vocab/voaf/index.html.

[19] J. Hartmann, Y. Sure, P. Haase, R. Palma, OMV – ontology metadata vocabulary, ISWC 3729 (2005).

[20] K. Reichling, M. Luts, R. Fahl-Spiewack, A pan-European repository: SEMIC.EU as the point of reference for eGovernment ontologies, in: Proceedings of the 1st Workshop on Ontology Repositories and Editors for the Semantic Web, Crete, Greece, 2010.

[21] F. Maali, R. Cyganiak, V. Peristeras, Re-using cool URIs: entity reconciliation against LOD hubs, in: Proceedings of the 4th Linked Data on the Web (LDOW), Workshop at the World Wide Web Conference (WWW2011), Hyderabad, India, 2011.

[22] G. Shukair, N. Loutas, V. Peristeras, Integrating linked metadata repositories into the web of data, in: Proceedings of the 3rd International Workshop on Consuming Linked Data (COLD2012), Workshop at the 11th International Semantic Web Conference (ISWC), Boston, USA, 2012.


Gofran Shukair is a research master's student at the Digital Enterprise Research Institute (DERI). Gofran is a member of the e-Government unit, working mainly on integrating metadata repositories into the Web of Data. Gofran holds a Computer Science Degree from Damascus University, Department of Software Engineering and Information Systems.

Nikolaos Loutas is a principal advisor at PwC’s Technology Consulting practice, involved mainly in open data and semantic interoperability projects for the public sector. Nikolaos is finishing his PhD in semantic service models for business information systems. He holds a Computer Science Degree and an MSc in Information Systems from the Athens University of Economics and Business, Greece. Nikolaos has previously worked as a researcher at the Digital Enterprise Research Institute (DERI), NUI Galway and the Centre for Research Technology Hellas (CERTH), where he initiated, led and participated in several national and EU-funded research projects in the fields of linked open data, Semantic Web, SOAs, e-Government, and collaborative work environments, such as SemanticGov, Rural Inclusion, Ecospace, Granatum, Linked2Media and Cloud4SOA. He has published more than 50 papers in international journals, conferences and books and has served as programme committee member and reviewer in numerous international conferences.


Vassilios Peristeras is a program manager in the Interoperability Solutions for European Public Administration Unit at the European Commission. His research interests include e-Government, semantic interoperability, enterprise architecture, and metadata management.

Sebastian Sklarß is a consultant for the German company ]init[ in Brussels. He specialises in Open Data and semantic interoperability. He became chief stakeholder manager for the Semantic Interoperability Centre Europe (SEMIC.EU) in 2007. In this role he was in charge of coaching and of building and animating a network of competences for 3 years. Since 2009 he has been in charge of drafting the semantic data exchange standard of the national firearms register in Germany (XWaffe). Currently Sebastian is also involved in the W3C Multilingual Web project and manages the Open Data portfolio of ]init[. He gives consultancy on open data to big supranational bodies like the European Patent Office and the World Customs Organization, to national agencies like the PortalU coordination centre, and to several Federal Ministries in Germany. Prior to working for ]init[ in 2007, Sebastian studied Computer Science with specialisation in software usability at the ‘‘Universite Paul Verlaine de Metz’’ and worked for the Public Research Centre Henri Tudor in Luxembourg.
