Date post: 15-Jun-2020
Page 1: LDP-DL: A language to de ne the design of Linked Data ...opensensingcity.emse.fr/ldpdl/technical_report.pdf · mentation on which the LDP is being deployed. Secondly, hardcoding the

LDP-DL: A language to define the design of Linked Data Platforms (Technical Report)

Noorani Bakerally, Antoine Zimmermann, Olivier Boissier

Univ Lyon, IMT Mines Saint-Étienne, CNRS, Laboratoire Hubert Curien UMR 5516, F-42023 Saint-Étienne, France

{noorani.bakerally,antoine.zimmermann,olivier.boissier}@emse.fr

Abstract. Linked Data Platform 1.0 (LDP) is the W3C Recommendation for exposing linked data in a RESTful manner. While several implementations of the LDP standard exist, deploying an LDP is still complex and tightly coupled to the chosen implementation. As a consequence, the same design (in terms of data organization) is difficult to reuse in different LDP deployments. We propose a language for specifying how existing data should be used to generate LDPs in a way that is independent of any LDP implementation, compatible with all of them, and deployable on any of them. We formally describe the syntax and semantics of the language and its implementation. We show that our approach 1) allows the reuse of the same design for multiple deployments, 2) allows the reuse of the same data with different designs, 3) is open to heterogeneous data sources, 4) can cope with hosting constraints and 5) significantly automates the deployment of LDPs.

Keywords: RDF, Linked Data, Linked Data Platform

Table of Contents

1 Introduction
2 Context and Motivation
  2.1 Overview of the Linked Data Platform Standard
  2.2 Context
  2.3 Illustrative Example
  2.4 Our Approach: The LDP Generation Workflow
3 LDP Design Language
  3.1 Overview of the language
  3.2 Abstract Syntax
4 Formal Semantics
  4.1 LDP-DL interpretation
  4.2 Satisfaction
    Informal description of the satisfaction
    Satisfaction of a DataSource
    Satisfaction of a ResourceMap
    Satisfaction of a NonContainerMap
    Satisfaction of a ContainerMap
    Satisfaction of an LDP-DL document
  4.3 Evaluation of a design document using an interpretation
5 Algorithms
  5.1 Evaluation of a DataSource
  5.2 Evaluation of a ResourceMap
  5.3 Evaluation of a NonContainerMap
  5.4 Evaluation of a ContainerMap
  5.5 Evaluation of a Design Document
6 Implementation and Evaluation
  6.1 Implementation
  6.2 Evaluation
7 Related Work
8 Conclusion and Future Work

1 Introduction

In the open data context, multiple data sources found at numerous locations on the Web generate massive heterogeneous sets of data whose exploitation can be useful for domains such as smart cities. Semantic Web standards have already addressed some heterogeneity levels (syntactic, semantic, structural). However, there is much heterogeneity in the way RDF data is accessed. The Linked Data Platform (LDP) 1.0 W3C Recommendation [15] has emerged as a solution to this problem by standardizing RESTful access to RDF data. Linked data platforms complying with the LDP standard, which we refer to as LDPs, can be useful to provide both a homogeneous view of the data and homogeneous access to it. However, deploying an LDP is still complex in spite of numerous LDP implementations (cf. Sec. 7).

LDPs are data-driven systems and their deployment involves both data and system deployment. Current LDP implementations automatize only the latter. So far, to deploy data in LDPs, manual development of LDP generators is required to transform existing data from their native structures into LDP resources, which can then be materialized in LDP stores. This development involves two main phases: design and implementation. During the design phase, decisions related to the LDP design are taken. During the implementation phase, these decisions are hardcoded in the LDP generator. There are two main issues with using custom LDP generators. Firstly, it may be costly, as it requires manual development for every different case and also an understanding of the implementation on which the LDP is being deployed. Secondly, hardcoding the design in the implementation of the LDP generator creates a tight coupling between the design and the implementation, which complicates both the maintainability and the reusability of the design.

In our previous works [3, 2], we presented an approach for the generation and deployment of LDPs from existing data sources and a proof of concept of doing so. The objective of the work presented in this paper is to decouple LDP designs from the implementation of LDP generators by formally defining a language to express LDP designs. We then provide a generic LDP generator, which we refer to as an LDPizer, to interpret LDP-DL documents and automatize the generation and deployment of LDPs from data sources. The rest of this paper is organized as follows: Sec. 2 provides an overview of our context and the motivation for a language; Sec. 3 gives a general overview of LDP-DL and describes its syntax; Sec. 4 describes the semantics of the language in a model-theoretic way; Sec. 5 provides a set of algorithms for evaluating a design document to generate an LDP dataset; Sec. 6 discusses the implementation and evaluation of our approach; Sec. 7 discusses related work; and finally, Sec. 8 concludes with an outlook on future work.

2 Context and Motivation

The context of our work is the OpenSensingCity project¹, where the aim is to facilitate exploitation of smart city data within the open data context. We use Semantic Web standards, especially the LDP standard, to homogenize access to RDF data. After providing an overview of the LDP standard, we describe our context, highlighting the need for LDPs and the requirements in deploying them. Finally, we provide a general overview of our approach, in which we discuss the motivation for our language.

2.1 Overview of the Linked Data Platform Standard

The LDP standard provides a model to organize data resources, known as LDP resources, and an interaction model to interact (read-write) with these resources via HTTP. There are two types of LDP resources: LDP RDF Sources (LDP-RS) and LDP Non-RDF Sources. An LDP-RS is an LDP resource whose state is represented in RDF, while the state of an LDP Non-RDF Source is not represented in RDF. LDP resources can be organized in collections, referred to as LDP containers. An LDP container (LDPC) is a specialization of an LDP-RS which is used to organize LDP resources. The LDP resources contained in an LDP container are known as member resources. There are three types of LDPCs but currently, our work is restricted to the LDP Basic Container (LDP-BC).

The design of an LDP can include a number of aspects but in this paper, we restrict it to the IRIs of LDP resources, their content and their organization in LDP containers. Also, for now, we only support the deployment of LDPs where LDP resources are LDP-RSs and where LDP containers are LDP-BCs. Therefore, in the remainder of the paper, when we use the term “container” in the context of our language, we refer to an LDP-BC, and when we use “non-container”, we refer to an LDP-RS that is not a container. Note that, among current LDP implementations (discussed in Sec. 7), most support LDP-BCs and fewer support other types of containers.

2.2 Context

Let us consider the case of a city where the governmental institution responsible for it decides to deploy a data platform to expose the city data. One reason for doing this may be to enable the development of smart city applications to aid citizens in their activities within the city. In order to enhance interoperability and homogenize access, the data platform should be compliant with Semantic Web standards including the LDP standard.²

A city is a decentralized ecosystem where data sources come from different organizations. As such, the city LDP may have to exploit and aggregate data from these sources. There are two issues when exploiting them. First, many of the data sources are in heterogeneous data formats and thus the city LDP must be open to them. Second, there are data sources whose exploitation gives rise to constraints, such as hosting constraints that prevent hosting a copy of the data in a third-party environment. Hosting constraints can be on the data itself, such as license restrictions, or they can be a limitation of the third-party software environment, such as bandwidth or storage limitations that hinder continuously verifying and maintaining fresh copies of dynamic or real-time data. Thus, the city LDP has to be able to cope with hosting constraints.

¹ http://opensensingcity.emse.fr/
² In this section, requirements of our context are in italic

A city is a dynamic system and the city LDP itself may have to modify its design to consider new types of data from existing or new data sources. Also, there may be a city wishing to reuse the LDP design of another city LDP to expose data similarly. One potential reason for doing so may be to enhance data homogeneity and integration to enable development of generic smart city applications. Such applications may exploit any city LDP as long as the LDPs use both a design and vocabulary known by the application. Therefore, the design of the city LDP has to be reusable.

Finally, there may be organizations wishing to participate in this effort by opening their data in conformance to Semantic Web standards via LDPs. However, they may be reluctant as currently, deploying data in LDPs may require development of LDP generators and they may not want to invest much time in doing so. Thus, it is of prime importance to have some level of automatization during the deployment of data via LDPs.

2.3 Illustrative Example

Fig. 1 shows an illustrative example that will be used throughout this paper. It consists of an RDF graph that uses the DCAT vocabulary [12], shown in Fig. 1(a). This graph is used for generating an LDP having the structure shown in Fig. 1(b). In the DCAT vocabulary, data catalogs have datasets. The datasets are associated with themes and distributions. The organization of LDPRs in the LDP in Fig. 1(b) uses a structure somewhat similar to that of DCAT: there are containers for describing catalogs, and the catalog containers contain other containers that describe the DCAT datasets of their respective catalogs. The dataset containers in turn contain two containers for grouping non-containers that describe their distributions and themes. The LDP has numerous LDPRs including dex:parking and dex:pJSON, whose RDF graphs are shown in Fig. 1(c) and Fig. 1(d). dex:parking and dex:pJSON describe ex:parking and ex:pJSON respectively, and their graphs partially contain the descriptions of these resources obtained from the original RDF graph in Fig. 1(a). For every resource from the RDF graph, an LDP-RS is created in the LDP to describe it and containers are used to organize the LDP-RSs.

2.4 Our Approach: The LDP Generation Workflow

In order to satisfy the requirements listed in Sec. 2.2, our approach uses principles from model-driven engineering (MDE). In summary, MDE involves using models as first-class entities and transforming them into running systems by using generators or by dynamically interpreting the models at runtime [9]. By separating models from the running systems, MDE enables separation of concerns, thus guaranteeing higher maintainability of software systems and reusability of systems’ models [17]. To formalize and interpret these models, MDE requires the use of a Domain Specific Language (DSL) [17], also referred to as the design language, with a well-defined syntax and semantics.

[Figure 1, four panels.
(a) RDF graph of the data source: ex:paris-catalog and ex:toulouse-catalog are instances of dcat:Catalog; ex:parking and ex:busStation are instances of dcat:Dataset (linked from a catalog by dcat:dataset); ex:pJSON, ex:pCSV, ex:bsJSON and ex:bsXML are instances of dcat:Distribution (linked from a dataset by dcat:distribution); ex:transport and ex:mobility are instances of dcat:Theme (linked from a dataset by dcat:theme).
(b) Structure of the LDP, with edges labeled http://www.w3.org/ns/ldp#contains: catalog containers dex:paris-catalog and dex:toulouse-catalog; dataset containers dex:parking and dex:busStation; containers dex:pDistributions, dex:pThemes and dex:bDistributions; non-containers dex:pCSV, dex:pJSON, dex:transport, dex:mobility, dex:bsXML and dex:bsJSON.
(c) RDF graph of dex:parking:

    dex:parking a ldp:BasicContainer ;
        ldp:contains dex:pDistributions, dex:pThemes ;
        foaf:primaryTopic ex:parking .
    ex:parking a dcat:Dataset ;
        dcat:keyword "parking", "cars" .

(d) RDF graph of dex:pJSON:

    dex:pJSON a ldp:RDFSource ;
        foaf:primaryTopic ex:pJSON .
    ex:pJSON a dcat:Distribution ;
        dct:format "JSON" ;
        dcat:accessURL <http://example.com/data/pjson> .
]

Fig. 1: Example of structure of an LDP with its data source and graphs

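[Figure 2: workflow diagram. Data sources and a design document written in LDP-DL are consumed by the LDPizer, which produces an LDP Dataset. The LDP Dataset is either exposed directly by an LDP Dataset Server or passed, together with Deployment Parameters, to an LDP Dataset Deployer that sends LDP POST requests to an LDP Server.]

Fig. 2: General overview of the LDP Generation Workflow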

Fig. 2 shows a general overview of our approach. The LDP generation workflow includes two processes: LDPization and Deployment. During the LDPization process, the LDPizer consumes a design document written in LDP-DL, which serves as our DSL. The LDPizer interprets the model based on the semantics of the language and exploits the data sources to generate what we call an LDP dataset. Basically, the LDP dataset stores LDP resources. We introduce this structure to abstract from the implementation-specific ways current LDP implementations use to store LDP resources. The LDP dataset is formally defined in Sec. 3 and throughout this paper, its mention is confined to that definition.

During the deployment process, the LDP dataset is deployed in an LDP. This can be done in two ways based on the nature of the LDP server. First, it is possible to have an LDP server that can directly consume and interpret the LDP dataset and expose its LDP resources. Secondly, if the LDP server accepts POST requests, an LDP Dataset Deployer can generate and send LDP POST requests for each LDP resource contained in the LDP dataset. For doing this, the LDP Dataset Deployer may require some parameters such as the IRI of the LDP server or authentication information. In our work, we consider deployment on only one LDP server. However, one can envisage a language to describe deployment schemes, such as replication or partitioning, fed to the LDP Dataset Deployer to deploy LDP resources on several LDP servers.
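The second deployment route can be illustrated with a minimal sketch, assuming a server that follows the LDP 1.0 creation protocol (POST to the parent container, with a `Slug` hint and a `Link` header with `rel="type"`). The function name `build_post`, the server root and the `http://example.org/` namespace are hypothetical and are not taken from the authors' LDP Dataset Deployer; the requests are only built, not sent.

```python
from urllib.request import Request

LDP_BASIC_CONTAINER = "http://www.w3.org/ns/ldp#BasicContainer"
LDP_RDF_SOURCE = "http://www.w3.org/ns/ldp#RDFSource"


def build_post(parent_url: str, slug: str, turtle: str, is_container: bool) -> Request:
    """Build (but do not send) an LDP creation request for one resource."""
    ldp_type = LDP_BASIC_CONTAINER if is_container else LDP_RDF_SOURCE
    headers = {
        "Content-Type": "text/turtle",
        "Slug": slug,                          # hint for the new resource's URL
        "Link": f'<{ldp_type}>; rel="type"',   # requested interaction model
    }
    return Request(parent_url, data=turtle.encode("utf-8"),
                   headers=headers, method="POST")


# Hypothetical deployment parameter: the IRI of the LDP server root.
root = "http://localhost:8080/"
topic = "<> <http://xmlns.com/foaf/0.1/primaryTopic> <http://example.org/{}> ."

# Containers are created before their members, parent first.
post_requests = [
    build_post(root, "paris-catalog", topic.format("paris-catalog"), True),
    build_post(root + "paris-catalog/", "parking", topic.format("parking"), True),
]
```

A server is free to ignore the `Slug` hint, so a real deployer would presumably read the `Location` header of each response before posting the members of a newly created container.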

3 LDP Design Language

In this section, we describe LDP-DL, our LDP design language. We start with a general overview of its key concepts and then provide its abstract syntax. Its formal semantics is given in Sec. 4.

3.1 Overview of the language

As mentioned in Sec. 2.1, in this paper, we restrict ourselves to LDPs where all containers are LDP-BCs and we exclude LDP Non-RDF Sources. From an abstract point of view, the data in such an LDP can be described as an LDP dataset, a structure where each LDP resource is assigned a URL and has an associated RDF graph, and a set of members if it is a container. In it, a pair (url, graph), representing a non-container, formalizes the fact that accessing the URL on the LDP returns the graph, whereas a triple (url, graph, M) indicates not only that accessing the URL returns the graph but also that the resource is an LDP-BC whose members are in M. For example, in Fig. 1(b), dex:parking is the URL of a container associated with the graph in Fig. 1(c) and having members dex:pDistributions and dex:pThemes. Furthermore, dex:pJSON is the URL of a non-container in Fig. 1(b) with its graph in Fig. 1(d).
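The pairs and triples described above can be sketched as a small in-memory structure. This is only an illustration under simplifying assumptions: graphs are modeled as sets of (subject, predicate, object) tuples of prefixed names, and the class name `Entry` is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


@dataclass(frozen=True)
class Entry:
    graph: Graph
    # None encodes a pair (url, graph); a set encodes a triple (url, graph, M).
    members: Optional[FrozenSet[str]] = None

    @property
    def is_container(self) -> bool:
        return self.members is not None


# Excerpt of the LDP dataset behind Fig. 1(b), keyed by resource URL.
ldp_dataset: Dict[str, Entry] = {
    "dex:parking": Entry(
        graph=frozenset({
            ("dex:parking", "foaf:primaryTopic", "ex:parking"),
            ("ex:parking", "rdf:type", "dcat:Dataset"),
        }),
        members=frozenset({"dex:pDistributions", "dex:pThemes"}),
    ),
    "dex:pJSON": Entry(
        graph=frozenset({
            ("dex:pJSON", "foaf:primaryTopic", "ex:pJSON"),
            ("ex:pJSON", "rdf:type", "dcat:Distribution"),
        }),
    ),
}
```

Keeping an empty member set distinct from `None` matters here: a container with no members is still a container.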

In a nutshell, LDP-DL provides constructs for describing the generation of an LDP dataset from existing data sources. In general, data sources may not be in RDF or may contain resources whose IRIs do not dereference to the LDP. Therefore, associated LDP-RSs within the LDP namespace may have to be generated to describe resources from the original data sources. For example, dex:parking, from Fig. 1(b), has been generated for the resource ex:parking from Fig. 1(a). ex:parking cannot be used directly as an LDPR. Doing so may violate the LDP standard, with adverse effects related to the lifecycle of the LDPR, as the standard states that “a contained LDPR cannot be created (. . . ) before its containing LDPC exists” [14, §2]. This is why in LDP-DL, to expose a resource from the data source via an LDP, a new LDP-RS is always generated to describe it. The resource for which an LDP-RS is generated is called the related resource. Thus, the related resource of the LDP-RS dex:parking is ex:parking. Note that an LDP-RS may not have a related resource. For example, dex:pDistributions from Fig. 1(b) does not have one: it is generated to describe the set of distributions of ex:parking, and the data source contains no explicit resource describing this set itself.

Fig. 3 shows a general overview of the constructs in LDP-DL. The language is based on the core notions of ContainerMap, NonContainerMap, ResourceMap and DataSource. ContainerMaps and NonContainerMaps are the top-level constructs of the language that describe the generation of containers and non-containers, respectively. Both need to describe which related resources to extract from the sources and the graph of the new LDP-RS describing each related resource, and for this they use ResourceMaps. ResourceMaps specify the data sources from which to extract related resources, and what graph to generate for the LDP-RS describing each related resource. DataSources describe access to the original data from which the LDP dataset is generated. Finally, ContainerMaps may refer to other ContainerMaps in a nested way.

[Figure 3: UML class diagram relating ContainerMap, NonContainerMap, ResourceMap and DataSource. Multiplicity labels on the associations: 0..*, 1..*, 1..*, 0..*, 1..*.]

Fig. 3: Overview of the main constructs of LDP-DL in UML notation.

The generation of LDP-RSs using a map may be influenced by the hierarchy of its ContainerMap. As an example, the content of dex:pDistributions depends on its parent’s related resource ex:parking. In LDP-DL, a map can refer to the related resources of parents and ancestors that are generated from the ContainerMaps that contain it.

3.2 Abstract Syntax

Hereafter, we assume familiarity with the concepts of IRIs, RDF graphs, named graphs, query variables, query patterns, construct queries, graph templates and solution mappings from RDF [6] and SPARQL [10]. We assume the existence of an infinite set D whose elements are documents, and write IRI for the set of all IRIs, V for the set of query variables and G for the set of all RDF graphs.

A design document in LDP-DL is a pair $\langle CM, NM \rangle$, where $CM$ is a set of ContainerMaps and $NM$ is a set of NonContainerMaps. A NonContainerMap is a pair $\langle u_{nm}, RM \rangle$ where $u_{nm}$ is an IRI and $RM$ is a set of ResourceMaps. A ContainerMap is a tuple $\langle u_{cm}, RM, CM, NM \rangle$ where $u_{cm}$ is an IRI, $RM$ is a set of ResourceMaps, $CM$ is a set of ContainerMaps, and $NM$ is a set of NonContainerMaps. A ResourceMap is a tuple $\langle u_{rm}, qp, cq, DS \rangle$ where $u_{rm}$ is an IRI, $qp$ is a SPARQL query pattern, $cq$ is a SPARQL CONSTRUCT query, and $DS$ is a set of DataSources. There are several ways of describing a DataSource that our concrete language covers (see details in the language specification [4]). We only consider the cases of a pair $\langle u_{ds}, u_{loc} \rangle$ or a triple $\langle u_{ds}, u_{loc}, u_{lr} \rangle$ where $u_{ds}$, $u_{loc}$ and $u_{lr}$ are IRIs that respectively refer to a data source, its location and its RDF lifting rule.

As we can see, all components of a design document in LDP-DL have an IRI. Given a ∗Map or DataSource $x$, we refer to the IRI of $x$ as $iri(x)$. In a ResourceMap, $qp$ is used to extract a set of related resources from DataSources, and $cq$ is used to generate the graph of the LDP-RSs associated with the related resources. In a DataSource, $u_{loc}$ corresponds to the location of the source file, whereas $u_{lr}$ is the location of what we call a lifting rule, used to generate an RDF graph from non-RDF data.
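The abstract syntax maps naturally onto nested record types. The sketch below is one possible encoding, assuming lists in place of sets and plain strings for IRIs and queries; the class layout, the helper `nesting_depth` and the simplified stand-in CONSTRUCT string are illustrative choices, not the concrete syntax of LDP-DL. The :themes ContainerMap of the running example is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataSource:
    iri: str                            # u_ds
    location: str                       # u_loc
    lifting_rule: Optional[str] = None  # u_lr, only needed for non-RDF data


@dataclass
class ResourceMap:
    iri: str                            # u_rm
    query_pattern: str                  # qp
    construct_query: str                # cq
    data_sources: List[DataSource]      # DS


@dataclass
class NonContainerMap:
    iri: str                            # u_nm
    resource_maps: List[ResourceMap]    # RM


@dataclass
class ContainerMap:
    iri: str                            # u_cm
    resource_maps: List[ResourceMap]    # RM
    container_maps: List["ContainerMap"] = field(default_factory=list)     # CM
    non_container_maps: List[NonContainerMap] = field(default_factory=list)  # NM


@dataclass
class DesignDocument:
    container_maps: List[ContainerMap]
    non_container_maps: List[NonContainerMap] = field(default_factory=list)


def nesting_depth(cm: ContainerMap) -> int:
    """Number of ContainerMap levels at and below cm."""
    return 1 + max((nesting_depth(c) for c in cm.container_maps), default=0)


# Simplified stand-in for every cq; the real queries differ per ResourceMap.
CQ = "CONSTRUCT { ν foaf:primaryTopic ρ . ρ ?p ?o . } WHERE { ρ ?p ?o . }"
ds = DataSource("ex:ds", "ex:paris")
rm1 = ResourceMap(":rm1", "ρ a dcat:Catalog .", CQ, [ds])
rm2 = ResourceMap(":rm2", "π1 dcat:dataset ρ .", CQ, [ds])
rm3 = ResourceMap(":rm3", "VALUES ρ { UNDEF }", CQ, [ds])
rm4 = ResourceMap(":rm4", "π2 dcat:distribution ρ .", CQ, [ds])

distrib = NonContainerMap(":distrib", [rm4])
distribs = ContainerMap(":distribs", [rm3], non_container_maps=[distrib])
dataset = ContainerMap(":dataset", [rm2], container_maps=[distribs])
catalog = ContainerMap(":catalog", [rm1], container_maps=[dataset])
doc = DesignDocument([catalog])
```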

We assume the existence of an infinite set of variables $V_r = \{\rho, \nu, \pi_1, \ldots, \pi_i, \ldots\} \subseteq V$ called the reserved variables, such that $V \setminus V_r$ is infinite. ResourceMaps may use the reserved variables but these have a special semantics, as explained in the next section. However, due to undesirable consequences, we forbid the use of variable $\nu$ in the WHERE clause of the CONSTRUCT query $cq$.

Fig. 4 shows a simple example of a design document³ in the abstract syntax of the language. An arrow with the label cm, nm or rm indicates that the construct has a ContainerMap, NonContainerMap or ResourceMap in its CM, NM or RM respectively. Also, though not shown in the figure, in the DS of all ResourceMaps, there is a DataSource ⟨ex:ds, ex:paris⟩, which is actually the RDF graph in Fig. 1(a).

[Figure 4: tree of maps with edges labeled cm, nm and rm; each ResourceMap is annotated with its query pattern (qp) and CONSTRUCT query (cq).

    ContainerMap :catalog (rm :rm1)
      cm → ContainerMap :dataset (rm :rm2)
        cm → ContainerMap :distribs (rm :rm3)
          nm → NonContainerMap :distrib (rm :rm4)
        cm → ContainerMap :themes

    :rm1  qp: ρ a dcat:Catalog .
          cq: CONSTRUCT { ν foaf:primaryTopic ρ . ρ ?p ?o . } WHERE { ρ ?p ?o . FILTER (?p not in (dcat:dataset)) }
    :rm2  qp: π1 dcat:dataset ρ .
          cq: CONSTRUCT { ν foaf:primaryTopic ρ . ρ ?p ?o . } WHERE { ρ ?p ?o . FILTER (?p not in (dcat:distribution)) }
    :rm3  qp: VALUES ρ { UNDEF } .
          cq: CONSTRUCT { ν dct:description ?o } WHERE { π1 dct:title ?title . BIND(CONCAT("Describes distribution of ", ?title)) }
    :rm4  qp: π2 dcat:distribution ρ .
          cq: CONSTRUCT { ν foaf:primaryTopic ρ . ρ ?p ?o . } WHERE { ρ ?p ?o . }
]

Fig. 4: Example of an LDP-DL document in the abstract syntax.

³ Design document in concrete syntax: https://tinyurl.com/y8n9cls2

4 Formal Semantics

The aim of the formal semantics is to associate an LDP dataset (as described in Sec. 3.1) to a design document. To this end, we define a notion of interpretation and a notion of satisfaction in a model-theoretic way. Several interpretations may satisfy a given design document, leading to different evaluations of it. This approach allows developers to implement the language in different ways, leading to different results, depending on whether they implement optional or alternative parts of the standard. Also, in the future, we would like to study formal properties of design documents, such as entailment, model checking and static analysis.

4.1 LDP-DL interpretation.

An LDP-DL interpretation determines which IRIs denote ContainerMaps, NonContainerMaps, ResourceMaps, DataSources, or something else. Then, each ContainerMap (resp. NonContainerMap) is interpreted as a set of triples (url, graph, M) (resp. a set of pairs (url, graph)) with respect to a list of ancestors. A list of ancestors is a finite sequence of elements that can be IRIs or a special value $\varepsilon \notin \mathrm{IRI}$ that indicates the absence of a related resource. Formally, an ancestor list is an element of $\mathrm{IRI}^* = \bigcup_{n \geq 0} (\mathrm{IRI} \cup \{\varepsilon\})^n$, with $\emptyset$ being the empty list, the only element of $(\mathrm{IRI} \cup \{\varepsilon\})^0$. We use the notation $\vec{p}$ to denote an ancestor list and $\mathrm{len}(\vec{p})$ to denote the length of the list. Also, $\vec{p} :: r$ denotes appending the element $r$ to $\vec{p}$.

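Definition 1 (LDP-DL Interpretation). An LDP-DL interpretation $I$ is a tuple $\langle \Delta^I, C, N, R, S, \cdot^I, I_C, I_N, I_R, I_S \rangle$ such that:

– $\Delta^I$ is a non-empty set (the domain of interpretation);
– $C$, $N$, $R$, $S$ are subsets of $\Delta^I$;
– $\cdot^I : \mathrm{IRI} \to \Delta^I$ is the interpretation function;
– $I_C : C \times \mathrm{IRI}^* \to 2^{\mathrm{IRI} \times G \times 2^{\mathrm{IRI}}}$;
– $I_N : N \times \mathrm{IRI}^* \to 2^{\mathrm{IRI} \times G}$;
– $I_R : R \times \mathrm{IRI}^* \to 2^{\mathrm{IRI} \times (\mathrm{IRI} \cup \{\varepsilon\}) \times G}$ such that $(n, r_1, g_1) \in I_R(u_1, \vec{p_1}) \wedge (n, r_2, g_2) \in I_R(u_2, \vec{p_2}) \implies r_1 = r_2 \wedge g_1 = g_2$ (unicity constraint);
– $I_S : S \to G$.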
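The unicity constraint on the ResourceMap interpretation says that a generated IRI n is tied to exactly one related resource and one graph, whichever ResourceMap and ancestor list produced it. A minimal sketch of a checker, assuming the results of all ResourceMap evaluations have been collected into one iterable of (n, r, g) tuples and that the absent related resource is encoded as `None` (both assumptions of this sketch):

```python
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

Graph = FrozenSet[Tuple[str, str, str]]
Generated = Tuple[str, Optional[str], Graph]  # (n, r, g); r is None when absent


def satisfies_unicity(results: Iterable[Generated]) -> bool:
    """True iff every new IRI n occurs with a single (r, g) pair."""
    seen: Dict[str, Tuple[Optional[str], Graph]] = {}
    for n, r, g in results:
        if n in seen and seen[n] != (r, g):
            return False
        seen[n] = (r, g)
    return True


g_parking = frozenset({("dex:parking", "foaf:primaryTopic", "ex:parking")})
g_other = frozenset({("dex:parking", "foaf:primaryTopic", "ex:busStation")})

consistent = [("dex:parking", "ex:parking", g_parking),
              ("dex:parking", "ex:parking", g_parking)]
clashing = [("dex:parking", "ex:parking", g_parking),
            ("dex:parking", "ex:busStation", g_other)]
```

Here `satisfies_unicity(consistent)` holds, while `clashing` violates the constraint because dex:parking would be generated for two different related resources.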

$C$ (resp. $N$, $R$, $S$) represents the container maps (resp. non-container maps, resource maps, data sources) according to the interpretation. That is, if the interpretation function $\cdot^I$ maps an IRI to an element of $C$, it means that this interpretation considers that the IRI is the name of a container map. For a given ContainerMap $cm \in C$ and an ancestor list $\vec{p}$, $\langle n, g, M \rangle \in I_C(cm, \vec{p})$ means that, in the context of $\vec{p}$, $cm$ must map the data sources to LDP-BCs where $n$ is the IRI of a container, $g$ is the RDF graph obtained from dereferencing $n$, and $M$ is the set of IRIs referring to the members of the container. Similarly, for a NonContainerMap $nm \in N$, $\langle n, g \rangle \in I_N(nm, \vec{p})$ means that $nm$ must map to resources where $n$ is the IRI of a non-container LDP-RS that provides $g$ when dereferenced. For a DataSource $ds \in S$, $I_S(ds)$ is an RDF graph representing what can be obtained from the data source.

4.2 Satisfaction

In this section, we describe the satisfaction relation $\models$, which relates interpretations to the syntactic constructs that they validate.

Informal description of the satisfaction.

This section provides an overview of the definitions by considering a concrete example and informally explains the semantics of LDP-DL constructs. The following sections provide the full formal semantics with more detailed explanations. As the example, we use Fig. 4, which is a design in LDP-DL for building the LDP⁴ having the structure shown in Fig. 1(b) using the data source in Fig. 1(a).

In principle, a DataSource provides information to retrieve an RDF graph, using parameters that can take several forms. Here, we define only two forms of DataSources: $ds = \langle u_{ds}, u_{loc} \rangle$, which provides a URL to an RDF document directly, and $ds = \langle u_{ds}, u_{loc}, u_{lr} \rangle$, which provides a URL to an arbitrary document and an additional URL to a transformation script that generates an RDF graph from it. We call such a script a lifting rule; it can be seen as a function $lr : D \to G$. Our semantics is flexible enough to be extended to more complex forms of DataSources, for example to handle access rights or content negotiation. For example, the retrieval of the RDF graph in Fig. 1(a) could be described by a DataSource $ds_1 = \langle u_{ds_1}, u_{loc_1} \rangle$ where $I_S(u_{ds_1}^I)$ is the RDF graph located at $u_{loc_1}$. Similarly, for a DataSource $ds_2 = \langle u_{ds_2}, u_{loc_2}, u_{lr_2} \rangle$, $I_S(u_{ds_2}^I)$ is the RDF graph obtained by executing the lifting rule found at $u_{lr_2}$ on the document found at $u_{loc_2}$.
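The two forms of DataSource can be sketched as follows, under heavy simplifications: documents are fetched from an in-memory dictionary rather than the Web, RDF documents use a toy one-triple-per-line syntax, and a lifting rule is an ordinary Python function registered under its IRI. The names `parse_rdf_toy`, `lift_csv`, `interpret_source` and the IRIs `ex:paris-csv` and `ex:csv-rule` are hypothetical; the authors' implementation may handle lifting rules quite differently.

```python
import csv
import io
from typing import Callable, Dict, FrozenSet, Optional, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]
LiftingRule = Callable[[str], Graph]  # lr : D -> G


def parse_rdf_toy(document: str) -> Graph:
    """Toy parser: one 'subject predicate object .' statement per line."""
    triples = set()
    for line in document.splitlines():
        line = line.strip()
        if line:
            s, p, o = line.rstrip(" .").split(None, 2)
            triples.add((s, p, o))
    return frozenset(triples)


def lift_csv(document: str) -> Graph:
    """Hypothetical lifting rule: each CSV row becomes a titled dcat:Dataset."""
    triples = set()
    for row in csv.DictReader(io.StringIO(document)):
        subject = "ex:" + row["id"]
        triples.add((subject, "rdf:type", "dcat:Dataset"))
        triples.add((subject, "dct:title", row["title"]))
    return frozenset(triples)


def interpret_source(fetch: Callable[[str], str], u_loc: str,
                     u_lr: Optional[str] = None,
                     rules: Optional[Dict[str, LiftingRule]] = None) -> Graph:
    """Graph of <u_ds, u_loc> or <u_ds, u_loc, u_lr> (u_ds plays no role here)."""
    document = fetch(u_loc)
    if u_lr is None:
        return parse_rdf_toy(document)
    if rules is None or u_lr not in rules:
        raise KeyError(f"no lifting rule registered for {u_lr}")
    return rules[u_lr](document)


documents = {
    "ex:paris": "ex:parking rdf:type dcat:Dataset .\n",
    "ex:paris-csv": "id,title\nparking,Parking lots\n",
}
rules = {"ex:csv-rule": lift_csv}

g1 = interpret_source(documents.__getitem__, "ex:paris")
g2 = interpret_source(documents.__getitem__, "ex:paris-csv", "ex:csv-rule", rules)
```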

At the top level of the design document, the ContainerMap :catalog uses the ResourceMap :rm1 to generate the top-level containers. The DataSource used by :rm1 is interpreted as the RDF graph of Fig. 1(a). At this level, :rm1 is evaluated with an empty ancestor list. The related resources extracted from the source by its query pattern are the DCAT catalogs, namely ex:paris-catalog and ex:toulouse-catalog. For each of them, an IRI is generated. In this case, the new IRIs are dex:paris-catalog and dex:toulouse-catalog. Also, to satisfy the map :rm1, the RDF graph associated with the container IRI is obtained using its CONSTRUCT query, where the variable ρ is bound to the related resource IRI, and ν to the IRI of the new LDPR. For example, when doing so for dex:paris-catalog, ρ is bound to ex:paris-catalog, and ν to dex:paris-catalog. Finally, new containers generated from :catalog must define their members as well: :catalog is satisfied only if the members of each container correspond to the resources generated by its underlying ContainerMaps and NonContainerMaps (in this case, :dataset only).
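The evaluation of :rm1 at the top level can be imitated on a toy graph. This is a sketch only: the graph is a set of tuples rather than an RDF store, the query pattern and CONSTRUCT query are hand-coded instead of being run by a SPARQL engine, the title literal is made up, and the IRI minting rule (prefix `ex:` rewritten to `dex:`) is an assumption chosen to reproduce the IRIs of the running example.

```python
from typing import Dict, FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]

# Excerpt of Fig. 1(a); the dct:title literal is made up for illustration.
source: Set[Triple] = {
    ("ex:paris-catalog", "rdf:type", "dcat:Catalog"),
    ("ex:paris-catalog", "dct:title", "Paris catalog"),
    ("ex:paris-catalog", "dcat:dataset", "ex:parking"),
    ("ex:toulouse-catalog", "rdf:type", "dcat:Catalog"),
    ("ex:parking", "rdf:type", "dcat:Dataset"),
}


def mint_iri(related: str) -> str:
    """Assumed minting rule: ex:foo -> dex:foo."""
    return "d" + related


def evaluate_rm1(graph: Set[Triple]) -> Dict[str, Tuple[str, FrozenSet[Triple]]]:
    """Map each new IRI (ν) to its related resource (ρ) and generated graph."""
    result = {}
    # Query pattern: ρ a dcat:Catalog .
    catalogs = sorted(s for s, p, o in graph
                      if p == "rdf:type" and o == "dcat:Catalog")
    for rho in catalogs:
        nu = mint_iri(rho)
        # CONSTRUCT: ν foaf:primaryTopic ρ, plus ρ's triples except dcat:dataset.
        generated = {(nu, "foaf:primaryTopic", rho)}
        generated |= {(s, p, o) for s, p, o in graph
                      if s == rho and p != "dcat:dataset"}
        result[nu] = (rho, frozenset(generated))
    return result


containers = evaluate_rm1(source)
```

The dcat:dataset links are filtered out of the container graphs because, in this design, the datasets of a catalog are exposed as members of the container rather than repeated in its graph.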

The ContainerMap :dataset is used to generate members for containers generated from :catalog. Let us consider the case of dex:paris-catalog. Its related resource is ex:paris-catalog, and its members must only have related resources that are DCAT datasets from this catalog. This is why the extraction of these resources is parameterized by the parent variable $\pi_1$ in the query pattern of :rm2. $\pi_1$ is bound to the first element of the ancestor list, which at this stage is (ex:paris-catalog). The evaluation of :dataset generates two containers, dex:parking and dex:busStation, that are added to the members of dex:paris-catalog.

4 http://opensensingcity.emse.fr/ldpdfend/catalogs/ldp

The map :distribs is used to generate members for containers generated by :dataset. Consider the case of dex:parking, whose related resource is ex:parking. In this context, the aim of using :distribs is to generate a container that describes the set of distributions of ex:parking. Note that in the data source, there is no explicit resource to describe this set. This is why the ResourceMap :rm3 uses a query pattern that always returns a single result in which $\rho$ is unbound. Although the query pattern does not use any ancestor variables, it is evaluated using the ancestor list (ex:paris-catalog, ex:parking), and thus the ancestor variables $\pi_1$ and $\pi_2$ are bound. The evaluation of :distribs in the context of dex:parking generates a single container dex:pDistributions. :distribs is satisfied when a single container is generated without a related resource.

Finally, the NonContainerMap :distrib is used to generate non-containers for each distribution of a DCAT dataset. Consider the case of dex:pDistributions with ancestor list (ex:paris-catalog, ex:parking, ∅). In this context, the related resource that must be used to extract the relevant distributions is the one associated with the grandparent container. This is why the query pattern of :rm4 uses $\pi_2$, bound to ex:parking, rather than $\pi_1$. Using the result of this query pattern (ex:pJSON and ex:pCSV), two non-containers dex:pJSON and dex:pCSV are generated in dex:pDistributions using :distrib. In general, any ancestor related resources can be referenced through the ancestor variables $\pi_i$ simultaneously, even when some of them are unbound.

Satisfaction of a DataSource

In principle, a DataSource provides information to retrieve an RDF graph, using parameters that can take several forms. Here, we define only two forms of DataSources: $ds = \langle u_{ds}, u_{loc} \rangle$, which provides a URL to an RDF document directly, and $ds = \langle u_{ds}, u_{loc}, u_{lr} \rangle$, which provides a URL to an arbitrary document with a transformation script to generate an RDF graph. We call such a script a lifting rule, which can be seen as a function $lr : D \to G$.

Our semantics is flexible enough to be extended to more complex forms including access right management, content negotiation, etc.

A DataSource $ds$ is satisfied if, by using the parameters of $ds$, we obtain the RDF graph of the interpretation of $iri(ds)$.

Formally, satisfaction of DataSources is parameterized by a function $deref : IRI \to D$ and a lifting rule map $lr : IRI \rightharpoonup (D \to G)$, a partial function assigning a lifting rule to some IRIs.


Definition 2. Let $I$ be an LDP-DL interpretation. We say that $I$ satisfies $ds = \langle u_{ds}, u_{loc}, u_{lr} \rangle$ wrt a deref function and a lifting rule map $lr$, written $I \models_{deref,lr} ds$, iff $I_S(u_{ds}^I) = lr(deref(u_{lr}))(deref(u_{loc}))$. If $ds = \langle u_{ds}, u_{loc} \rangle$, then $I$ satisfies $ds$ wrt deref iff $I_S(u_{ds}^I) = deref(u_{loc})$. If $DS$ is a set of DataSources, then we write $I \models DS$ when for all $ds \in DS$, $I \models ds$.

When there is no ambiguity or no need to specify deref and $lr$, we simply write $I \models ds$.
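The two forms of DataSource can be sketched as a small function. This is only an illustration under stated assumptions: graphs are modelled as frozensets of string triples, and `toy_deref`, `toy_parse` and `toy_lr` are hypothetical stand-ins for deref, for RDF parsing (which the definition leaves implicit) and for the lifting rule map. The call `lr(deref(u_lr))` follows the formula of Definition 2 as written.

```python
from typing import Callable, FrozenSet, Optional, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]
Document = str  # stand-in for an element of D


def evaluate_data_source(
    u_loc: str,
    u_lr: Optional[str],
    deref: Callable[[str], Document],
    parse_rdf: Callable[[Document], Graph],
    lr: Callable[[Document], Callable[[Document], Graph]],
) -> Graph:
    """Return the graph that I_S must assign to the DataSource IRI."""
    document = deref(u_loc)
    if u_lr is None:
        # Form <u_ds, u_loc>: the document is already RDF (parsing is assumed).
        return parse_rdf(document)
    # Form <u_ds, u_loc, u_lr>: lr(deref(u_lr))(deref(u_loc)), as in Def. 2.
    lifting_rule = lr(deref(u_lr))
    return lifting_rule(document)


# Toy stand-ins; every URL and document below is hypothetical.
DOCUMENTS = {
    "http://example.org/data.nt": "ex:paris-catalog rdf:type dcat:Catalog",
    "http://example.org/data.csv": "ex:parking,rdf:type,dcat:Dataset",
    "http://example.org/rule": "split-on-comma",
}


def toy_deref(iri: str) -> Document:
    return DOCUMENTS[iri]


def toy_parse(document: Document) -> Graph:
    return frozenset(tuple(line.split()) for line in document.splitlines())


def toy_lr(rule_document: Document) -> Callable[[Document], Graph]:
    # A real lifting rule would be a script (for example SPARQL-Generate).
    assert rule_document == "split-on-comma"
    return lambda doc: frozenset(tuple(line.split(",")) for line in doc.splitlines())


direct = evaluate_data_source(
    "http://example.org/data.nt", None, toy_deref, toy_parse, toy_lr)
lifted = evaluate_data_source(
    "http://example.org/data.csv", "http://example.org/rule",
    toy_deref, toy_parse, toy_lr)
```

An interpretation satisfies the DataSource exactly when the graph it assigns to the DataSource IRI equals the returned value.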

Satisfaction of a ResourceMap

The satisfaction of a ResourceMap $\langle u_{rm}, qp, cq, DS \rangle$ is defined wrt an ancestor list, and depends on the results of the execution of $qp$ and $cq$ over the graphs retrieved using $DS$. To define this properly, we need to introduce additional notations.

In this section, we use the notions of solution mappings and query evaluation, as in [13]. We write $dom(\mu)$ to denote the domain of a solution mapping $\mu$ and $[\![Q]\!]_G$ to denote the evaluation of a query pattern $Q$ over the graph $G$. Given a graph template $gt$ and a set of solution mappings $\Omega$, we write $gt(\Omega)$ to denote the RDF graph formed by taking each query solution in $\Omega$, substituting for the variables in the graph template, and combining the triples into a single RDF graph by set union. If $\mathcal{G}$ is a set of RDF graphs, $Merge(\mathcal{G})$ is the RDF merge of the graphs in $\mathcal{G}$.
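The template instantiation $gt(\Omega)$ can be illustrated with a minimal sketch. It assumes that variables are strings starting with "?", that a solution mapping is a dict, and that a template triple containing an unbound variable is skipped (the behaviour of SPARQL CONSTRUCT). The template and the property foaf:primaryTopic are made up for the example.

```python
def instantiate(template, solutions):
    """Compute gt(Omega): substitute each solution into the template, then union."""
    graph = set()
    for mu in solutions:
        for triple in template:
            substituted = tuple(
                mu.get(term) if term.startswith("?") else term for term in triple
            )
            # Triples with an unbound variable are left out of the result.
            if None not in substituted:
                graph.add(substituted)
    return graph


template = [
    ("?nu", "rdf:type", "ldp:BasicContainer"),
    ("?nu", "foaf:primaryTopic", "?rho"),
]
solutions = [
    {"?nu": "dex:paris-catalog", "?rho": "ex:paris-catalog"},
    {"?nu": "dex:pDistributions"},  # rho unbound: no related resource
]
graph = instantiate(template, solutions)
```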

For a given ancestor list $\vec{p}$, let $\mu_{\vec{p}}$ be the mapping s.t. $dom(\mu_{\vec{p}}) \subseteq \{ \pi_i \mid 1 \leq i \leq len(\vec{p}) \}$ and, for $1 \leq i \leq len(\vec{p})$, $\mu_{\vec{p}}(\pi_i) = \vec{p}[i]$ if $\vec{p}[i] \neq \epsilon$ and $\mu_{\vec{p}}(\pi_i)$ is undefined otherwise. This mapping can be used to constrain the selection of related resources and the construction of the associated graphs. We call $\mu_{\vec{p}}$ the ancestor mapping and the variables $\pi_i$ the ancestor variables.
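As a small illustration of the ancestor mapping, the sketch below assumes that an ancestor list is a Python sequence in which `None` plays the role of $\epsilon$, and that the ancestor variable $\pi_i$ is written "pi1", "pi2", and so on. Both conventions are choices made for this example only.

```python
def ancestor_mapping(ancestors):
    """Build the ancestor mapping: pi_i is bound to the i-th ancestor unless it is epsilon."""
    return {
        f"pi{i}": ancestor
        for i, ancestor in enumerate(ancestors, start=1)
        if ancestor is not None
    }


# Ancestor list used when evaluating :distrib for dex:pDistributions.
mu_p = ancestor_mapping(["ex:paris-catalog", "ex:parking", None])
```

Here `mu_p` binds "pi1" and "pi2" and leaves "pi3" undefined, because the third ancestor has no related resource.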

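Definition 3. Let $I$ be an LDP-DL interpretation, $\vec{p}$ an ancestor list, and $rm = \langle u_{rm}, qp, cq, DS \rangle$ a ResourceMap, with $cq = \langle gt_{cq}, qp_{cq} \rangle$. Let $gs = Merge(\{ I_S(iri(ds)^I) \mid ds \in DS \})$. We say that $I$ satisfies $rm$ wrt $\vec{p}$, written $I, \vec{p} \models rm$, iff:

– $I \models DS$;

– if $r \in \Pi_\rho([\![qp]\!]_{gs} \bowtie \mu_{\vec{p}})$ then there exists $\langle n, r, g \rangle \in I_R(u_{rm}^I, \vec{p})$;

– if $\langle n, r, g \rangle \in I_R(u_{rm}^I, \vec{p})$ then:

• $r \in \Pi_\rho([\![qp]\!]_{gs} \bowtie \mu_{\vec{p}})$,

• $g \supseteq gt_{cq}([\![qp_{cq}]\!]_{gs}) \bowtie \mu_\nu \bowtie \mu_\rho \bowtie \mu_{\vec{p}}$;

where:

• $\mu_\nu$ is the mapping where $dom(\mu_\nu) = \{\nu\}$ and $\mu_\nu(\nu) = n$,

• $\mu_\rho$ is the mapping where $dom(\mu_\rho) = \{\rho\}$ and $\mu_\rho(\rho) = r$ if $r \neq \epsilon$, and $\mu_\rho(\rho)$ is undefined otherwise.

If $RM$ is a set of ResourceMaps, then we write $I, \vec{p} \models RM$ when for all $rm \in RM$, $I, \vec{p} \models rm$.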


The query pattern $qp$ describes which related resources to extract from the data sources, using the variable $\rho$ to hold those resources. This is why we have a projection on $\rho$ in the second item of the definition. Further constraints can be added by using the ancestor variables. The purpose of $\nu$ is to have a way to refer to a newly created resource when describing the constructed graph. When evaluating $cq$, the variable $\nu$ is bound in $\mu_\nu$, which is joined with the evaluation of the query pattern in $cq$.

For instance, in Fig. 4, :rm4 has a query pattern that selects the distributions of datasets from the DCAT catalog.
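The role of the join with the ancestor mapping and of the projection on $\rho$ can be sketched as follows. This is a hedged illustration: solution mappings are dicts, the list `solutions` stands for an already computed $[\![qp]\!]_{gs}$ for a pattern such as the one of :rm2, and the resource ex:metro is invented to show a solution that the join filters out.

```python
def compatible(mu1, mu2):
    """Two mappings are compatible when they agree on every shared variable."""
    return all(mu2[var] == term for var, term in mu1.items() if var in mu2)


def join(omega, mu):
    """Join a set of solution mappings with a single mapping."""
    return [{**m, **mu} for m in omega if compatible(m, mu)]


def project(omega, variable):
    """Projection on one variable; None stands for an unbound value (epsilon)."""
    return {m.get(variable) for m in omega}


solutions = [
    {"rho": "ex:parking", "pi1": "ex:paris-catalog"},
    {"rho": "ex:busStation", "pi1": "ex:paris-catalog"},
    {"rho": "ex:metro", "pi1": "ex:toulouse-catalog"},  # invented for the example
]
mu_p = {"pi1": "ex:paris-catalog"}
related = project(join(solutions, mu_p), "rho")
```

With the ancestor list (ex:paris-catalog), only the datasets of the Paris catalog remain as related resources.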

Satisfaction of a NonContainerMap

An interpretation satisfies a NonContainerMap $\langle u_{nm}, RM \rangle$ wrt an ancestor list when it satisfies all $rm \in RM$, and the set of named graphs associated with $u_{nm}$ tallies with the interpretations of the ResourceMaps.

Definition 4. Let $I$ be an LDP-DL interpretation, $\vec{p}$ an ancestor list, and $nm = \langle u_{nm}, RM \rangle$ a NonContainerMap. We say that $I$ satisfies $nm$ wrt $\vec{p}$, written $I, \vec{p} \models nm$, iff:

– $I, \vec{p} \models RM$;

– if $\langle n, g \rangle \in I_N(u_{nm}^I)$ then there exist $rm \in RM$ and $\langle n, r, g' \rangle \in I_R(iri(rm)^I, \vec{p})$ such that $g' \subseteq g$;

– if there exist $rm \in RM$ and $\langle n, r, g' \rangle \in I_R(iri(rm)^I, \vec{p})$, then there exists a unique $\langle n, g \rangle \in I_N(u_{nm}^I)$ such that $g' \subseteq g$.

If $NM$ is a set of NonContainerMaps, then we write $I, \vec{p} \models NM$ when for all $nm \in NM$, $I, \vec{p} \models nm$.

Satisfaction of a ContainerMap

As previously, the satisfaction of a ContainerMap $\langle u_{cm}, RM, CM, NM \rangle$ is defined wrt an ancestor list. Constraints similar to those of NonContainerMaps apply. In addition, the maps in $CM$ and $NM$ must be satisfied wrt a new ancestor list, which is the previous list with the related resource of the ContainerMap appended.

Additionally, to satisfy a ContainerMap, the set of members $M$ in a triple $\langle n, g, M \rangle$ must contain the resources that must be generated by the maps of $CM$ and $NM$ wrt the new ancestor list.

Definition 5. Let $I$ be an LDP-DL interpretation, $\vec{p}$ an ancestor list, and $cm = \langle u_{cm}, RM, CM, NM \rangle$ a ContainerMap. We say that $I$ satisfies $cm$ wrt $\vec{p}$, written $I, \vec{p} \models cm$, iff:

– $I, \vec{p} \models RM$;

– if $\langle n, g, M \rangle \in I_C(u_{cm}^I)$ then there exist $rm \in RM$ and $\langle n, r, g' \rangle \in I_R(iri(rm)^I, \vec{p})$ such that $g' \subseteq g$;

– if there exist $rm \in RM$ and $\langle n, r, g' \rangle \in I_R(iri(rm)^I, \vec{p})$, then there exists a unique $\langle n, g, M \rangle \in I_C(u_{cm}^I)$ such that $g' \subseteq g$;

– for all $rm \in RM$ and for all $\langle n, r, g \rangle \in I_R(iri(rm)^I, \vec{p})$:

• $I, \vec{p}::r \models NM$ and $M \supseteq \{ n \mid \exists nm \in NM \wedge \exists \langle n, g \rangle \in I_N(iri(nm)^I, \vec{p}::r) \}$;

• $I, \vec{p}::r \models CM$ and $M \supseteq \{ n \mid \exists cm' \in CM \wedge \exists \langle n, g, M' \rangle \in I_C(iri(cm')^I, \vec{p}::r) \}$.

If $CM$ is a set of ContainerMaps, then we write $I, \vec{p} \models CM$ when for all $cm \in CM$, $I, \vec{p} \models cm$.

Satisfaction of an LDP-DL document

An LDP-DL document is satisfied if all its top-level ContainerMaps and NonContainerMaps are satisfied wrt an empty ancestor list.

Definition 6. Let $I$ be an LDP-DL interpretation and $\delta = \langle CM, NM \rangle$ an LDP-DL document. We say that $I$ satisfies $\delta$, written $I \models \delta$, iff (1) $I, \emptyset \models CM$ and (2) $I, \emptyset \models NM$.

4.3 Evaluation of a design document using an interpretation

With an interpretation, we have a way of assigning an LDP dataset (as described in Sec. 3.1) to a design document, using the interpretations of the ContainerMaps and NonContainerMaps that appear in the document. We call this an evaluation of the document. Formally, it takes the form of a function that builds an LDP dataset given an LDP-DL interpretation and a document $\delta$. We formalize the notion of LDP dataset as follows:

Definition 7 (LDP dataset). An LDP dataset is a pair $\langle NG, NC \rangle$ where $NG$ is a set of named graphs and $NC$ is a set of named containers, that is, a set of triples $\langle n, g, M \rangle$ such that $n \in IRI$ (called the container name), $g \in G$ and $M \in 2^{IRI}$, and such that:

– no IRI appears more than once as a (graph or container) name;

– for all $\langle n, g, M \rangle \in NC$, and for all $u \in M$, there exists a named graph or container having the name $u$.
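The two conditions of Definition 7 can be checked mechanically. The sketch below assumes a simple encoding that is not part of the report: named graphs are (name, graph) pairs, named containers are (name, graph, members) triples, and graphs are left empty because only names and members matter for the check.

```python
def is_ldp_dataset(named_graphs, named_containers):
    """Check the two structural conditions of Definition 7."""
    names = [n for n, _ in named_graphs] + [n for n, _, _ in named_containers]
    # Condition 1: no IRI is used twice as a graph or container name.
    if len(names) != len(set(names)):
        return False
    # Condition 2: every member is the name of some graph or container.
    known = set(names)
    return all(
        member in known
        for _, _, members in named_containers
        for member in members
    )


EMPTY = frozenset()
graphs = [("dex:pJSON", EMPTY), ("dex:pCSV", EMPTY)]
containers = [
    ("dex:paris-catalog", EMPTY, {"dex:parking"}),
    ("dex:parking", EMPTY, {"dex:pDistributions"}),
    ("dex:pDistributions", EMPTY, {"dex:pJSON", "dex:pCSV"}),
]
ok = is_ldp_dataset(graphs, containers)
dangling = is_ldp_dataset(graphs, [("dex:parking", EMPTY, {"dex:missing"})])
duplicate = is_ldp_dataset(graphs + [("dex:pJSON", EMPTY)], containers)
```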

Having the notion of LDP dataset, the maps of the design document are evaluated wrt an ancestor list as follows:

Definition 8 (Evaluation of a map). The evaluation of a ContainerMap or NonContainerMap $m$ wrt an interpretation $I$ and an ancestor list $\vec{p}$ s.t. $I, \vec{p} \models m$ is:

$$[\![m]\!]^{\vec{p}}_I = \begin{cases} I_N(iri(m)^I, \vec{p}), & \text{if } m \text{ is a NonContainerMap} \\ I_C(iri(m)^I, \vec{p}) \cup \bigcup_{\substack{rm \in RM \\ \langle n, r, g \rangle \in I_R(iri(rm)^I, \vec{p}) \\ m' \in NM \cup CM}} [\![m']\!]^{\vec{p}::r}_I, & \text{if } m = \langle u_{cm}, RM, CM, NM \rangle \text{ is a ContainerMap} \end{cases}$$

The evaluation of a map yields an LDP dataset. Indeed, the first condition of Def. 7 is satisfied because of the uniqueness constraint from Def. 1, and the second condition is satisfied because $I, \vec{p} \models m$. Now we can define the evaluation of a design document wrt an interpretation:

Definition 9 (Evaluation of a design document). Let $I$ be an interpretation and $\delta = \langle CM, NM \rangle$ a design document. The evaluation of $\delta$ wrt $I$ is

$$[\![\delta]\!]_I = \bigcup_{m \in CM \cup NM} [\![m]\!]^{\emptyset}_I$$

In practice, an LDP-DL processor will not define an explicit interpretation, but will build an LDP dataset from a design document. Hence, we want to define a notion of conformity of an algorithm wrt the language specification given above. To this aim, we first provide a definition of a valid LDP dataset for a design document.

Definition 10 (Valid). An LDP dataset $D$ is valid wrt a design document $\delta$ if there exists an interpretation $I$ that satisfies $\delta$, such that $[\![\delta]\!]_I = D$.

The validity of an LDP dataset is an important property used in our implementation when generating and deploying LDPs. Finally, we can define the correctness of an algorithm that implements LDP-DL.

Definition 11 (Correct). An algorithm that takes an LDP-DL document as input and returns an LDP dataset is correct if, for every document $\delta$, the output of the algorithm on $\delta$ is valid wrt $\delta$.

Definition 12 (LDP dataset isomorphism). Let $\Lambda_1 = \langle NG_1, NC_1 \rangle$ and $\Lambda_2 = \langle NG_2, NC_2 \rangle$ be two LDP datasets. We say that $\Lambda_1$ and $\Lambda_2$ are isomorphic (written $\Lambda_1 \cong \Lambda_2$) iff there exists a bijection $f : IRI \to IRI$ such that for all $\langle n, g \rangle \in NG_1$, $\langle f(n), f(g) \rangle \in NG_2$ and for all $\langle n, g, M \rangle \in NC_1$, $\langle f(n), f(g), \{ f(i) \mid i \in M \} \rangle \in NC_2$, where $f(g)$ denotes the graph built from $g$ by replacing its IRIs with their image through $f$.

If an LDP dataset satisfies a design document $\delta$, not all of its isomorphic LDP datasets satisfy $\delta$. However, if a non-empty dataset is valid for $\delta$, there always exist isomorphic datasets that are equally valid for $\delta$. This assertion is an important aspect of our implementation for generating LDP datasets and deploying LDPs from them.
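To make the image of a dataset under an IRI bijection concrete, here is a minimal sketch. It assumes the bijection is given as a dict that is the identity outside its keys (so a renaming must be written as a swap to remain a bijection), that graphs are frozensets of string triples, and that member sets are frozensets. The IRI dex:json-distribution is invented. The sketch only computes the renamed dataset; it does not decide whether that dataset is valid for a design document.

```python
def rename_dataset(named_graphs, named_containers, f):
    """Apply an IRI bijection f (a dict, identity elsewhere) to an LDP dataset."""
    def image(term):
        return f.get(term, term)

    def map_graph(graph):
        return frozenset(tuple(image(t) for t in triple) for triple in graph)

    new_graphs = {(image(n), map_graph(g)) for n, g in named_graphs}
    new_containers = {
        (image(n), map_graph(g), frozenset(image(m) for m in members))
        for n, g, members in named_containers
    }
    return new_graphs, new_containers


# A swap is a bijection on IRIs that is the identity everywhere else.
f = {"dex:pJSON": "dex:json-distribution", "dex:json-distribution": "dex:pJSON"}
graphs = {("dex:pJSON", frozenset({("dex:pJSON", "foaf:primaryTopic", "ex:pJSON")}))}
containers = {("dex:pDistributions", frozenset(), frozenset({"dex:pJSON"}))}
new_graphs, new_containers = rename_dataset(graphs, containers, f)
```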


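Definition 13 (Subdataset). Let $\Lambda_1 = \langle NG_1, NC_1 \rangle$ and $\Lambda_2 = \langle NG_2, NC_2 \rangle$ be two LDP datasets. We say that $\Lambda_1$ is a subdataset of $\Lambda_2$ (written $\Lambda_1 \prec \Lambda_2$) iff there exists a bijection $f : NG_1 \cup NC_1 \to NG_2 \cup NC_2$ such that for all $\langle n, g \rangle \in NG_1$, $f(n, g) = \langle n, g' \rangle$ and for all $\langle n, g, M \rangle \in NC_1$, $f(n, g, M) = \langle n, g', M \rangle$ with $g \subseteq g'$.

Definition 14 (Minimal LDP dataset). An LDP dataset $\Lambda_{min}$ is minimal for a design document $\delta$ iff (1) $\Lambda_{min}$ is valid for $\delta$, and (2) for every LDP dataset $\Lambda$ that is valid for $\delta$, there exists a subdataset $\Lambda' \prec \Lambda$ that satisfies $\delta$ and such that $\Lambda' \cong \Lambda_{min}$.

Theorem 1. For every design document $\delta$, there exists a minimal LDP dataset for $\delta$.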

Proof. TODO

Theorem 2. Let $\Lambda_{min}$ be a minimal LDP dataset for $\delta$. For every LDP dataset $\Lambda$ such that $\Lambda_{min} \prec \Lambda$, $\Lambda$ is valid for $\delta$.

Proof. TODO


5 Algorithms

In this section, we provide and describe the algorithms for evaluating a design document and its sub-parts. We describe the evaluation in a bottom-up fashion, starting with DataSources and continuing with ResourceMaps, NonContainerMaps and finally ContainerMaps. The evaluation is stateful, with the global state being the LDP dataset $\Sigma$. The LDP dataset generated by the evaluation functions is a minimal LDP dataset (Def. 14). We assume the existence of the following functions when defining the evaluation functions:

– DS(x): returns the set of DataSources $DS$ from the structure x
– RM(x): returns the set of ResourceMaps $RM$ from the structure x
– CM(x): returns the set of ContainerMaps $CM$ from the structure x
– NM(x): returns the set of NonContainerMaps $NM$ from the structure x
– qp(x): returns the query pattern $qp$ from the structure x
– gp(x): returns the graph template $gt$ from the structure x
– cq(rm): returns the CONSTRUCT query $cq$ of the ResourceMap rm
– AddMember($\Sigma$, contIRI, n): adds n to the set of members of the container with IRI contIRI in the LDP dataset $\Sigma$

5.1 Evaluation of a DataSource

The evaluation of a DataSource is given by the function $[\![.]\!]_{source}$ s.t.

$$[\![ds]\!]_{source} = \begin{cases} deref(u_{loc}), & \text{if } ds = (u_{ds}, u_{loc}) \\ deref(u_{lr})(deref(u_{loc})), & \text{if } ds = (u_{ds}, u_{loc}, u_{lr}) \end{cases}$$

As mentioned before, we restrict the definition of a DataSource to two forms only and therefore define the evaluation only for these two forms. By abuse of notation, we define the evaluation of a set of DataSources $DS$ as

$$[\![DS]\!]_{source} = \biguplus_{ds \in DS} [\![ds]\!]_{source}$$

where $g_1 \uplus g_2$ denotes the merge of the RDF graphs $g_1$ and $g_2$ with blank nodes standardized apart.
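A merge with blank nodes standardized apart can be sketched without any RDF library. The encoding is an assumption of this example: triples are tuples of strings and blank nodes are the terms starting with "_:". Each graph gets its own suffix, so equal blank node labels coming from different graphs never collide.

```python
def merge(graphs):
    """Merge RDF graphs after renaming blank nodes apart, one suffix per graph."""
    merged = set()
    for index, graph in enumerate(graphs):
        for triple in graph:
            merged.add(tuple(
                f"{term}_g{index}" if term.startswith("_:") else term
                for term in triple
            ))
    return merged


g1 = {("ex:parking", "dcat:distribution", "_:b"), ("_:b", "dcat:mediaType", "json")}
g2 = {("ex:busStation", "dcat:distribution", "_:b"), ("_:b", "dcat:mediaType", "csv")}
merged = merge([g1, g2])
```

A plain set union of `g1` and `g2` would wrongly attach both media types to the same blank node; after the merge they belong to two distinct nodes.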

5.2 Evaluation of a ResourceMap

Algo. 1 shows the evaluation function of a ResourceMap. It takes 4 parameters, where rm is the ResourceMap being evaluated, cont is the IRI of the container whose members are being generated from rm, ancestors is the ancestor list and $\Sigma$ is the LDP dataset. When evaluating the ResourceMaps of maps found at the top of the design document, cont is $\emptyset$ and ancestors is empty.

Algorithm 1 Evaluation of a ResourceMap rm
 1: procedure Evalrm(rm, cont, ancestors, $\Sigma$)
 2:   // assuming ancestors = $(p_1, \ldots, p_m)$,
 3:   // we define $\mu_{\vec{p}}$ as $\forall i \in (1..m), \mu_{\vec{p}}(\pi_i) = p_i$
 4:   $gs \leftarrow [\![DS(rm)]\!]_{source}$
 5:   $\Omega \leftarrow \Pi_\rho([\![qp(rm)]\!]_{gs} \bowtie \mu_{\vec{p}})$
 6:   $genN \leftarrow \emptyset$   ▷ structure to hold new resources generated from rm
 7:   for all $\mu \in \Omega$ do
 8:     $n \leftarrow geniri(cont, \mu(\rho), \Sigma)$   ▷ generate IRI of new resource
 9:     $(qp_{cq}, gt_{cq}) \leftarrow cq(rm)$   ▷ get cq of rm
10:     $g \leftarrow gt_{cq}([\![qp_{cq}]\!]_{gs}) \bowtie \nu \leftarrow n \bowtie \mu_{\vec{p}}$   ▷ generate graph of new resource
11:     $genN \leftarrow genN \cup \{(n, \mu(\rho), g)\}$
12:   end for
13:   return $genN$
14: end procedure

Initially, the DataSources of rm are evaluated as described in Sec. 5.1. Then, the set of related resources is extracted (Line 5). Since this extraction can be parametrized by ancestor resources, the bindings of ancestor variables in $\mu_{\vec{p}}$, generated using ancestors, are supplied. Once the related resources in $\Omega$ are obtained, a new resource is generated for each of them, together with its IRI and graph. The IRI is generated using the function geniri (Line 8). Though three parameters are supplied to geniri, its definition is arbitrary and depends on the implementation. To generate the graph of the new resource, the CONSTRUCT query cq of rm is evaluated on $gs$. Since the query may use ancestor variables, and since the generated graph may refer to the new resource itself, bindings for the ancestor variables and for the new resource IRI are passed to the evaluation of cq via $\mu_{\vec{p}}$ and $\nu$. Finally, for all new resources generated, their IRI, related resource and graph are returned.
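Since the definition of geniri is left to the implementation, the following is only one hypothetical possibility, not the function used by any particular tool. It assumes relative IRIs, derives the last path segment from the related resource, nests it under the container IRI, and uses the set of already used IRIs (standing for the names present in $\Sigma$) to avoid duplicates. The fallback segment "resource" for an unbound related resource is also an assumption.

```python
from urllib.parse import quote


def gen_iri(container_iri, related_resource, used_iris):
    """Hypothetical geniri: a relative IRI that is unique among used_iris."""
    if related_resource is None:
        local = "resource"  # no related resource (rho unbound)
    else:
        # Keep the last path segment or fragment of the related resource IRI.
        local = related_resource.rstrip("/#").rsplit("/", 1)[-1].rsplit("#", 1)[-1]
    local = quote(local, safe="")
    parent = (container_iri or "").rstrip("/")
    candidate = f"{parent}/{local}" if parent else local
    iri, counter = candidate, 1
    while iri in used_iris:  # keep names unique, as an LDP dataset requires
        iri = f"{candidate}-{counter}"
        counter += 1
    return iri


top = gen_iri(None, "http://example.org/paris-catalog", set())
child = gen_iri(top, "http://example.org/parking", {top})
clash = gen_iri(None, "http://example.org/paris-catalog", {top})
```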

5.3 Evaluation of a NonContainerMap

The evaluation of a NonContainerMap is parametrized by a container cont and an ancestor list ancestors, as non-containers may be generated as members of cont and their related resources may depend on ancestors.

Algorithm 2 Evaluation of a NonContainerMap nm
 1: procedure Evalnm(nm, cont, ancestors, $\Sigma$)
 2:   $genN \leftarrow \emptyset$   ▷ structure to hold new resources generated from RM of nm
 3:   for all $rm \in RM(nm)$ do
 4:     $genN \leftarrow genN \cup$ Evalrm(rm, cont, ancestors, $\Sigma$)
 5:   end for
 6:   for all $(n, r, g) \in genN$ do
 7:     $\Sigma \leftarrow \{(n, g)\} \cup \Sigma$   ▷ add non-container to LDP dataset
 8:     if cont $\neq \emptyset$ then
 9:       AddMember($\Sigma$, cont, n)   ▷ add member to container in LDP dataset
10:     end if
11:   end for
12: end procedure

When evaluating a NonContainerMap nm, initially, all its ResourceMaps are evaluated to generate a set of triples $(n, r, g)$ where $n$ is the IRI of a new LDP resource, $r$ its related resource and $g$ its graph (Lines 3-5). Using this set, all new LDP resources are typed as non-containers and added to the LDP dataset (Line 7). Also, their IRIs are added to the members of the container cont if it is not null (Line 9).

5.4 Evaluation of a ContainerMap

Like NonContainerMaps, the evaluation of ContainerMaps is parametrized by a container cont and an ancestor list ancestors, as containers may be generated as members of cont and the containers' related resources may depend on ancestors.

Algorithm 3 Evaluation of a ContainerMap cm
 1: procedure Evalcm(cm, cont, ancestors, $\Sigma$)
 2:   for all $rm \in RM(cm)$ do   ▷ $genN$ is initially $\emptyset$
 3:     $genN \leftarrow genN \cup$ Evalrm(rm, cont, ancestors, $\Sigma$)
 4:   end for
 5:   for all $(n, r, g) \in genN$ do
 6:     ancestors′ $\leftarrow$ ancestors :: r   ▷ new ancestor list for the members of n
 7:     $\Sigma \leftarrow \Sigma \cup \{(n, g, \emptyset)\}$   ▷ add new container to LDP dataset
 8:     if cont $\neq \emptyset$ then
 9:       AddMember($\Sigma$, cont, n)   ▷ add member to container in LDP dataset
10:     end if
11:     for all $nm \in NM(cm)$ do   ▷ evaluate non-containers of cm
12:       Evalnm(nm, n, ancestors′, $\Sigma$)
13:     end for
14:     for all $cm' \in CM(cm)$ do   ▷ evaluate containers of cm
15:       Evalcm(cm′, n, ancestors′, $\Sigma$)
16:     end for
17:   end for
18: end procedure

When evaluating a ContainerMap cm, initially, all its ResourceMaps are evaluated to generate a set of triples $(n, r, g)$ where $n$ is the IRI of a new LDP resource, $r$ its related resource and $g$ its graph (Lines 2-4). For every description of a new LDP resource obtained from the evaluation of cm's ResourceMaps, first, a new ancestor list is formed by appending the related resource $r$ to the list of ancestors (Line 6). The list passed to Evalcm itself is left unchanged, so that sibling containers do not see each other's related resources, in line with the list $\vec{p}::r$ of Def. 5. Then, the new resource is typed as a container and added to the LDP dataset (Line 7). After that, if cont is not null, the new container is added to its set of members (Line 9). Finally, all NonContainerMaps and ContainerMaps of cm are evaluated (Lines 11-16).

5.5 Evaluation of a Design Document

The evaluation of a design document is simple. Initially, the LDP dataset is initialized to $\emptyset$ and an empty ancestor list is created. The LDP dataset $\Sigma$ is the global state, which is modified when sub-parts of the design document are evaluated. Then all top-level ContainerMaps and NonContainerMaps of the design document are evaluated with respect to a null container and an empty ancestor list.

Algorithm 4 Evaluation of a design document
 1: procedure Eval$\delta$($\delta$)
 2:   $\Sigma \leftarrow \emptyset$   ▷ initialize LDP dataset, see also Def. 7
 3:   ancestors $\leftarrow ()$   ▷ initialize ancestor list
 4:   for all $cm \in CM(\delta)$ do
 5:     Evalcm(cm, $\emptyset$, ancestors, $\Sigma$)
 6:   end for
 7:   for all $nm \in NM(\delta)$ do
 8:     Evalnm(nm, $\emptyset$, ancestors, $\Sigma$)
 9:   end for
10:   return $\Sigma$
11: end procedure
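The recursive structure of Algorithms 2 to 4 can be sketched in Python. This is a simplified illustration, not the code of any existing tool: a ResourceMap is modelled as a callable that already returns the triples $(n, r, g)$ that Algorithm 1 would produce, graphs are left empty, and the lookup table `DATASETS` is made up. A fresh ancestor tuple is built for each generated container, so siblings never share ancestors.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple

Generated = Tuple[str, Optional[str], frozenset]  # (n, r, g)
ResourceMap = Callable[[Optional[str], tuple], List[Generated]]


@dataclass
class NonContainerMap:
    resource_maps: List[ResourceMap]


@dataclass
class ContainerMap:
    resource_maps: List[ResourceMap]
    container_maps: List["ContainerMap"] = field(default_factory=list)
    non_container_maps: List[NonContainerMap] = field(default_factory=list)


@dataclass
class LdpDataset:
    graphs: Dict[str, frozenset] = field(default_factory=dict)
    containers: Dict[str, Tuple[frozenset, Set[str]]] = field(default_factory=dict)


def eval_nm(nm, cont, ancestors, sigma):
    for rm in nm.resource_maps:
        for n, _r, g in rm(cont, ancestors):
            sigma.graphs[n] = g                      # add non-container
            if cont is not None:
                sigma.containers[cont][1].add(n)     # AddMember


def eval_cm(cm, cont, ancestors, sigma):
    for rm in cm.resource_maps:
        for n, r, g in rm(cont, ancestors):
            sigma.containers[n] = (g, set())         # add container
            if cont is not None:
                sigma.containers[cont][1].add(n)     # AddMember
            child_ancestors = ancestors + (r,)       # p :: r, a fresh tuple
            for nm in cm.non_container_maps:
                eval_nm(nm, n, child_ancestors, sigma)
            for child in cm.container_maps:
                eval_cm(child, n, child_ancestors, sigma)


def eval_document(container_maps, non_container_maps):
    sigma = LdpDataset()
    for cm in container_maps:
        eval_cm(cm, None, (), sigma)
    for nm in non_container_maps:
        eval_nm(nm, None, (), sigma)
    return sigma


# Toy ResourceMaps; the data below is hypothetical.
DATASETS = {"ex:paris-catalog": ["ex:parking", "ex:busStation"]}


def rm_catalog(cont, ancestors):
    return [("paris-catalog", "ex:paris-catalog", frozenset())]


def rm_dataset(cont, ancestors):
    return [(f"{cont}/{d.split(':')[1]}", d, frozenset())
            for d in DATASETS.get(ancestors[0], [])]


design = ContainerMap([rm_catalog], container_maps=[ContainerMap([rm_dataset])])
sigma = eval_document([design], [])
```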


6 Implementation and Evaluation

In this section, we describe our implementation, which consists of tools that instantiate the LDP generation workflow described in Fig. 2. Fig. 5 shows a general overview of these tools, all of which are open source and available on GitHub. Then, we evaluate our approach with respect to the requirements from our context outlined in Sec. 2.2 by performing experiments using our tools to deploy real datasets on LDPs. A detailed description of the tools and the experiments we conducted is available on GitHub (footnote 5).

[Figure: workflow diagram. Labels: data sources, SPARQL-Generate, design document (written in the LDP Design Language), ShapeLDP, static LDP dataset, dynamic LDP dataset, deployment parameters, POSTerLDP, LDP POST requests, LDP servers, InterLDP, HubbleLDP]

Fig. 5: Implementation of the LDP Generation Workflow

6.1 Implementation

To interpret a design document and generate the LDP dataset, we provide an LDPizer, ShapeLDP (footnote 6). The algorithms it uses are described in Sec. 5. ShapeLDP consumes design documents written in LDP-DL, whose concrete RDF syntax is described in its specification [4]. To enhance modularity, the design model may be split across several documents: the URLs of the design documents are fed to ShapeLDP, which combines them into a single design model before interpreting it. To exploit heterogeneous data sources, ShapeLDP uses the lifting rules specified for DataSources in the design model. For now, only lifting rules written in SPARQL-Generate [11] are supported. Future versions may consider other languages such as RML [7] and XSPARQL [1].

ShapeLDP evaluates a design document in two different modes: static and dynamic. In our context, we use the former for static data sources and the latter for dynamic sources or sources having hosting constraints. In both modes, the generated output uses relative IRIs without an explicit base and acts only as an intermediary document from which the LDP dataset should be constructed. In static evaluation (resp. dynamic evaluation), ShapeLDP generates a static LDP dataset (resp. dynamic LDP dataset) in TriG format [5]. The LDP dataset is implied from a static LDP dataset wrt an explicit base IRI, and it stays valid as long as the evaluation of the query patterns and CONSTRUCT queries of all ResourceMaps from the design document does not change. The LDP dataset implied from a dynamic LDP dataset wrt an explicit base IRI stays valid as long as the evaluation of the CONSTRUCT queries of all ResourceMaps does not change. A static LDP dataset is very similar to an LDP dataset, except that it uses relative IRIs for LDP-RSs without an explicit base. A dynamic LDP dataset is a structure somewhat similar to the LDP dataset: it stores all containers and non-containers together with a CONSTRUCT query to generate their RDF graphs. This CONSTRUCT query is obtained when evaluating ResourceMaps. It is compiled using the bindings of the reserved variables and serialized in SPARQL syntax. The formal definition of a dynamic LDP dataset is given below.

5 https://github.com/noorbakerally/LDPDatasetExamples
6 https://github.com/noorbakerally/ShapeLDP

Definition 15 (Dynamic LDP dataset). A dynamic LDP dataset is a pair $\langle dNC, dC \rangle$ where $dNC$ is a set of dynamic non-containers and $dC$ is a set of dynamic containers. A dynamic non-container is a pair $\langle n, crm \rangle$ such that $n \in IRI$ (called the non-container name) and $crm$ is a CompiledResourceMap, that is, a pair $(cq, DS)$ such that $cq$ is a SPARQL CONSTRUCT query and $DS$ is a set of DataSources. A dynamic container is a triple $\langle n, crm, M \rangle$ such that $n \in IRI$ (called the container name), $crm$ is a CompiledResourceMap and $M \in 2^{IRI}$, and:

– no IRI appears more than once as a (dynamic non-container or container) name;

– for all $\langle n, crm, M \rangle \in dC$, and for all $u \in M$, there exists a dynamic container or non-container having the name $u$.

The LDP dataset is generated from a dynamic LDP dataset by generating the RDF graphs of all containers and non-containers in it using their CONSTRUCT queries.
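The idea of producing a graph from a stored CompiledResourceMap only when it is needed can be sketched as follows. Everything here is an assumption made for illustration: the class `DynamicDataset`, the callback `run_construct` standing for a SPARQL engine, the query text, the property ex:freeSpaces and the mutable dict `LIVE` standing for a changing source.

```python
class DynamicDataset:
    """Maps a resource name to its CompiledResourceMap (cq, data source IRIs)."""

    def __init__(self, resources, run_construct):
        self.resources = resources
        self.run_construct = run_construct  # hypothetical query engine callback

    def get_graph(self, name):
        # The graph is not stored: it is rebuilt from cq on every request.
        cq, sources = self.resources[name]
        return self.run_construct(cq, sources)


LIVE = {"free": 10}  # stands for a source whose content changes over time


def toy_run_construct(cq, sources):
    # A real implementation would evaluate cq over the dereferenced sources.
    return frozenset({("dex:parking", "ex:freeSpaces", str(LIVE["free"]))})


dataset = DynamicDataset(
    {"dex:parking": ("CONSTRUCT { ... } WHERE { ... }", ["http://example.org/live"])},
    toy_run_construct,
)
first = dataset.get_graph("dex:parking")
LIVE["free"] = 7
second = dataset.get_graph("dex:parking")
```

Because the graph is recomputed on each call, `second` reflects the updated source while `first` keeps the earlier state.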

In addition, we provide an LDP server, InterLDP (footnote 7), which can directly consume an LDP dataset (static or dynamic) and expose LDP resources from it. As of now, it only supports HTTP GET, HEAD and OPTIONS requests on LDP-RSs. InterLDP passes the test cases corresponding to these LDP interactions from the test suite (footnote 8) for W3C Linked Data Platform 1.0. The test cases are detailed in the execution report (footnote 9) generated by the test suite. To cater for hosting constraints, InterLDP uses the dynamic LDP dataset to generate the RDF graph of the requested LDP-RS at query time.

7 https://github.com/noorbakerally/InterLDP
8 https://w3c.github.io/ldp-testsuite/ accessed on 25 November 2017
9 https://w3id.org/ldpdl/InterLDP/execution-report.html
10 https://github.com/noorbakerally/POSTerLDP

Moreover, we provide the implementation of an LDP deployer, POSTerLDP (footnote 10). It consumes an LDP dataset (static or dynamic) and deployment parameters: the base URL of the LDP server and, optionally, the username and password for basic authentication on the server. Currently, POSTerLDP deploys LDP-RSs on one server only, but future versions may consider replication or partitioning schemes described in a particular deployment language. POSTerLDP is independent of any particular LDP implementation. It generates standard LDP requests and is thus compatible with any LDP server implementing these interactions. Finally, we provide an LDP browser, HubbleLDP (footnote 11), which can be used to browse LDPs and view LDP resources and their content. An instance of it is running at http://bit.ly/2BGYl9X.
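As an illustration of a standard LDP creation request, the sketch below builds (without sending) an HTTP POST with the headers defined by LDP 1.0: a Link header with rel="type" to request the interaction model, a Slug hint for the new IRI and a Turtle body. It is not taken from POSTerLDP; the function name, the server URL and the credentials are hypothetical.

```python
import base64
import urllib.request

LDP_BASIC_CONTAINER = "http://www.w3.org/ns/ldp#BasicContainer"
LDP_RESOURCE = "http://www.w3.org/ns/ldp#Resource"


def build_post(parent_url, slug, turtle, is_container, username=None, password=None):
    """Build an LDP creation request for a container or non-container."""
    interaction_model = LDP_BASIC_CONTAINER if is_container else LDP_RESOURCE
    request = urllib.request.Request(
        parent_url, data=turtle.encode("utf-8"), method="POST")
    request.add_header("Content-Type", "text/turtle")
    request.add_header("Slug", slug)  # hint for the IRI of the new resource
    request.add_header("Link", f'<{interaction_model}>; rel="type"')
    if username is not None and password is not None:
        token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request


req = build_post(
    "http://localhost:8080/ldp/",  # hypothetical base URL of an LDP server
    "paris-catalog",
    "<> a <http://www.w3.org/ns/dcat#Catalog> .",
    is_container=True,
)
# urllib.request.urlopen(req) would send it; omitted so the sketch stays offline.
```

Deploying a whole dataset would issue one such request per resource, parents before members, reading the Location header of each response to learn the IRI assigned by the server.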

6.2 Evaluation

All the evaluation scenarios consist in deploying real datasets via LDPs to investigate different features relating to our context requirements. In our experiments, we use RDF datasets structured per the DCAT vocabulary due to its importance for open data and city data portals. We obtained 22 DCAT datasets from city data portals. They are deployed along 5 different design documents, using ShapeLDP to generate their LDP datasets and InterLDP to deploy the LDP datasets as LDPs (110 in all). Moreover, to show that our approach is compatible with existing LDP servers, we use POSTerLDP to deploy 2 LDPs over LDP servers that are instances of Apache Marmotta and Gold, both of them being reference implementations of the LDP standard. This first result demonstrates the compatibility of our approach with the LDP standard: ShapeLDP has been used to generate LDP datasets that have been directly deployed via InterLDP, and POSTerLDP has been used to deploy LDP datasets on instances of existing LDP servers.

Handling the heterogeneity of data portals has been demonstrated by deploying 2 datasets, in JSON and CSV formats, via an LDP. In the design document, the original data source is specified together with a lifting rule. Using SPARQL-Generate, the RDF data is generated and used by ShapeLDP to generate the LDP dataset, which is finally deployed as an LDP using InterLDP.

Then, we use a dataset that is updated on a real-time basis and deploy it via an LDP to show that our approach can cope with hosting constraints. Using dynamic evaluation in ShapeLDP, the dynamic LDP dataset is generated and used by InterLDP to expose the LDP. Generating responses for LDP-RSs takes more time because their content is generated at query time using real-time data from the source.

Finally, one of our requirements was to facilitate the deployment of LDPs. So far, in the LDP generation workflow, the only manual part is writing the design document. This can also be tackled using the generic designs that we have already provided. Along 2 generic design documents, LDP datasets are generated using ShapeLDP for the DCAT datasets and are exposed as LDPs using InterLDP. Currently, the generic designs can only be used on RDF graphs that use RDFS/OWL vocabularies and do not have cycles in the class hierarchy.

11 https://github.com/noorbakerally/HubbleLDP


We also ran a simple performance test for ShapeLDP on data sources of different sizes using simple design documents. Fig. 6 shows the result of this test, which was done on a machine with 16 GB of RAM and a 2.3 GHz i7 processor. The performance of an implementation does not necessarily highlight the true qualities of the approach, since optimizations, better choices of software libraries, and so on, could dramatically impact the results. Yet, it provides an overview of ShapeLDP's runtime behaviour.

[Figure: plot of time (milliseconds) against number of triples]

Fig. 6: Performance of ShapeLDP on RDF graphs


7 Related Work

We restrict our analysis of LDP implementations to those mentioned in LDPimplementation conformance report12 which show their degree of conformanceto the LDP standard. We categorize them into LDP resource management sys-tem (Callimachus13, Carbon LDP14, Fedora Commons15, Apache Marmotta16,Virtuoso17, Gold18, rww-play19,LDP.js20) and LDP framework (Eclipse Lyo21,LDP4j [8]). LDPR management systems can be seen as a repository for LDPresources on top of which CRUD operations conforming to the LDP standardare allowed through HTTP methods. LDP frameworks are solutions which canbe used to build custom applications which implement LDP interactions. Cur-rent LDP implementations are in their early stages as there is no support forautomating the generation and deployment of LDP from existing data, even ifit is already in RDF, and also, we do find any work in the state of the art whichattempts this.

12 https://www.w3.org/2012/ldp/hg/tests/reports/ldp.html on 19 July 2017
13 http://callimachusproject.org on 15 July 2017
14 http://carbonldp.com on 15 July 2017
15 http://fedora-commons.org on 15 July 2017
16 http://marmotta.apache.org on 15 July 2017
17 http://www.openlinksw.com/ on 15 July 2017
18 https://github.com/linkeddata/gold on 15 July 2017
19 https://github.com/read-write-web/rww-play on 15 July 2017
20 https://github.com/spadgett/LDPjs on 15 July 2017
21 http://wiki.eclipse.org/Lyo/LDPImpl on 15 July 2017
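The view of an LDPR management system taken in this section, a repository of resources with CRUD operations exposed through HTTP methods, can be illustrated with a toy in-memory store. This is a sketch under simplifying assumptions and not a conformant LDP server: the class `ToyLdpStore`, its method names, and the generated paths are hypothetical, RDF payloads are kept as plain text, and only the status codes and containment bookkeeping are modelled.

```python
import itertools


class ToyLdpStore:
    """In-memory toy repository; NOT a conformant LDP server."""

    def __init__(self, root="/"):
        self.resources = {root: ""}  # path -> RDF payload (kept as text)
        self.contains = {root: []}   # container path -> contained paths
        self._ids = itertools.count(1)

    def post(self, container, body):
        """Create: HTTP POST to a container adds a new contained resource."""
        if container not in self.contains:
            return 404, None
        path = f"{container.rstrip('/')}/r{next(self._ids)}"
        self.resources[path] = body
        self.contains[container].append(path)
        return 201, path  # the path plays the role of the Location header

    def get(self, path):
        """Read: HTTP GET returns the stored representation."""
        if path not in self.resources:
            return 404, None
        return 200, self.resources[path]

    def put(self, path, body):
        """Update: HTTP PUT replaces the stored representation."""
        if path not in self.resources:
            return 404, None
        self.resources[path] = body
        return 204, None

    def delete(self, path):
        """Delete: HTTP DELETE removes the resource and its containment link."""
        if path not in self.resources:
            return 404, None
        del self.resources[path]
        self.contains.pop(path, None)
        for members in self.contains.values():
            if path in members:
                members.remove(path)
        return 204, None
```

For instance, `ToyLdpStore().post("/", "<> a <http://example.org/Thing> .")` creates a resource under the root container, which can then be read, replaced, and deleted with `get`, `put`, and `delete`.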


8 Conclusion and Future Work

Linked Data Platforms can potentially ease the work of data consumers, but there is not much support from implementations to automate the generation and deployment of LDPs. Considering this, we proposed a language for describing part of the design of an LDP. Such a description can be processed to generate and deploy LDPs from existing data sources regardless of the underlying implementation of the LDP server. We demonstrated the flexibility, effectiveness, and reusability of the approach to cope with heterogeneity and dynamicity of data sources.

For now, LDP-DL is restricted to only some design aspects. Yet it can handle a variety of cases, and its use enables us to fulfill the requirements of our context. We intend to consider a number of aspects in future work. Firstly, we want to provide support for LDP Non-RDF Sources and other types of LDP containers. Moreover, we want to generate LDPs that support LDP paging [16]. Then, we want to extend the expressivity of LDP-DL by considering other design aspects such as deployment design, security design, transaction model, etc. Our long-term objective is to have a complete design language. Finally, from a theoretical perspective, we want to analyze formal properties of the language, such as design compatibility, design containment, design merge, parallelizability, and so on, based on the formal semantics.

Acknowledgments This work is supported by grant ANR-14-CE24-0029 from Agence Nationale de la Recherche for project OpenSensingCity.


References

1. W. Akhtar, J. Kopecký, T. Krennwallner, and A. Polleres. XSPARQL: Traveling between the XML and RDF worlds–and avoiding the XSLT pilgrimage. In ESWC, 2008.
2. N. Bakerally. A system to automatize the deployment of data in linked data platforms. 2017.
3. N. Bakerally. Towards automatic deployment of linked data platforms. 2017.
4. N. Bakerally. LDP-DL RDF Syntax and Semantics. Technical report, Mines Saint-Étienne, 2018.
5. G. Carothers and A. Seaborne. RDF 1.1 TriG, RDF Dataset Language, W3C Recommendation 25 February 2014. Technical report, W3C, 2014.
6. R. Cyganiak, D. Wood, and M. Lanthaler. RDF 1.1 Concepts and Abstract Syntax, W3C Recommendation 25 February 2014. Technical report, W3C, 2014.
7. A. Dimou, M. Vander Sande, P. Colpaert, R. Verborgh, E. Mannens, and R. Van de Walle. RML: A generic language for integrated RDF mappings of heterogeneous data. In LDOW, 2014.
8. M. Esteban-Gutiérrez, N. Mihindukulasooriya, and R. García-Castro. LDP4j: A framework for the development of interoperable read-write Linked Data applications. In ISWC Developers Workshop, 2014.
9. R. B. France and B. Rumpe. Model-driven development of complex software: A research roadmap. In FOSE, 2007.
10. S. Harris and A. Seaborne. SPARQL 1.1 Query Language, W3C Recommendation 21 March 2013. Technical report, W3C, 2013.
11. M. Lefrançois, A. Zimmermann, and N. Bakerally. A SPARQL extension for generating RDF from heterogeneous formats. In ESWC, 2017.
12. F. Maali and J. Erickson. Data Catalog Vocabulary (DCAT), W3C Recommendation 16 January 2014. Technical report, W3C, 2014.
13. J. Pérez, M. Arenas, and C. Gutierrez. Semantics and complexity of SPARQL. ACM Transactions on Database Systems (TODS), 34(3):16, 2009.
14. S. Speicher, J. Arwe, and A. Malhotra. Linked Data Platform 1.0. Technical report, W3C, February 26 2015.
15. S. Speicher, J. Arwe, and A. Malhotra. Linked Data Platform 1.0, W3C Recommendation 26 February 2015. Technical report, W3C, 2015.
16. S. Speicher, J. Arwe, and A. Malhotra. Linked Data Platform Paging 1.0, W3C Working Group Note 30 June 2015. Technical report, W3C, 2015.
17. T. Stahl, M. Völter, J. Bettin, A. Haase, and S. Helsen. Model-driven software development: technology, engineering, management. Pitman, 2006.

