
Web Semantics: Science, Services and Agents on the World Wide Web 9 (2011) 284–298

Contents lists available at ScienceDirect

journal homepage: http://www.elsevier.com/locate/websem

Using provenance to debug changing ontologies

Simon Schenk, Renata Dividino, Steffen Staab
WeST – Institute for Web Science and Technologies, University of Koblenz-Landau, Germany

Article info

Article history: Available online 12 July 2011

Keywords: Semantic Web, Provenance, Reasoning, Description logics, OWL

1570-8268/$ - see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.websem.2011.06.009

We would like to thank the US National Cancer Institute's Center for Bioinformatics for sharing the real world data used for the evaluation. This research was supported by the Federal Ministry of Education and Research of Germany under Contract 01 IS 09 037 A-E, CollabCloud. The expressed content is the view of the authors but not necessarily the view of the CollabCloud project.

Corresponding author. E-mail addresses: [email protected] (S. Schenk), [email protected] (R. Dividino), [email protected] (S. Staab).

Abstract

On the Semantic Web ontologies evolve and are managed in a distributed setting, e.g. in biomedical databases. Changes are contributed by multiple persons or organizations at various points in time. Often, changes differ by certainty or trustworthiness. When judging changes of automatically inferred knowledge and when debugging such evolving ontologies, the provenance of axioms (e.g. agent, trust degree and modification time) needs to be taken into account. Providing and reasoning with rich provenance data for expressive ontology languages, however, is a non-trivial task.

In this paper we propose a formalization of provenance, which allows for the computation of provenance for inferences and inconsistencies. It allows us to answer questions such as "When has this inconsistency been introduced and who is responsible for this change?" as well as "Can I trust this inference?".

We propose a black box algorithm for reasoning with provenance, which is based on general pinpointing, and an optimization, which enables the use of provenance for debugging in real time even for very large and expressive ontologies, such as used in biomedical portals.

© 2011 Elsevier B.V. All rights reserved.

1. Introduction

Ontologies often evolve in open, multi-user ontology editing environments. Examples of open, evolving ontologies are the Open Biomedical Ontologies repository of the US National Cancer Institute's Center for Bioinformatics [5] or the Ontology for Biomedical Investigations (http://obi-ontology.org), which is an integrated ontology for the description of life-science and clinical investigations. Many similar projects exist in varying domains. They are characterized by an open community with contributions at various points in time and from various sources, which might not be equally reliable.

In such open settings where conflicting changes can occur it is desirable to track two undesired situations:

(i) undesired inferences and
(ii) inconsistencies.

In order to judge the reliability of inferences and to find errors in the ontology when debugging ontologies, it is hence necessary to ask questions such as "When has this inconsistency been introduced and who is responsible for this change?" as well as "Can I trust this inference?". Provenance information is used to answer these questions.

Provenance is meta level information and can be tracked in many dimensions: knowledge source, last modification dates, degrees of trustworthiness or the experience level of the editor. In all these cases, we have provenance labels, which are attached to axioms (e.g. timestamps), and orders over these labels, such as the ascending or descending order of timestamps or a partial order of trust among sources. This provenance information must be combined and propagated to a conclusion in order to answer the above mentioned questions.

Standard algorithms for debugging ontologies poorly support the user in answering meta level questions and require expensive reasoning. For these reasons they are not applicable to expressive and large scale real world ontologies.

With the approach presented in this paper we will show how to represent provenance and efficiently reason in OWL with provenance. Our approach helps the user cope with the complexity and dynamics of evolving ontologies.

1.1. Debugging with provenance

Various approaches to the problem of debugging with provenance have been proposed. They can be grouped into three categories: (a) Extensions of given logical formalisms that deal with a particular type of provenance. Examples include extensions for debugging with uncertainty, such as fuzzy and probabilistic [24]


Table 1. Ontology axioms describing our scenario and their provenance.

ID  Axiom                                       Source         Date
#1  RealCity ⊑ City ⊓ ∃hasCompany.Broadcaster   DPA            2009-09-10
#2  Broadcaster ⊑ ∃hqIn.City                    DPA            2009-09-09
#3  inverseProperty(hasCompany, hqIn)           DPA            2009-09-10
#4  bluewater: City                             Neverest       2009-09-09
#5  vpktv: Broadcaster                          NeverestWikiP  2009-09-10
#6  hqIn(vpktv, bluewater)                      NeverestWikiP  2009-09-09

1 http://freebase.org.


or possibilistic [28] description logics. Recent proposals generalize such work to consider partial trust orderings [30]. (b) Flexible extensions for systems allowing for algebraic query evaluation (e.g. relational databases and SPARQL engines), such as [10,9,23], allow for many kinds of provenance, but are limited to the lower expressiveness of the underlying logical formalism. (c) Tran et al. [33] provide a two step evaluation for provenance, which is very expressive, but which does not assign a uniform semantics to the definition and composition of provenance in the knowledge base.

Expressive descriptions of provenance combined with less expressive base languages (such as SPARQL and SQL) make use of the fact that the base languages can be evaluated bottom up using relatively simple algebraic expressions. However, debugging frameworks frequently have non-tree-based derivations used for consistency checking and querying. In order to be able to debug with algebraic provenance on top of such expressive base languages, we propose a debugging framework for provenance based on pinpointing. Pinpointing summarizes explanations for axioms in a single boolean formula, which then can be evaluated using a provenance algebra. Consequently, provenance for query answers can be computed by computing the explanations of the answer and using the pinpointing formula to compute provenance in an algebraic way.
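To make the algebraic evaluation concrete, the following is a minimal sketch (not the paper's implementation): each axiom identifier is mapped to a provenance label, and the pinpointing formula in disjunctive normal form is evaluated by combining labels within one explanation and merging alternative explanations. The numeric trust values and the choice of min/max are illustrative assumptions; the paper works with general partial orders.

```python
# Illustrative sketch: evaluating a pinpointing formula with a simple
# provenance "algebra". Labels here are numeric trust degrees (an
# assumption for illustration): AND combines axioms within one
# explanation (minimum trust), OR merges alternative explanations
# (maximum trust).

trust = {"#1": 1.0, "#4": 0.4, "#5": 0.4, "#6": 0.4}  # axiom id -> trust

# Pinpointing formula in disjunctive normal form:
# each inner set is one pinpoint (a conjunction of axiom ids).
pinpointing_formula = [{"#1", "#4", "#5", "#6"}]

def provenance_of(formula, labels):
    """OR over pinpoints of the AND over each pinpoint's axioms."""
    return max(min(labels[a] for a in pinpoint) for pinpoint in formula)

print(provenance_of(pinpointing_formula, trust))  # -> 0.4
```

An answer supported only via the Neverest axioms thus inherits their low trust, regardless of the trusted DPA schema axioms also involved.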

Unfortunately, the computation of pinpoints may become very expensive and inapplicable if users need to interact with dynamically changing knowledge in real time. Therefore, we provide an optimized black-box algorithm, which does not need to compute all pinpoints. Our evaluation shows that our algorithm performs significantly better than the naïve algorithm, based on both real-world and synthetic datasets. For the purpose of this paper and for the prototypical implementation we restrict ourselves to a fragment of description logics and of OWL-2, called SRIQ(D), and use OWL-2 as a base language and consistency checks as debugging task. Other debugging tasks such as entailment checks and query answering can be reduced to consistency checks in OWL-2 [19].

1.2. Contributions and structure

In this paper we motivate the need for tracking provenance when answering queries upon changing ontologies on the Semantic Web. We propose automatic mechanisms to verify the quality of a conclusion based on its available provenance and to compute provenance for inconsistencies. This paper makes the following contributions:

– a formalization of provenance syntax and semantics,
– the computation of provenance for inferences and inconsistencies based on pinpointing,
– an optimized algorithm for debugging with provenance, and
– an evaluation of the approach using real world data from evolving ontologies.

For the sake of example, we present our framework restricted to ontology diagnosis scenarios. Our work, however, introduces an approach for provenance querying under a variety of scenarios, such as restrictions of access rights [6], knowledge validity when the truth of knowledge changes with time [27], and inferring trust values [16].

The remainder of this paper is structured as follows. In Section 2 we formalize a use-case scenario to motivate the use of provenance for tracking ontology changes. Section 3 introduces the foundations for provenance computation: first we briefly introduce an extension of the description logic DL, called SRIQ(D), underlying OWL Lite and OWL DL, followed by the formalization of pinpointing and existing algorithms for pinpoint computations, and the formalization of provenance orders. Section 4 introduces the syntax used for expressing provenance in OWL-2. In Section 5 we define the semantics of provenance, including composition of provenance dimensions (to model complex provenance) and merging of conflicting provenance. In Section 6 we define the computation of provenance for a query answer based on pinpointing and propose an optimized algorithm for debugging with provenance. We discuss the complexity in Section 7. Section 8 presents the evaluation results. Our evaluation shows that this algorithm performs orders of magnitude better than a naïve implementation. A review of related work and the comparison with our approach is presented in Section 9. Section 10 concludes the paper.

2. Motivating scenario

On 10th September 2009, a person called the German Press Agency (DPA) and notified them that a terrorist attack had just taken place in Bluewater, CA. In order to check the quality of this information, DPA did a short Internet research and found a website for Bluewater's local TV station (vpktv) and for the town itself. Additionally, they found Wikipedia entries for both the town and the local TV station. DPA announced the attack as breaking news.

Unfortunately for DPA, the information was fake. No terrorist attack had happened in Bluewater and the town Bluewater, CA does not even exist. The breaking news had been spread and the fake background information had been set up by the marketing agency Neverest to support guerrilla marketing for the new movie "Shortcut to Hollywood".

DPA did not take into account the dynamics and the provenance of the information retrieved when checking the plausibility of the news. The new Wikipedia article had been created by the same author and modified multiple times shortly before the call. Moreover, all related websites as well as the Wikipedia articles had been set up by the same marketing agency at roughly the same time.

Table 1 shows the instantiation of our scenario. We suppose that the webpages of Bluewater and vpktv are described by axioms of ontologies from a single source, data from Wikipedia corresponds to an open, Wiki-like ontology editing system such as Freebase,1 and the provenance consists of modification dates and respective degrees of trustworthiness based on the information source.

The axioms of Table 1 describe that a RealCity is a City with at least one Broadcaster (#1) and a Broadcaster has its headquarters in a City (#2). Additionally it defines that for any Broadcaster that has its headquarters in a City, this City hasCompany the Broadcaster (#3), that bluewater is a City (#4), vpktv is a Broadcaster (#5), and vpktv has its headquarters in the city bluewater (#6). Furthermore, the Source and Date columns of Table 1 show the provenance associated with each axiom. For instance, DPA, the German Press Agency, created axiom #1, Nev, the marketing agency Neverest, created axiom #4 and NevWP, their Wikipedia user, created axiom #5. The rightmost column indicates when axioms have been asserted; for instance, axiom #1 has been last asserted on 10th September 2009.

DPA needs to verify that Bluewater indeed is a city in CA. Hence, they need to answer the query "bluewater: RealCity?" and to obtain provenance for the answer. In the rest of this paper we will explain how this is done.

3. Foundations

Even though our approach is generic enough to be applicable beyond OWL, we focus on an extension of the description logic DL, called SRIQ(D), underlying OWL Lite and OWL DL. Hence, we briefly revisit the definition of SRIQ(D).

As we use pinpointing as a vehicle for computing provenance, we then introduce pinpointing as a foundation for our algorithm for computing provenance and give an overview of algorithms for finding pinpoints.

For the scenario used in this paper we need two particular provenance dimensions: the modification date and degrees of trustworthiness, which are derived from the knowledge source. In our scenario, as well as in most other real world applications, degrees of trustworthiness cannot result in a strict numeric trust order. Hence, we conclude the foundations by introducing generic, partial provenance orders as a generalization of various existing trust models.

3.1. SRIQ(D)

In the following, we briefly introduce a fragment of DL, called SRIQ(D), and the fundamental reasoning problems related to DLs. For details we refer the reader to [19,17].

Definition 3.1 (Vocabulary). A vocabulary V = (N_C, N_R, N_I) is a triple where

– N_C is a set of URIs used to denote classes,
– N_R is a set of URIs used to denote roles and
– N_I is a set of URIs used to denote individuals.

N_C, N_R, N_I need not be disjoint.

An interpretation grounds the vocabulary in objects from an object domain.

Definition 3.2 (Interpretation). Given a vocabulary V, an interpretation I = (Δ^I, ·^I_C, ·^I_R, ·^I_I) is a quadruple where

– Δ^I is a nonempty set called the object domain;
– ·^I_C is the class interpretation function, which assigns to each class a subset of the object domain, ·^I_C : N_C → 2^{Δ^I};
– ·^I_R is the role interpretation function, which assigns to each role a set of tuples over the object domain, ·^I_R : N_R → 2^{Δ^I × Δ^I};
– ·^I_I is the individual interpretation function, which assigns to each individual a ∈ N_I an element a^I ∈ Δ^I.

Let C, D ∈ N_C, let R, R_i, S ∈ N_R and a, a_i, b ∈ N_I. We extend the role interpretation function ·^I_R to role expressions:

(R⁻)^I_R = { ⟨x, y⟩ | ⟨y, x⟩ ∈ R^I_R }

We extend the class interpretation function ·^I_C to class descriptions:

⊤^I_C = Δ^I
⊥^I_C = ∅
(C ⊓ D)^I_C = C^I_C ∩ D^I_C
(C ⊔ D)^I_C = C^I_C ∪ D^I_C
(¬C)^I_C = Δ^I \ C^I_C
(∀R.C)^I_C = { x ∈ Δ^I | ⟨x, y⟩ ∈ R^I_R → y ∈ C^I_C }
(∃R.C)^I_C = { x ∈ Δ^I | ∃y ∈ Δ^I : ⟨x, y⟩ ∈ R^I_R ∧ y ∈ C^I_C }
(∃R.Self)^I_C = { x ∈ Δ^I | ⟨x, x⟩ ∈ R^I_R }
(≥ n R)^I_C = { x ∈ Δ^I | ∃y_1, …, y_m ∈ Δ^I : ⟨x, y_1⟩, …, ⟨x, y_m⟩ ∈ R^I_R ∧ m ≥ n }
(≤ n R)^I_C = { x ∈ Δ^I | ∄y_1, …, y_m ∈ Δ^I : ⟨x, y_1⟩, …, ⟨x, y_m⟩ ∈ R^I_R ∧ m > n }
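As a concrete illustration of these interpretation functions, the following toy sketch evaluates two of the constructors over a tiny finite interpretation. The class and role names are borrowed from the running scenario; this is only a hand-rolled illustration of the set semantics, not a SRIQ(D) reasoner.

```python
# Toy finite interpretation: domain, class interpretation C, role
# interpretation R. conj and exists mirror (C ⊓ D) and (∃R.C) above.

domain = {"bluewater", "vpktv"}
C = {"City": {"bluewater"}, "Broadcaster": {"vpktv"}}
R = {"hasCompany": {("bluewater", "vpktv")}}

def conj(a, b):
    """(A ⊓ B)^I = A^I ∩ B^I"""
    return C[a] & C[b]

def exists(r, c):
    """(∃R.C)^I = {x | some y with (x, y) in R^I and y in C^I}"""
    return {x for x in domain
            if any((x, y) in R[r] and y in C[c] for y in domain)}

print(exists("hasCompany", "Broadcaster"))  # {'bluewater'}
```

So bluewater satisfies ∃hasCompany.Broadcaster in this interpretation, as axioms #1 and #3–#6 of the scenario suggest.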

Class expressions are used in axioms.

Definition 3.3 (Axiom). An axiom is one of

– a general concept inclusion of the form C ⊑ D for concepts C and D;
– an individual assertion of one of the forms a : C, (a, b) : R, (a, b) : ¬R, a = b or a ≠ b for individuals a, b and a role R;
– a role assertion of one of the forms R ⊑ S, R_1 ∘ … ∘ R_n ⊑ S, Asy(R), Ref(R), Irr(R), Dis(R, S) for roles R, R_i, S.

We now define satisfaction of axioms.

Definition 3.4 (Satisfaction of axioms). Satisfaction of axioms in an interpretation I is defined as follows. With ∘ we denote the composition of binary relations.

(R ⊑ S)^I ≡ ⟨x, y⟩ ∈ R^I → ⟨x, y⟩ ∈ S^I
(R_1 ∘ … ∘ R_n ⊑ S)^I ≡ ∀⟨x, y_1⟩ ∈ R_1^I, ⟨y_1, y_2⟩ ∈ R_2^I, …, ⟨y_{n−1}, z⟩ ∈ R_n^I : ⟨x, z⟩ ∈ S^I
(Asy(R))^I ≡ ∀⟨x, y⟩ ∈ R^I : ⟨y, x⟩ ∉ R^I
(Ref(R))^I ≡ ∀x ∈ Δ^I : ⟨x, x⟩ ∈ R^I
(Dis(R, S))^I ≡ R^I ∩ S^I = ∅
(Irr(R))^I ≡ ∀x ∈ Δ^I : ⟨x, x⟩ ∉ R^I
(C ⊑ D)^I ≡ x ∈ C^I → x ∈ D^I
(a : C)^I ≡ a^I ∈ C^I
((a, b) : R)^I ≡ ⟨a^I, b^I⟩ ∈ R^I
((a, b) : ¬R)^I ≡ ⟨a^I, b^I⟩ ∉ R^I
(a = b)^I ≡ a^I = b^I
(a ≠ b)^I ≡ a^I ≠ b^I

An ontology comprises a set of axioms.

Definition 3.5 (Ontology). A SRIQ(D) ontology O is a set of axioms as defined in Definition 3.3.

In this paper we make use of two basic reasoning mechanisms for SRIQ(D): consistency checking and axiom entailment.

Definition 3.6 (Reasoning with SRIQ(D)). We say an interpretation I of an ontology O is a model of O (I ⊨ O) if all axioms in O are satisfied in I.

We say an ontology O is consistent if there exists a model of O. We say an axiom A is entailed by an ontology O (O ⊨ A) if A is satisfied in all models of O.
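These model-based definitions can be illustrated with a propositional stand-in (an assumption made purely for illustration; the paper of course works with description logic interpretations): an "ontology" is a set of formulas over a few atoms, a model is a satisfying truth assignment, consistency means some model exists, and entailment means the axiom holds in every model.

```python
# Propositional toy for Definition 3.6: enumerate all assignments,
# keep those satisfying every axiom (the models), then check
# consistency (some model) and entailment (true in all models).
from itertools import product

atoms = ["p", "q"]

def models(ontology):
    for values in product([False, True], repeat=len(atoms)):
        m = dict(zip(atoms, values))
        if all(ax(m) for ax in ontology):
            yield m

def consistent(ontology):
    return any(True for _ in models(ontology))

def entails(ontology, a):
    return all(a(m) for m in models(ontology))

O = [lambda m: m["p"],                      # axiom: p
     lambda m: (not m["p"]) or m["q"]]      # axiom: p -> q
print(consistent(O), entails(O, lambda m: m["q"]))  # True True
```

Model enumeration is exponential even here, which is why the paper leans on existing DL reasoners as black boxes rather than enumerating models.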



Obviously, a subontology again is an ontology. Moreover, if O′ ⊆ O and O′ ⊨ A, then also O ⊨ A. We will make use of this fact in the optimized algorithm proposed in Section 6.2.

3.2. Pinpointing

The term pinpointing has been coined for the process of finding explanations for concluded axioms or for a discovered inconsistency. Such an explanation, called a pinpoint, is a minimal subset of an ontology which makes the concluded axiom true (or the theory inconsistent, respectively). While there may be multiple ways to establish the truth or falsity of an axiom, a pinpoint describes exactly one such way.

Definition 3.7 (Pinpoint). A pinpoint P for an axiom A wrt. an ontology O is a set of axioms such that P ⊆ O, P ⊨ A, and ∀B ∈ P : P \ {B} ⊭ A.

Analogously, we can define a justification for a refuted axiom (O ⊨ ¬A) as P ⊆ O, P ⊨ ¬A, and ∀B ∈ P : P \ {B} ⊭ ¬A. Hence, finding pinpoints for a refuted axiom corresponds to finding the Minimum Unsatisfiable Subontologies (MUPS) for this axiom [1]. In this paper we will focus on entailed axioms (O ⊨ A). However, all definitions and algorithms can be modified to work with justifications as well.
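Definition 3.7 can be checked mechanically once an entailment test is available. The sketch below treats entailment as an opaque oracle (standing in for a DL reasoner) and verifies both conditions: P entails A, and removing any single axiom breaks the entailment. The toy oracle is an assumption for illustration.

```python
# Sketch of Definition 3.7: P is a pinpoint iff P ⊆ O, P entails the
# (fixed) axiom A, and every proper subset P \ {B} does not.

def is_pinpoint(P, O, entails):
    P = set(P)
    if not P <= set(O) or not entails(P):
        return False
    return all(not entails(P - {b}) for b in P)

# Toy oracle: "A is entailed" iff axioms 1 and 2 are both present.
entails = lambda axs: {1, 2} <= set(axs)
print(is_pinpoint({1, 2}, {1, 2, 3}, entails))     # True
print(is_pinpoint({1, 2, 3}, {1, 2, 3}, entails))  # False (not minimal)
```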

Pinpointing is the computation of all pinpoints for a given axiom A and ontology O. The pinpointing formula [7] describes which axioms need to be present for O to entail A.

Definition 3.8 (Pinpointing formula). Let A be an axiom, O an ontology and P_1, …, P_n with P_i = {A_{i,1}, …, A_{i,m_i}} the pinpoints of A wrt. O. Let id be a function assigning a unique identifier to an axiom. Then ⋁_{i=1}^{n} ⋀_{j=1}^{m_i} id(A_{i,j}) is a pinpointing formula of A wrt. O.
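Assembling this disjunction-of-conjunctions is mechanical once the pinpoints are known. A minimal sketch (the `ident` mapping and the axiom names are illustrative, and the formula is rendered as a plain string rather than a syntax tree):

```python
# Sketch of Definition 3.8: the pinpointing formula is a disjunction
# over pinpoints of conjunctions of axiom identifiers.

def pinpointing_formula(pinpoints, ident):
    return " v ".join(
        "(" + " ^ ".join(sorted(ident[a] for a in p)) + ")"
        for p in pinpoints)

ident = {"ax_a": "x1", "ax_b": "x2", "ax_c": "x3"}
print(pinpointing_formula([{"ax_a", "ax_b"}, {"ax_c"}], ident))
# -> (x1 ^ x2) v (x3)
```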

Algorithms for finding pinpoints can be grouped into three groups:

Finding one Pinpoint: Algorithms for finding one pinpoint can either derive a pinpoint by tracking the reasoning process of a tableaux reasoner, or use an existing reasoner as a black box. In the latter case, a pinpoint is searched for by subsequently growing (shrinking) a subontology until it starts (stops) entailing the axiom for which a pinpoint is searched. Based on the derived smaller ontology the process is refined until a pinpoint has been found. The advantage of blackbox algorithms is that they can support any DL for which a reasoner is available [1].
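The grow/shrink loop can be sketched as follows, with the reasoner again modeled as an opaque `entails` oracle (the toy oracle is an assumption; a real black-box implementation would also refine in windows rather than one axiom at a time). The ontology is assumed to entail the axiom.

```python
# Black-box single-pinpoint extraction: expand a subontology until it
# entails the axiom, then contract it by dropping unneeded axioms.

def one_pinpoint(ontology, entails):
    axioms = list(ontology)
    # Expansion: grow a prefix until it entails the axiom.
    sub = []
    for ax in axioms:
        sub.append(ax)
        if entails(sub):
            break
    # Contraction: drop each axiom whose removal preserves entailment.
    for ax in list(sub):
        rest = [a for a in sub if a != ax]
        if entails(rest):
            sub = rest
    return set(sub)

entails = lambda axs: {1, 3} <= set(axs)  # toy oracle
print(one_pinpoint([1, 2, 3, 4], entails))  # {1, 3}
```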

Finding all Pinpoints using a Tableaux Reasoner: Baader and Peñaloza have shown how tableaux reasoners for DLs such as OWL can be extended to find pinpointing formulas [7]. In their approach a tableaux reasoner is extended to find not only one, but all pinpoints. Special care needs to be taken in order to ensure termination of the tableaux algorithm. As an advantage, the overhead for pinpointing is lower compared to a blackbox algorithm. Moreover, this approach can derive a compact representation of the pinpointing formula, which however might still have worst-case exponential size in normal form.

Finding all Pinpoints using Blackbox Algorithms: The most performant black-box algorithms extract a relevant module from the overall ontology, ensuring that this module yields the same inferences with respect to the axiom of interest. Then, starting from a single pinpoint, Reiter's Hitting Set Tree algorithm [29] is used to compute all pinpoints by iteratively removing one axiom from the pinpoint at hand and growing it to a full pinpoint again [21,20].
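The hitting-set-tree idea can be sketched like this: find one pinpoint, then branch by excluding each of its axioms and repeating on the remainder. This is a simplified illustration; Reiter's algorithm [29] additionally prunes the tree, and the cited approaches first extract a relevant module.

```python
# Simplified hitting-set-tree search for all pinpoints, given an
# entailment oracle. Each branch excludes one axiom of a found
# pinpoint and re-extracts a pinpoint from what remains.

def find_one(axioms, entails):
    """Contract an entailing axiom list to one minimal pinpoint."""
    sub = list(axioms)
    for ax in list(sub):
        rest = [a for a in sub if a != ax]
        if entails(rest):
            sub = rest
    return set(sub)

def all_pinpoints(ontology, entails):
    found, todo, seen = [], [frozenset()], set()
    while todo:
        removed = todo.pop()          # axioms excluded on this branch
        if removed in seen:
            continue
        seen.add(removed)
        rest = [a for a in ontology if a not in removed]
        if not entails(rest):
            continue                  # no pinpoint survives this branch
        p = frozenset(find_one(rest, entails))
        if p not in found:
            found.append(p)
        todo.extend(removed | {ax} for ax in p)
    return found

# Toy oracle: the axiom is entailed by {1} alone or by {2, 3} together.
entails = lambda axs: {1} <= set(axs) or {2, 3} <= set(axs)
print(sorted(sorted(p) for p in all_pinpoints([1, 2, 3], entails)))
# -> [[1], [2, 3]]
```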

3.3. Provenance order

Provenance may come in various, complex dimensions such as knowledge source, editors, modification dates and degrees of trustworthiness. Provenance dimensions must be exploited and combined to arrive at an accurate assessment of information value [15]. Provenance dimensions are defined in detail in Section 5.1.

First we take a closer look at the two specific dimensions of provenance used in examples in this paper: time and source. Provenance is expressed using provenance labels. An example for a provenance label is a timestamp or a source name.

The label alone is not sufficient for the tracking of provenance when debugging an ontology. We need a provenance order on the provenance labels. For example, suppose that a user has introduced an inconsistency during his last change. We might be interested in when the oldest axiom leading to an inconsistency has been added, or when the youngest has been added. The oldest axiom tells us when this particular topic has first been addressed and the newest one tells us when the inconsistency has been introduced. We use an ascending or descending order of timestamps.

Not all types of provenance labels have a natural order. There are, for example, many ways in which degrees of trustworthiness can be computed. Often simplifications are used, such as assuming trust to be measured on a scale from 1 to 10. Such simplifications usually are not easily justifiable [15] when trust should be established using provenance labels such as the knowledge source.

In particular, trust (and provenance in general) cannot always be measured on a total order; there may be agents which are incomparable. Please note that even though we use trust in our running example, the same applies e.g. when modeling access rights, roles in an editing workflow or comparing world views of users participating in an ontology editing platform. A good analogy are access right systems, which usually introduce some kind of ordering of users and groups. This ordering always is artificial and usually also partial (not in every pair of users or groups is one more powerful than the other) [6].

We provide a generic formalization of provenance orders here, which subsumes others such as [13] and enables us to use any kind of order over provenance labels in the following. The following formalization has been used in similar form in [30].

Definition 3.9 (Provenance order). A provenance order T is a lattice over a finite set of provenance labels. 1 is the label used for the maximal element of the lattice.

If two provenance labels a and b are not comparable, we introduce virtual provenance labels inf_ab and sup_ab, such that:

– inf_ab < a < sup_ab and inf_ab < b < sup_ab;
– for all c with c < a and c < b: c < inf_ab; and
– for all d with d > a and d > b: d > sup_ab.

To understand the importance of the last two steps, assume that c > a > d and c > b > d and a, b are incomparable. Then the provenance label of a ∧ b would be the provenance label of c, as c is the supremum in the provenance order. Obviously this escaping to a higher provenance label is not desirable. In our trust example, it would mean escaping to a higher trust level. Considering roles in a workflow, we might end up with the wrong role. Instead, the virtual provenance labels represent that we need to pick at least one of a or b. For convenience, we will write {a, b} for inf_a,b in the following.

Note that in practice, these values need not be precomputed, so exponential blowup is not an issue. If they are needed they can be encoded as lists of atomic labels.
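The "encode virtual labels as sets of atomic labels" idea can be sketched as follows, using the partial order of Fig. 1 (the dictionary-based order relation is an illustrative encoding, not the paper's data structure):

```python
# Sketch of Definition 3.9's virtual labels: the meet of two
# incomparable labels is represented as a set of atomic labels
# instead of escaping to some existing lower or higher label.

less_than = {("Nev", "1"), ("NevWP", "1")}   # partial order from Fig. 1

def leq(a, b):
    return a == b or (a, b) in less_than

def meet(a, b):
    if leq(a, b):
        return a
    if leq(b, a):
        return b
    return frozenset({a, b})     # virtual label inf_{a,b}, kept symbolic

print(meet("Nev", "1"))          # Nev
print(meet("Nev", "NevWP"))      # frozenset({'Nev', 'NevWP'})
```

The virtual label records "at least one of Nev, NevWP", exactly the reading described above.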

Provenance orders subsume strict orders (such as [0…1], xsd:dateTime and numeric trust degrees computed for Wikipedia [2]). They also allow for incomparable provenance labels, which are common on the Web due to its sheer size and usually incomplete knowledge.


Table 2. Correspondences between source and degrees of trustworthiness.

Source       Trust
DPA          1
Neverest     Nev
NeverestWP   NevWP

[Figure residue: the diagram shows sup_Nev,NevWP at the top, Nev and NevWP below it as incomparable siblings, and inf_Nev,NevWP at the bottom.]
Fig. 1. Provenance order of the knowledge sources.

Table 3. Example of multiple provenance assignments to the same axiom.


In this paper we assign degrees of trustworthiness to axioms based on the expertise of the users by whom they have been modified. Hence, the knowledge source of a piece of information is used to establish trust.

In our running example, we use the correspondences between source and degrees of trustworthiness presented in Table 2.

Fig. 1 illustrates a trust order for the sources shown in Table 1 from the perspective of DPA. 1 is assigned to the most trustworthy source, which is DPA, and Nev, NevWP are incomparable.

4. Syntax of provenance

Provenance in OWL-2 can be expressed as annotations on axioms. Annotations are of importance for the management of ontologies as annotations may be used to support analysis during collaborative engineering. Basically, an axiom annotation assigns an annotation object to an axiom. For instance, in our scenario the axiom RealCity ⊑ City ⊓ ∃hasCompany.Broadcaster is assigned the information source annotation object DPA.

A provenance annotation consists of an annotation URI and a provenance object specifying the value of the annotation. In our case, the provenance object is a constant value representing who asserted the axiom, when the axiom was last asserted, the degree of trustworthiness of the axiom, or a combination thereof. We provide a detailed grammar for provenance annotations in [31]. The grammar for provenance annotations as an extension of OWL-2 annotations2 is as follows. For sake of clarity we use the prefix Provenance for extensions to the OWL-2 grammar, which uses prefix OWL.

OWLAnnotation := ProvenanceAxiomAnnotation
ProvenanceAxiomAnnotation := 'ProvenanceAxiomAnnotation' '(' IRI ProvenanceAnnotation+ ')'
ProvenanceAnnotation := ProvenanceCertaintyAnnotation | ProvenanceDateAnnotation | ProvenanceSourceAnnotation
ProvenanceCertaintyAnnotation := 'ProvenanceCertaintyAnnotation' '(' Value ')'

2 OWL 2 Web Ontology Language: Spec. and Func.-Style Syntax: http://www.w3.org/TR/2008/WD-owl2-syntax-20081202.

ProvenanceSourceAnnotation := 'ProvenanceSourceAnnotation' '(' Value ')'
ProvenanceDateAnnotation := 'ProvenanceDateAnnotation' '(' Value ')'

An example of how provenance is represented and associated with OWL axioms is presented below. We consider the axioms #4 and #6 from our scenario:

OWLAxiomAnnotation(ClassAssertion(bluewater City)
  ProvenanceAxiomAnnotation(annot1
    ProvenanceSourceAnnotation(Nev)))

OWLAxiomAnnotation(ObjectPropertyAssertion(hqIn vpktv bluewater)
  ProvenanceAxiomAnnotation(annot2
    ProvenanceDateAnnotation(09:09:2009)))
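For experimentation, strings following the grammar above can be produced with a small template helper. This is a naive string emitter written only to illustrate the grammar's shape (the function name is hypothetical and it is not an OWL serializer):

```python
# Illustrative emitter for the provenance annotation grammar above.
# `iri`, `kind` ("Source", "Date" or "Certainty") and `value` follow
# the grammar; annot1/Nev are taken from the example in the text.

def provenance_axiom_annotation(iri, kind, value):
    return (f"ProvenanceAxiomAnnotation({iri} "
            f"Provenance{kind}Annotation({value}))")

print(provenance_axiom_annotation("annot1", "Source", "Nev"))
# -> ProvenanceAxiomAnnotation(annot1 ProvenanceSourceAnnotation(Nev))
```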

5. Semantics of provenance

Provenance assignments are syntactically expressed in OWL-2 using axiom annotations. Annotations, however, have no semantic meaning in OWL-2. All annotations are ignored by the reasoner, and they may not themselves be structured by further axioms.

Furthermore, using such an abstract syntax may remain remarkably ambiguous if it cannot be linked to a formal semantics. Assume that the following provenance axioms are part of our scenario:

For the same axiom identified by #1 presented in Table 3, the question may arise whether this means a disjunction, i.e. one of the two sources has provided the fact, or a conjunction, i.e. both sources have provided the fact, or a collective reading, i.e. the two sources together gave rise to the fact, or whether this situation constitutes invalid provenance. In order to prevent such ambiguities we introduce a generic semantic framework for provenance.

5.1. Provenance dimension

Provenance can have atomic and complex dimensions such as knowledge source, modification date or a composition thereof. We assume that these (and possibly further) dimensions are independent of each other. In the next section, we generalize from this assumption.

Definition 5.1 (Provenance dimension). A provenance dimension D is an algebraic structure (B_D, ∨_D, ∧_D), such that (B_D, ∨_D) and (B_D, ∧_D) are complete semilattices. We denote the minimal element of D by ⊥_D.

B_D represents the labels the provenance can take, e.g. all valid timestamps for the modification date. As (B_D, ∨_D) and (B_D, ∧_D) are complete semilattices, they are, in fact, also lattices. Hence, a finite set of provenance labels always has a join (supremum, least upper bound) and a meet (infimum, greatest lower bound) wrt. the corresponding order.

In contrast to the provenance orders defined in Section 3.3, the join and meet operators in a provenance dimension need not be dual, as they can come from two different lattices, which share the same

Table 3

ID   Axiom                                        Trust
#1   RealCity ⊑ City ⊓ ∃hasCompany.Broadcaster   1
#1   RealCity ⊑ City ⊓ ∃hasCompany.Broadcaster   Nev

Page 6: Using provenance to debug changing ontologies

S. Schenk et al. / Web Semantics: Science, Services and Agents on the World Wide Web 9 (2011) 284–298 289

values but have different orders. An example of a provenance dimension where this may become important is where-provenance [9]: it only tracks who contributed in any way to a certain inference. In this case, join and meet would coincide and would both be set unions of information sources. In all dimensions discussed in this paper, join and meet are dual. In this case, provenance dimension and provenance order directly correlate.

Example 5.1. To illustrate the meaning of ∧ and ∨, let I be the provenance interpretation, that is, a partial function mapping axioms into the label range of trust, and let A and B be axioms of an ontology such that A ≠ B. When combining two provenance labels from D, which are assigned to A and B, the intuitive meaning of ∨ is "I need to trust one of A and B" (corresponding to a logical "or"). The intuitive meaning of ∧ is "I need to trust both A and B" (corresponding to a logical "and"). Hence, trust can be modeled as:

– I_trust(A ∨ B) = sup(I_trust(A), I_trust(B))
– I_trust(A ∧ B) = inf(I_trust(A), I_trust(B))

Likewise, the modification date as well as the creation date can be modeled as:

– I_date(A ∨ B) = min(I_date(A), I_date(B))
– I_date(A ∧ B) = max(I_date(A), I_date(B))
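These lattice operations can be sketched in a few lines of Python. This is an illustration under our own simplified encoding: the trust order of Fig. 1 is hard-coded as a relation, and BOTTOM, LABELS and the function names are ours, not part of the formalism.

```python
# Illustrative sketch of Definition 5.1 and Example 5.1: a provenance
# dimension as join/meet operations over a finite label set. The trust
# order (DPA = "1" above the incomparable Nev and NevWP) follows Fig. 1.
BOTTOM = "_|_"                        # minimal element of the dimension
LABELS = (BOTTOM, "Nev", "NevWP", "1")
TRUST_LEQ = {                         # reflexive-transitive order relation
    (BOTTOM, BOTTOM), (BOTTOM, "Nev"), (BOTTOM, "NevWP"), (BOTTOM, "1"),
    ("Nev", "Nev"), ("Nev", "1"),
    ("NevWP", "NevWP"), ("NevWP", "1"),
    ("1", "1"),
}

def trust_join(a, b):
    """Supremum: 'I need to trust one of A and B'."""
    ups = [c for c in LABELS if (a, c) in TRUST_LEQ and (b, c) in TRUST_LEQ]
    return next(c for c in ups if all((c, d) in TRUST_LEQ for d in ups))

def trust_meet(a, b):
    """Infimum: 'I need to trust both A and B'."""
    downs = [c for c in LABELS if (c, a) in TRUST_LEQ and (c, b) in TRUST_LEQ]
    return next(c for c in downs if all((d, c) in TRUST_LEQ for d in downs))

# For dates the lattice is the total order on ISO-formatted date strings:
date_join, date_meet = min, max
```

Note that the meet of the incomparable Nev and NevWP falls to the bottom element in this toy encoding; the paper instead keeps the symbolic label inf(Nev, NevWP) as an element of the lattice.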

As we have seen in Example 5.1, provenance is assigned to ontology axioms. Within a single assignment, the provenance must be uniquely defined.

Definition 5.2 (Provenance assignment). A provenance assignment M is a set {(D_1, d_1 ∈ D_1), ..., (D_n, d_n ∈ D_n)} of pairs of a provenance dimension and a corresponding provenance label, such that D_i = D_j ⇒ d_i = d_j. As default label for a provenance assignment in dimension D_j we choose the minimal element ⊥_{D_j}. By ProvAss(A) we denote the set of provenance assignments for an axiom A.

Example 5.2. As an example, for the axiom "bluewater: City" of our running example, we have the following provenance assignment:

ProvAss(bluewater: City) = {(trust, Nev), (date, 2009-09-09)}

Without loss of generality we assume a fixed number of provenance dimensions. Next, we formalize how provenance assignments are composed.

To obtain a logical formula which expresses how provenance assignments are composed, called a provenance formula, we make use of pinpointing formulas (discussed in Section 3.2) and of the how-provenance strategy. How-provenance [14] is a strategy which describes how an axiom A can be inferred from a set of axioms {A_1, ..., A_n}, i.e. it is a boolean formula connecting the A_i. As pinpointing summarizes explanations for axioms in a single boolean formula, and thus provides how-provenance, we use it to come up with a provenance formula.

Example 5.3. Consider the query ‘‘for each city, find all companieslocated in that city’’:

x : City ∧ hasCompany(x, y).

The result of this query and the corresponding pinpointing formula are, based on the example data from Table 1:

x           y       Pinpointing formula
bluewater   vpktv   #3 ∧ #4 ∧ #6

The associated provenance formula for this query result is:

ProvAss(#3) ∧ ProvAss(#4) ∧ ProvAss(#6)

To evaluate the corresponding provenance formula, we need to define the operators for provenance dimensions.

Definition 5.3 (Provenance operations). Let A, B be axioms. Let ProvAss(A) = {(D_1, x_1), ..., (D_n, x_n)} and ProvAss(B) = {(D_1, y_1), ..., (D_n, y_n)} be provenance assignments. Then the provenance operations ∨ and ∧ are defined as follows:

ProvAss(A) ∨ ProvAss(B) = {(D, x ∨_D y) | (D, x) ∈ ProvAss(A) and (D, y) ∈ ProvAss(B)}.

ProvAss(A) ∧ ProvAss(B) = {(D, x ∧_D y) | (D, x) ∈ ProvAss(A) and (D, y) ∈ ProvAss(B)}.

Example 5.4. To illustrate how provenance is derived: the provenance assignment for the axiom "bluewater: City" is {(trust, Nev), (date, 2009-09-09)}, and the provenance assignment for the axiom "hqIn(vpktv, bluewater)" is {(trust, NevWP), (date, 2009-09-09)}. We assume the provenance order described in Table 2. The provenance for bluewater: City ∨ hqIn(vpktv, bluewater) is determined as follows:

ProvAss(bluewater: City) ∨ ProvAss(hqIn(vpktv, bluewater)) = {(trust, Nev ∨ NevWP), (date, 2009-09-09 ∨ 2009-09-09)}.
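A minimal sketch of the pointwise operations of Definition 5.3, assuming assignments are maps from dimension names to labels and the per-dimension operators are passed in. The dict encoding and the simplified two-source trust operators are our assumptions, not the paper's implementation.

```python
# Illustrative sketch of the pointwise provenance operations: ProvAss(A)
# and ProvAss(B) are dicts over the same dimensions; join/meet act per
# dimension, exactly as in Definition 5.3.
def pv_or(a, b, joins):
    return {d: joins[d](a[d], b[d]) for d in a}

def pv_and(a, b, meets):
    return {d: meets[d](a[d], b[d]) for d in a}

# Simplified trust operators for the two incomparable sources of Example
# 5.4; the symbolic results stand in for sup/inf in the real trust lattice.
trust_sup = lambda x, y: x if x == y else "sup(Nev,NevWP)"
trust_inf = lambda x, y: x if x == y else "inf(Nev,NevWP)"

city = {"trust": "Nev", "date": "2009-09-09"}     # bluewater: City
hq_in = {"trust": "NevWP", "date": "2009-09-09"}  # hqIn(vpktv, bluewater)

or_result = pv_or(city, hq_in, {"trust": trust_sup, "date": min})
and_result = pv_and(city, hq_in, {"trust": trust_inf, "date": max})
```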

Note that, due to the defaults introduced in Definition 5.2, the operations on provenance assignments are defined even in the presence of incomplete provenance from a domain. In our framework, just as ∨ and ∧ correspond to a logical "or" and "and", the default corresponds to a default truth value of "unknown" in a default logic. With a default assignment, we provide a uniform treatment of axioms in the absence of provenance, and thus we are able to combine arbitrary provenance assignments. In any case, as illustrated in Example 5.1, the interpretation of provenance and default assignments, as well as the interpretation of the provenance operations, is dependent on the application domain. Such considerations are orthogonal to our framework.

While axioms in the underlying description logic may contain negation, these negations are not visible and not needed at the level of the provenance. Hence, the provenance algebra does not need to contain a negation operator.

Finally, we define how to retrieve the provenance assigned to an axiom A within a provenance dimension. The provenance of an axiom A within a provenance dimension is obtained by evaluating the corresponding provenance formula in the dimension under consideration.

Definition 5.4 (Provenance evaluation). Let prov(A) be a function mapping from an axiom A to a provenance assignment in dimension D. The provenance of an axiom A wrt. O in D is obtained by computing a pinpointing formula φ of A wrt. O and obtaining ψ by replacing each axiom in φ with its provenance assignment in D and the logical operators ∨ and ∧ with their corresponding operators in D. Then prov(A) is computed by evaluating ψ.
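Definition 5.4 can be sketched for pinpointing formulas in disjunctive normal form, i.e. a set of pinpoints, each a set of axiom ids. Numeric ranks stand in here for the generally partial trust order, and all names and the data are illustrative assumptions of ours.

```python
from functools import reduce

# Illustrative sketch of provenance evaluation (Definition 5.4): replace
# each axiom id by its assignment, each inner "and" by the per-dimension
# meet, and the outer "or" by the per-dimension join.
def meet_all(ids, assignment, meets):
    labels = [assignment[i] for i in ids]
    return {d: reduce(meets[d], [l[d] for l in labels]) for d in labels[0]}

def evaluate(pinpoints, assignment, joins, meets):
    terms = [meet_all(P, assignment, meets) for P in pinpoints]
    return {d: reduce(joins[d], [t[d] for t in terms]) for d in terms[0]}

# Example 5.5-style data with numeric trust ranks (our simplification):
assignment = {
    3: {"trust": 3, "date": "2009-09-10"},
    4: {"trust": 2, "date": "2009-09-09"},
    6: {"trust": 1, "date": "2009-09-09"},
}
joins = {"trust": max, "date": min}   # "or": sup of trust, earliest date
meets = {"trust": min, "date": max}   # "and": inf of trust, latest date

label = evaluate([{3, 4, 6}], assignment, joins, meets)
```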




Example 5.5. In our running example, for the query "for each city find all companies", x: City ∧ hasCompany(x, y), we have the result bluewater, vpktv and the following provenance formula:

ProvAss(#3) ∧ ProvAss(#4) ∧ ProvAss(#6)

The corresponding provenance evaluation for the dimension trust is:

(trust, 1) ∧ (trust, Nev) ∧ (trust, NevWP) = (trust, 1 ∧ Nev ∧ NevWP) = (trust, inf(Nev, NevWP)).

Fig. 2. Provenance order of the knowledge sources.

5.2. Complex provenance dimensions

In the previous section we have described how provenance can be computed in a single dimension. This focus on a single dimension is useful if independence of dimensions can be assumed. Sometimes, however, this is not the case. For example, when a group of users collaboratively edits an ontology, the time an axiom was asserted and the user responsible for the modification will often correlate. In this case, two knowledge dimensions can be composed into a complex provenance dimension:

Definition 5.5 (Composition of dimensions). Let D_1 = (B_{D_1}, ∨_{D_1}, ∧_{D_1}) and D_2 = (B_{D_2}, ∨_{D_2}, ∧_{D_2}) be provenance dimensions. Then D = (B_D, ∨_D, ∧_D) is a composed provenance dimension, such that

– B_D = B_{D_1} × B_{D_2},
– the order underlying (B_D, ∨_D) is {((x, y), (v, w)) | x, v ∈ B_{D_1} and y, w ∈ B_{D_2} and x ≤_{∨_{D_1}} v and y ≤_{∨_{D_2}} w}, and
– the order underlying (B_D, ∧_D) is {((x, y), (v, w)) | x, v ∈ B_{D_1} and y, w ∈ B_{D_2} and x ≤_{∧_{D_1}} v and y ≤_{∧_{D_2}} w}.

We will refer to the elements of B_D as complex provenance labels and to the elements of B_{D_1} and B_{D_2} as atomic provenance labels.

Example 5.6. As an example of composition of dimensions, we compose the dimensions trust and modification date and compute the provenance for x: City ∧ hqIn(y, x). We have the provenance formula:

ProvAss(#4) ∧ ProvAss(#6)

If we treat the trust and modification date dimensions separately, the results are

{(trust, Nev)} ∧ {(trust, NevWP)} = {(trust, inf(Nev, NevWP))}

{(date, 2009-09-09)} ∧ {(date, 2009-09-10)} = {(date, 2009-09-10)}

If we combine them into one dimension as shown in Fig. 2, however, the result is

{(trust, Nev), (date, 2009-09-09)} ∧ {(trust, NevWP), (date, 2009-09-10)} = {(trust, NevWP), (date, 2009-09-10)}

Having composed the interdependent dimensions into one, one may use the composed provenance dimension just as an atomic one. Definition 5.4 applies exactly as for simple provenance dimensions.
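The composition of Definition 5.5 amounts to the componentwise (product) order on pairs of labels. A sketch under our own encoding of the two orders (function names and the simplified trust relation are ours):

```python
# Illustrative sketch of a composed dimension: labels are (trust, date)
# pairs, ordered componentwise. trust_leq encodes Nev and NevWP as
# incomparable, both below DPA's "1"; ISO dates compare lexicographically.
def product_leq(leq1, leq2):
    return lambda p, q: leq1(p[0], q[0]) and leq2(p[1], q[1])

trust_leq = lambda a, b: a == b or b == "1"
leq = product_leq(trust_leq, lambda a, b: a <= b)
```

In this product order ("Nev", "2009-09-09") lies below ("1", "2009-09-10"), while ("Nev", "2009-09-10") and ("NevWP", "2009-09-09") are incomparable, matching the behavior exploited in Example 5.6.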

5.3. Semantics for conflicting provenance

In the following we extend our model to support conflicting provenance, which can arise from conflicting changes or provenance assignments by multiple users at different times. A new operator is needed for this merging, as the merge operator need not coincide with one of ∨ and ∧.

Definition 5.6 (Conflict tolerant dimension). A conflict tolerant provenance dimension D is an algebraic structure (B_D, ∨_D, ∧_D, ⊕_D), such that (B_D, ∨_D), (B_D, ∧_D) and (B_D, ⊕_D) are complete semilattices. The minimum of (B_D, ⊕_D) is ⊥_D.

Example 5.7. To illustrate how conflicting provenance is supported by multiple provenance assignments, let I be a provenance interpretation and let A be an axiom of an ontology for which there exist multiple provenance assignments. We use A′ and A″ to represent the axiom A with different provenance assignments. Let us compare the already known provenance dimension modification date with the creation date. For the modification date, ⊕ needs to be the maximum (we are interested in the last assertion), while for the creation date, ⊕ needs to be the minimum (we are interested in the first assertion). In contrast to Example 5.1, the operators for creation date and modification date do not coincide. The modification date could be modeled as:

I_date(A′ ⊕ A″) = max(I_date(A′), I_date(A″))

The creation date could be modeled as:

I_cre(A′ ⊕ A″) = min(I_cre(A′), I_cre(A″))

Likewise, the trust could be modeled as:

I_trust(A′ ⊕ A″) = sup(I_trust(A′), I_trust(A″))

To show how the support for conflicting provenance by multiple provenance assignments can be applied to our scenario, we slightly extend our running example in Example 5.8:



Example 5.8. We assume that axiom (#1), RealCity ⊑ City ⊓ ∃hasCompany.Broadcaster, has been modified by two sources at different times:

ID   Trust   Date
#1   1       2009-09-10
#1   Nev     2009-09-09

Then the provenance assignment for axiom #1 is

{(trust, 1), (date, 2009-09-10)} ⊕ {(trust, Nev), (date, 2009-09-09)} = {(trust, 1 ⊕ Nev), (date, 2009-09-10 ⊕ 2009-09-09)} = {(trust, 1), (date, 2009-09-10)}.

In contrast, if first Neverest and then DPA assert the same axiom, we would have

{(trust, 1), (date, 2009-09-09)} ⊕ {(trust, Nev), (date, 2009-09-10)}

As 1 >_trust Nev, but 2009-09-09 <_date 2009-09-10, these labels are incomparable and cannot be further simplified.

In order to accommodate such potentially conflicting provenance assignments about ontology axioms, we extend the semantics of provenance, which we have introduced in Section 5. For this purpose, we redefine the prov function of Definition 5.4, such that it uses ⊕ to merge provenance assignments in a preprocessing step. Afterwards, we have a unique provenance assignment again and apply Definition 5.4 as before.

Definition 5.7 (Provenance extended). Let allprov: axioms → 2^{B_D} be a function mapping from an axiom to all provenance assignments to that axiom in a provenance dimension D.

Let prov be a function mapping from an axiom to a provenance assignment in dimension D. The provenance of an axiom A wrt. O in D is obtained by computing a pinpointing formula φ of A wrt. O and obtaining ψ by replacing each axiom A in φ with ⊕(allprov(A)) and the logical operators ∨ and ∧ with their corresponding operators in D. Then prov(A) is computed by evaluating ψ.

Example 5.9. Consider the provenance assignment of axiom #1 with conflicting provenance as presented in Example 5.8 and the provenance formula:

ProvAss(#1) ∧ ProvAss(#2)

The derived provenance is:

({(trust, 1), (date, 2009-09-10)} ⊕ {(trust, Nev), (date, 2009-09-09)}) ∧ {(trust, 1), (date, 2009-09-09)}
= {(trust, 1 ⊕ Nev), (date, 2009-09-10 ⊕ 2009-09-09)} ∧ {(trust, 1), (date, 2009-09-09)}
= {(trust, 1 ∧ 1), (date, 2009-09-10 ∧ 2009-09-09)}
= {(trust, 1), (date, 2009-09-10)}.
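The preprocessing step of Definition 5.7, i.e. folding all conflicting assignments of an axiom with ⊕ before evaluation, can be sketched as follows. The dict encoding and numeric trust ranks are our assumptions; max over ranks stands in for the supremum in the (generally partial) trust order.

```python
from functools import reduce

# Illustrative sketch of the conflict-tolerant merge: A' (+) A'' combines
# two assignments dimension-wise; merge_all folds a whole set of them.
def merge(a, b, oplus):
    return {d: oplus[d](a[d], b[d]) for d in a}

def merge_all(assignments, oplus):
    return reduce(lambda a, b: merge(a, b, oplus), assignments)

# Example 5.8: axiom #1 asserted by DPA (rank 2, 2009-09-10) and by
# Neverest (rank 1, 2009-09-09); for modification date (+) is the maximum.
oplus = {"trust": max, "date": max}
axiom1 = merge_all(
    [{"trust": 2, "date": "2009-09-10"},
     {"trust": 1, "date": "2009-09-09"}],
    oplus,
)
```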

This definition of prov not only allows us to aggregate provenance from multiple sources, but also to gracefully handle unknown provenance, i.e. situations where a knowledge source does not provide a label for some provenance dimension, in which case ⊥_D is assumed as a default, as introduced in Definition 5.2.

Example 5.10 shows how our approach can be applied to our use case scenario.

Example 5.10. Back to our scenario, DPA needed to verify that Bluewater indeed is a city in CA. Hence, they need to answer the query "bluewater: RealCity?" and to obtain provenance for the answer. The query results in the following provenance formula:

ProvAss(#3) ∧ ProvAss(#4) ∧ ProvAss(#6)

The resulting provenance label is:

{(trust, 1), (date, 2009-09-10)} ∧ {(trust, Nev), (date, 2009-09-09)} ∧ {(trust, NevWP), (date, 2009-09-09)} = {(trust, inf(Nev, NevWP)), (date, 2009-09-10)}.

Note that in Example 5.10 the labels of the modification date dimension are comparable, while the labels of the trust dimension are not. Hence, the resulting provenance label is a tuple of the infimum of Nev and NevWP in the trust dimension and the maximum of the dates in the assertion date component.

6. Computing provenance using pinpoints

We now use our definitions of provenance in order to track down which recent addition to the knowledge base led to a desired or undesired effect. In the case of our collaborative ontology editing scenario, we may want to identify the least authoritative source that contributed most recently to an inconsistency.

The algorithms discussed in this section fulfill this purpose, i.e. given a query (an axiom), an ontology and a provenance dimension, they extract a subontology as explanation and compute the provenance label of the query.

As we can see above, Definition 5.4 relies on a pinpointing formula for the computation of provenance. Hence, we need to find all pinpoints to compute a pinpointing formula. Then we can immediately derive the provenance label.

6.1. Naïve algorithm

A naïve evaluation of provenance for an axiom might compute all pinpoints and then evaluate the pinpointing formula. This strategy is illustrated in algorithm ProvNaive. ProvNaive takes as parameters an ontology O, an axiom A and a provenance dimension D. It returns the provenance label of A wrt. O and a subontology containing all pinpoints for O ⊨ A.

Listing 1. ProvNaive(O,A,D).

1 pinpoints := GetAllPinpoints(A, O);
2 lab := {l | ∃P ∈ pinpoints: l = max_D(prov(P))};
3 lab := {∨_{l ∈ lab} l};
4 M := ∪_{P ∈ pinpoints} P;
5 ProvNaive := (M, lab);

However, finding all pinpoints is a very expensive operation, and this approach is not appropriate for real use cases. Finding a pinpoint using a black-box approach may need exponentially many consistency checks in the underlying DL. Moreover, there can be exponentially many pinpoints.
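For intuition, the naïve strategy can be instantiated on a toy setting where axioms are subsumption edges X ⊑ Y between atomic names, entailment is reachability, and the dimension is the total order on numbers. This toy model is ours; the paper's implementation works on OWL ontologies through a DL reasoner.

```python
from itertools import combinations

def entails(axioms, query):
    """Toy entailment: does the subsumption graph connect lhs to rhs?"""
    lhs, rhs = query
    seen, changed = {lhs}, True
    while changed:
        changed = False
        for (a, b) in axioms:
            if a in seen and b not in seen:
                seen.add(b)
                changed = True
    return rhs in seen

def all_pinpoints(O, A):
    """All minimal entailing subsets: exponential, as noted in Section 6.1."""
    hits = [set(S) for k in range(1, len(O) + 1)
            for S in combinations(O, k) if entails(S, A)]
    return [S for S in hits if not any(T < S for T in hits)]

def prov_naive(O, A, prov):
    """Sup of infs: each pinpoint contributes the inf of its axiom labels."""
    return max(min(prov[B] for B in P) for P in all_pinpoints(O, A))

O = [("C", "E"), ("E", "G"), ("G", "D"), ("C", "F"), ("F", "D")]
prov = {("C", "E"): 3, ("E", "G"): 3, ("G", "D"): 3,
        ("C", "F"): 1, ("F", "D"): 1}
```

On this Fig. 3-style ontology, the pinpoint {C ⊑ F, F ⊑ D} has label inf(1, 1) = 1 and {C ⊑ E, E ⊑ G, G ⊑ D} has label 3, so prov_naive returns 3 after enumerating both.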

6.2. An optimized algorithm for debugging with provenance

We present an optimized algorithm for deriving provenance, which performs significantly better in the average case than naïve


Fig. 4. Justifications with incomparable provenance. [The figure shows the query (?) C ⊑ D and the labeled axioms (b) C ⊑ F, (4) C ⊑ E, (2) E ⊑ G, (a) F ⊑ D, (b) F ⊑ G and (1) G ⊑ D.]


implementations. It does not need to compute all pinpoints, and in fact not even precisely a pinpoint. Instead, we compute an approximation which is sufficient for deriving provenance.

The optimization is based on the assumption that the provenance dimension is a lattice (B_D, ∨_D, ∧_D) such that a ∨_D b = sup(a, b) and a ∧_D b = inf(a, b) for a, b ∈ B_D (or vice versa). Note that in the general case the interpretation of ∨_D and ∧_D is independent, as shown in Definition 5.1, i.e. they do not need to be dual. However, for this optimization we assume that ∨_D and ∧_D are no longer independent. This assumption is true for all provenance dimensions discussed above, such as modification date and degree of trustworthiness, and for all total orders. In this case, the pinpointing formula has the structure of a supremum of infima when expressed in disjunctive normal form.

Considering this assumption, we make the evaluation more efficient, since we can exploit monotonicity properties of our provenance dimensions where applicable.

For instance, for a provenance dimension with a total order, once we find the pinpoint with the highest provenance label, we have also found the overall maximal provenance label. Thus, in many cases, we do not need to compute all pinpoints, and we may restrict the computation of the pinpointing formula to those parts of an ontology that are relevant for the provenance computation given its particular lattice structure.

If the provenance dimension is a partial order, several pinpoints with incomparable provenance labels may be found. Thus, we need to find all pinpoints with maximal (minimal) provenance labels and merge the corresponding provenance labels to determine the resulting provenance label.

Without loss of generality, we only consider for our optimized algorithm the case that the corresponding pinpointing formulas have the structure of a supremum of infima. In this case the provenance label of a single pinpoint is the infimum of the labels of the axioms it contains, and the overall provenance label is the supremum of the labels of all such pinpoints. As a result, we do not need to take into account any axiom which has a provenance label less than the infimum of the labels of the greatest pinpoints. For the case of an infimum of suprema, we simply need to invert all comparisons and replace min by max in Listings 2 and 3.

Likewise, we only consider consistent ontologies. The proposed algorithms handle inconsistent ontologies equally well if the entailment check in line 10 of Listing 2 is negated.

The algorithm starts with a query of the form "O ⊨ A?", as we focus on entailment checking here. In order to find those axioms that are relevant for the provenance computation given the query, the algorithm iteratively grows a subontology around A based on the syntactic relevance selection function. Intuitively, an axiom is syntactically relevant for another axiom if it contributes to the definition of one of the concepts or properties in the other axiom. In Fig. 3, a small example ontology is shown. The arrows represent syntactic relevance relationships between axioms. Assume the

Fig. 3. Selection of axioms by the optimized algorithm.

query is C ⊑ D?. Then the two axioms at the bottom cannot be relevant to the answer, because they are neither directly nor indirectly relevant to the query. The rest of the ontology contains three justifications for C ⊑ D, namely {C ⊑ E, E ⊑ G, G ⊑ D}, {C ⊑ F, F ⊑ G, G ⊑ D} and {C ⊑ F, F ⊑ D}.

Definition 6.1 (Syntactic relevance [20]). An axiom B is directly syntactically relevant for an axiom A if their clusters overlap, i.e. if they share a concept, role or individual. B is syntactically relevant for A if it is directly syntactically relevant for A, or if B is syntactically relevant for C, which is syntactically relevant for A.

We define a convenience function r: Given A and an ontology O, r(A, O) = {B ∈ O | B is directly syntactically relevant for A}. The definition carries over to sets of axioms M: r(M, O) = {B ∈ O | ∃A ∈ M: B is directly syntactically relevant for A}.
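The selection function r of Definition 6.1 can be sketched by treating each axiom as its signature, i.e. the set of names it mentions. The frozenset encoding and the variable names are ours.

```python
# Illustrative sketch of direct syntactic relevance: two axioms are
# directly relevant to each other when their signatures overlap.
def r(M, O):
    sig = set().union(*M) if M else set()
    return {B for B in O if B & sig}

# Toy axioms from Fig. 3, written as signature sets (C ⊑ E etc.):
CsubE, EsubG = frozenset({"C", "E"}), frozenset({"E", "G"})
GsubD, CsubF = frozenset({"G", "D"}), frozenset({"C", "F"})
O = {CsubE, EsubG, GsubD, CsubF}

# Axioms directly relevant to the query C ⊑ D are those sharing C or D:
step1 = r({frozenset({"C", "D"})}, O)
```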

Using the syntactic relevance selection function, the algorithm Prov(O, A, D) computes the provenance label of O ⊨ A in dimension D, if O ⊨ A. We start by determining the set of syntactically relevant axioms for A = C ⊑ D in line 3 (C ⊑ E, C ⊑ F, F ⊑ D and G ⊑ D). Then we add the ones with the greatest provenance label to the module in line 4 (C ⊑ E in step 1; C ⊑ F in step 2). In the inner loop we recursively add all syntactically relevant axioms for the new module which do not decrease the provenance degree of the overall solution (E ⊑ G in step 2). Note that this is just an optimization step to avoid unnecessary entailment checks. It can be omitted to compute a more precise approximation of the pinpoint. The trade-off is a possibly higher number of iterations and entailment checks. In each iteration we add axioms until we have found a module which contains a pinpoint in line 10 (this is the case after step 2 of the example). The provenance label for A is the smallest provenance degree of the axioms in the module (hence, 3). We work with a set of labels to account for the fact that in the presence of partial orders, minimum and maximum are not unique. Hence, the min function used in line 11 is the set version, which returns the set of smallest elements of a set. For total orders, the label always is a singleton.

Note that our optimization strategy is strongly related to two issues: (a) the syntactic relationships among the pinpointing axioms,




and (b) the size of the pinpoints. The closer the pinpointing axioms are syntactically correlated and the smaller the pinpoints are, the faster the module can be built and the provenance label derived. By growing only a small module of the ontology and avoiding using the instantiated ontology as a whole for approximating pinpoints, we save many entailment checks, which are very expensive.

Listing 2. Prov(O,A,D).

1 M := {A}
2 repeat
3   syn := r(M, O);
4   add := {B1 ∈ syn | ∄B2 ∈ syn: prov(B1) <_D prov(B2)}
5   repeat
6     M := M ∪ add
7     syn := r(M, O)
8     add := {B1 ∈ syn | ∃B2 ∈ M: prov(B1) ≥_D prov(B2)}
9   until (add ⊆ M)
10 until (M − A ⊨ A);
11 lab := min_D({prov(B) | B ∈ M})
12 Prov := (M, lab);
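Listing 2 can be rendered executable on the same kind of toy setting (subsumption edges, reachability entailment, numeric labels with a total order). This sketch is our illustration of the module-growing idea, not the paper's Pellet/OWL API implementation.

```python
def entails(axioms, query):
    """Toy entailment: reachability over X ⊑ Y edges."""
    lhs, rhs = query
    seen, changed = {lhs}, True
    while changed:
        changed = False
        for (a, b) in axioms:
            if a in seen and b not in seen:
                seen.add(b)
                changed = True
    return rhs in seen

def r(M, O):
    """Directly syntactically relevant axioms: signatures overlap."""
    sig = {name for ax in M for name in ax}
    return {B for B in O if set(B) & sig}

def prov_opt(O, A, prov):
    """Grow a module around A until it entails A; assumes O entails A."""
    M = set()
    while not entails(M, A):
        syn = r(M or {A}, O) - M
        if not syn:                      # nothing relevant left: widen
            syn = O - M
        best = max(prov[B] for B in syn)
        M |= {B for B in syn if prov[B] == best}
        while True:                      # inner loop: keep-label closure
            add = {B for B in r(M, O) - M
                   if any(prov[B] >= prov[C] for C in M)}
            if not add:
                break
            M |= add
    return M, min(prov[B] for B in M)

O = {("C", "E"), ("E", "G"), ("G", "D"), ("C", "F"), ("F", "D")}
prov = {("C", "E"): 3, ("E", "G"): 3, ("G", "D"): 3,
        ("C", "F"): 1, ("F", "D"): 1}
module, label = prov_opt(O, ("C", "D"), prov)
```

Here the module grown around C ⊑ D is the high-label justification {C ⊑ E, E ⊑ G, G ⊑ D} with label 3, the infimum of its axiom labels; the lower-label pinpoint {C ⊑ F, F ⊑ D} is never materialized.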

Theorem 6.1. Let O be an ontology, A an axiom such that O ⊨ A, and D a provenance dimension which is a total order. Prov(O, A, D) computes the provenance degree of O ⊨ A in dimension D.

Fig. 5. Relations between the set of axioms of the ontology (O), the axioms in all pinpoints (P) and the axioms retrieved by the algorithm (R).

Proof. First we show that Prov terminates. The outer loop terminates when M − A ⊨ A. At each iteration add is assigned a subset of O which is syntactically relevant to M and has the greatest provenance degree. Consequently, M is a subset of O. As O is finite and O ⊨ A, the loop must eventually terminate. In this case M − A indeed entails A and lab = min_D({prov(B) | B ∈ M}). □

It remains to show that if Prov terminates, it returns the correct provenance degree of O ⊨ A. Remember that we are looking for the pinpoint with the highest provenance label and that the provenance label of a pinpoint is the infimum of the provenance labels of its axioms. M only contains axioms which are syntactically relevant for A. When the outer loop terminates, M contains a (superset of a) greatest pinpoint of A wrt. O. Assume there is a pinpoint P which has not been found and has a greater provenance label. Then it must already be part of M: all axioms in a pinpoint of A must be syntactically relevant for A. Moreover, ∀B ∈ P: prov(B) >_D lab. M contains all such axioms which are directly syntactically relevant for A, or which are indirectly syntactically relevant via axioms with provenance labels ≥_D lab. Hence, M must already contain P. Refutation.

For partial orders the resulting label may contain multiple elements, which need to be merged. Fig. 4 illustrates a case where we have a complex dimension composed of numbers and letters with their natural orders, and multiple relevant pinpoints, which together result in a provenance label of inf(1, a). ProvPartial computes accurate labels for partial orders. It uses Prov to compute an approximation first. Although there might be multiple pinpoints with incomparable provenance labels, Prov stops after the first pinpoint is contained in the approximation. Therefore we need to further extend the approximation, such that all possibly relevant axioms are included. If the label returned by Prov is a singleton, there is a unique maximal pinpoint (lines 2 and 3). Otherwise, we compute the minimum of all elements of lab (line 4), which is the lower bound min we need to consider. For example, as numbers and letters are incomparable, the lower bound for a and 2 is inf(1, a). This means we could ignore any axiom which does not have a provenance label (and hence defaults to ⊥), but need to consider G ⊑ D. Every pinpoint with a provenance label equal to or less than min is also less than the label of the pinpoint we have already found in Prov, and hence we can ignore it. We now extend our approximation with all axioms which are syntactically relevant to A and, if they are indirectly relevant, are connected to A only through axioms which have a provenance label greater than min. The loop in lines 5 to 9 works analogously to lines 5 to 9 in Prov. Finally, we compute the correct label using ProvNaive.

Listing 3. ProvPartial(O,A,D).

1 (M, lab) := Prov(O, A, D);
2 if (|lab| = 1) then
3   ProvPartial := (M, lab);
4 min := ∧_{l ∈ lab} l;
5 repeat
6   N := M
7   syn := r(M, O);
8   M := M ∪ {B ∈ syn | prov(B) >_D min};
9 until (M = N)
10 ProvPartial := ProvNaive(M, A, D)
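The partial-order setting that ProvPartial handles can be sketched on the labels of Fig. 4, where numbers and letters are mutually incomparable. The encoding (Python ints vs. strings) and all names are ours; the sketch only shows why several incomparable pinpoint labels must be kept.

```python
# Illustrative sketch for ProvPartial's setting: a partial order in which
# numbers and letters never compare across kinds (as in Fig. 4), so two
# pinpoints can carry incomparable labels that must both be retained.
def leq(x, y):
    """Ints compare with ints, strings with strings, never across kinds."""
    return type(x) is type(y) and x <= y

def minimal(labels):
    """Set-valued minimum: labels with no strictly smaller label present."""
    return {x for x in labels if not any(leq(y, x) and y != x for y in labels)}

# Axiom labels for the two justifications of C ⊑ D from Fig. 4:
prov = {("C", "E"): 4, ("E", "G"): 2, ("G", "D"): 1,
        ("C", "F"): "b", ("F", "D"): "a"}

def label_of(P):
    """A pinpoint's label: the set-valued infimum of its axiom labels."""
    return minimal({prov[B] for B in P})

lab1 = label_of([("C", "E"), ("E", "G"), ("G", "D")])  # numeric justification
lab2 = label_of([("C", "F"), ("F", "D")])              # letter justification
```

The two pinpoints carry the incomparable labels 1 and a; merging them yields the combined label inf(1, a) discussed in the text.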

Theorem 6.2. Let O be an ontology, A an axiom such that O ⊨ A, and D a provenance dimension which is a partial order. ProvPartial(O, A, D) computes the provenance degree of O ⊨ A in dimension D.

In line 10 we use the naïve algorithm. Therefore we only need to show two things: (a) if Prov returns a singleton, it is indeed correct also for partial orders, and (b) lines 4 to 9 indeed compute a module which contains all axioms relevant for computing the correct solution using the naïve algorithm.

(a) In lines 3 and 4 of Prov, exactly those axioms are added to M which are maximal in the current syntactic relevance set syn. Hence, if incomparable provenance degrees occur, all of them are selected. They are also preserved in line 11, as there is no unique minimum in this case. Thus, if a unique minimum is returned, then it must in fact be equal to or greater than the provenance labels of all other pinpoints.

(b) Assume there is an axiom B which is contained in some pinpoint P, which is relevant for lab and missing in M. We consider two cases:

(b1) ∧_{p ∈ P} prov(p) ≤ min. That means some prov(p) is less than or equal to min. Then P is not relevant. Refutation.

(b2) Hence, prov(p) must be greater than min for all p ∈ P. Then all p, and hence also B, have been selected in lines 5 to 9. Refutation.

Fig. 5 presents a diagram showing the relations between the set of axioms of the ontology (set O), the set of axioms of the relevant pinpoints (set P), and the set of axioms of the module retrieved by



Table 4
Ontologies used in the experiments.

Ontology           Expressivity  Total axioms  No. Unsat. Classes  Av. of Pinpoints per Unsat. Class
1, 7   People      ALCHOIN       372           1                   1
2, 8   MiniTambis  ALCN          400           30                  1
3, 9   University  SOIN          92            9                   1
4, 10  Economy     ALCH(S)       2330          51                  1
5, 11  Chemical    ALCHF         192           37                  1
6, 12  Transport   ALCH          2178          62                  2


our algorithm (set R). Note that the algorithm retrieves some of the pinpoint axioms. Axioms that are part of the pinpointing formula and are retrieved by the algorithm (the overlap of the sets P and R) correspond to those that are relevant and necessary for deriving provenance.

7. Complexity

The worst case complexity of both the naïve and the optimized approach for computing provenance is equivalent to the computation of all pinpoints in the underlying logic, as the pinpointing formula can be evaluated in polynomial time. If it is expressed in normal form, however, the size of the formula can blow up exponentially. Approaches for computing pinpoints like [7], which derive a compact representation of the pinpointing formula rather than representing it in a normal form, benefit the computation of provenance since they avoid the exponential blowup.

While in the worst case the complexity of computing provenance is the same as the complexity of finding all pinpoints, and hence quite high, in the average case we can do much better using the optimized algorithm, as we show in Section 8.

8. Evaluation

The prototype of the optimized provenance computation algorithm (Prov) has been implemented in Java 1.6 using Pellet 2.0.0³ and OWL API trunk revision 1310.⁴ Prov is available as an open source implementation at http://west.uni-koblenz.de/Research/MetaKnowledge.

We have performed our experiments using two groups of ontologies. The first experiment is composed of standard ontologies for debugging that have already been used for testing the computing time of laconic justifications in [18]. These ontologies have been used in [18] to demonstrate the efficiency of traditional computation of pinpoints. The main goal of this experiment is to show that our optimized algorithm speeds up the performance of provenance computations compared to the standard computation of pinpoints.

The second experiment uses real-world living ontologies from Bioportals containing real change logs with timestamps and editors. The main goal of this experiment is to test the applicability and scalability of our approach for computing provenance in real time. For both experiments we use a virtual machine with 2 GB RAM and a single Intel(R) Xeon(TM) CPU core at 3.60 GHz.

8.1. Experiment I – debugging ontologies

In this experiment, we compare our optimized algorithm to the naïve approach (baseline) based on full axiom pinpointing. For this experiment, we have used the ontologies from Table 4. This dataset has already been used in [18] to demonstrate the efficiency of the traditional computation of pinpoints, and it offers a range of varying complexities and pinpoint sizes.

Since no provenance information is available for these ontologies, we have generated artificial provenance in two ways:

random (ontologies 1–6): We have augmented the original ontologies by randomly assigning timestamp values and degrees of trustworthiness to the axioms.

cluster (ontologies 7–12): We have augmented the original ontologies by assigning similar timestamp values and degrees of trustworthiness to clusters of axioms which are syntactically relevant to each other. This reflects the fact that a user usually does not make random modifications, but changes a part of the ontology focused around a certain class or property.
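The two annotation schemes can be sketched as follows. This is a minimal illustration, not the evaluation code: the label ranges, the function names, and the crude clustering rule (axioms grouped by the alphabetically smallest term in their signature) are all assumptions for the sake of the example.

```python
import random

def annotate_random(axioms, n_trust_levels=5):
    """Assign independent random provenance to each axiom (as for ontologies 1-6)."""
    return {ax: (random.randint(0, 10_000),            # timestamp
                 random.randint(1, n_trust_levels))    # degree of trustworthiness
            for ax in axioms}

def annotate_clustered(axioms, signature_of, n_trust_levels=5):
    """Give syntactically related axioms similar labels (as for ontologies 7-12).

    Axioms whose signatures share the same representative term fall into one
    cluster and receive the same timestamp and trust degree, mimicking a user
    who edits one part of the ontology at a time.
    """
    label_of_term = {}
    labels = {}
    for ax in axioms:
        key = min(signature_of(ax))    # crude cluster key: smallest term name
        if key not in label_of_term:
            label_of_term[key] = (random.randint(0, 10_000),
                                  random.randint(1, n_trust_levels))
        labels[ax] = label_of_term[key]
    return labels
```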

As the ontologies 1 to 12 are inconsistent, our implementation focuses on pinpoints for inconsistencies instead of subsumptions. For this reason, the termination condition in line 8 of Prov has been changed to a check for unsatisfiability, i.e. M ⊨ A ⊑ ¬A.

For each ontology, we have measured the time needed to compute the provenance for an inconsistency, averaged over all inconsistent classes of the ontology. In order to investigate the influence of composing dimensions, we have performed the experiment for each ontology with a single and with two provenance dimensions (see Section 5.2).

As baseline we use the naïve approach based on full axiom pinpointing, i.e. the black box algorithm implementation in the OWL API for computing all pinpoints.

The results of the evaluation are presented in Fig. 6 and Table 5. In Fig. 6, we have normalized the results to the processing time of the naïve approach and used a logarithmic scale; the absolute numbers range over three orders of magnitude. Our optimized algorithm performs significantly better in all cases and scales very well. Note that some ontologies contain very large pinpoints and entailments with high numbers of pinpoints, while others contain relatively few and small pinpoints.

As shown in Fig. 6, ontology 5, which is an annotated version of the Chemical ontology, required the most computation time in the naïve approach, as it contains some very large pinpoints and some entailments with high numbers of pinpoints. Especially for such cases, our approach has shown its potential.

As expected, the cluster ontology group (ontologies 7–12) performed better with the optimized algorithm than the random ontology group (ontologies 1–6). The reason is that for the cluster group we have assigned similar provenance labels to syntactically related axioms. Since our approach selects axioms that are syntactically relevant to the query first, the module could be built faster for the cluster group than for the random group. This increases the performance of computing provenance, since axioms belonging to a common pinpoint are in general syntactically related to each other. For instance, ontology 10 showed better performance than ontology 4, since the relevant axioms could be selected first and thus a smaller module could be built.

Furthermore, we observe that debugging with many independent dimensions can have a negative performance impact due to the many incomparable labels in the composed dimension, but in most cases it still performs significantly better than repeating the computation for two atomic dimensions.
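The incomparability effect in a composed dimension can be made concrete with a small sketch. Assuming (as an illustration, not the paper's definition) that two dimensions are composed pointwise into a product order, labels such as (3, 1) and (1, 3) become incomparable, so the antichain of maximal labels can grow:

```python
def leq_product(a, b):
    """Pointwise order on a composed dimension:
    (t1, d1) <= (t2, d2) iff t1 <= t2 and d1 <= d2.
    Labels like (3, 1) and (1, 3) are incomparable."""
    return all(x <= y for x, y in zip(a, b))

def maximal(labels):
    """Maximal elements of a set of composed labels; with many
    incomparable labels this antichain can become large, which is
    what slows down debugging with many independent dimensions."""
    return [a for a in labels
            if not any(a != b and leq_product(a, b) for b in labels)]
```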

8.2. Experiment II – living ontologies

In our second experiment, we evaluate whether our approach is fast enough to support user assessments of the reliability of inferences


Table 5
Average time to compute provenance for an inconsistency of an ontology (ms).

Ontology                Baseline   Av. pinpoint size   Provenance,      Av. module size    Provenance, date       Av. module size
                                   per unsat. class    date dimension   per unsat. class   and trust dimensions   per unsat. class
1  People random           312     4                    141             15                  141                   15
2  MiniTambis random       141     7                     18             22                   22                   22
3  University random        53     4                      8             18                    8                   18
4  Economy random          395     3                     23             57                   27                   58
5  Chemical random        3239     7                     32             126                  38                   126
6  Transport random       1165     5                     23             70                   29                   71
7  People cluster          141     4                     16             15                   16                   15
8  MiniTambis cluster      127     7                     11             22                   11                   22
9  University cluster       52     4                      7             18                    7                   18
10 Economy cluster         386     3                      4             19                    5                   19
11 Chemical cluster       3236     7                     32             126                  37                   126
12 Transport cluster      1121     5                     28             99                   34                   102

The bold values highlight the best result for each row.

Table 6
Ontologies used in the experiments.

Ontology   Expressivity   Total axioms   No. unsat. classes   Av. pinpoints per unsat. class
13 BIO     SHOIN(D)       660,915        1                    1
14 OBI     SHOIN(D)       16,415         1                    1

Fig. 6. Average time to compute provenance for an inconsistency of an ontology (logarithmic scale). The computation of provenance for ontologies 5, 10 and 13 scales orders of magnitude better than a naïve implementation and for that reason they are not displayed on the graph.

S. Schenk et al. / Web Semantics: Science, Services and Agents on the World Wide Web 9 (2011) 284–298 295

in real time, also for real-world, large-scale data. For this experiment, we have used the two real-world ontologies shown in Table 6.

The BIO ontology (ontology 13) is a mapping ontology for the Open Biomedical Ontologies (BIO) repository of the US National Cancer Institute's Center for Bioinformatics [5]. The BIO ontology comes with real change logs containing timestamps and editor information for each axiom, which we use to evaluate our approach.

The OBI ontology (ontology 14) is the Ontology for Biomedical Investigations (http://obi-ontology.org/). This is an integrated ontology for the description of life-science and clinical investigations. It supports the consistent annotation of biomedical investigations, regardless of the particular field of study. The OBI ontology has been developed in collaboration with groups representing different biological and technological domains involved in biomedical investigations. In the OBI ontology, changes are tracked in change logs with timestamps and provenance information.

For both ontologies, we mapped information about editors to degrees of trustworthiness as defined in Section 3.3, and extracted the timestamps for the modification dates.


We compared our approach to the naïve approach for computing all pinpoints. For each ontology, we have evaluated the time needed to compute the provenance for an inconsistency, averaged over all inconsistent classes of the ontology.

Since ontologies 13 and 14 are consistent, we have generated random queries as follows: for randomly selected disjoint classes A and B we introduced a new class C such that A ⊑ C and B ⊑ C, and used these new axioms as queries. For this reason, we could use exactly the same implementation as for Experiment I. We have performed the experiment for each ontology with a single and with two dimensions to investigate the influence of composing dimensions.
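The query construction above can be sketched abstractly. This is a hedged illustration only: axioms are modelled as plain tuples rather than OWL API objects, and the function name, the seed parameter, and the fresh-class naming scheme are assumptions introduced for the example.

```python
import itertools
import random

def make_queries(disjoint_pairs, n_queries, seed=0):
    """For randomly chosen disjoint classes A and B, introduce a fresh
    class C with axioms A SubClassOf C and B SubClassOf C, and return
    these new axioms as queries (mirroring the construction described
    in the text).  An axiom is modelled as a ('subclass_of', sub, sup)
    triple purely for illustration."""
    rng = random.Random(seed)
    fresh = (f"C_{i}" for i in itertools.count())
    queries = []
    for a, b in rng.sample(disjoint_pairs, n_queries):
        c = next(fresh)                       # the new class C
        queries.append([("subclass_of", a, c),
                        ("subclass_of", b, c)])
    return queries
```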

The results of the evaluation are presented in Fig. 6 and Table 7. For ontologies 13 and 14 our algorithm is 97–99% faster than the baseline. For the BIO ontology, which contains 660,915 axioms,


Table 7
Absolute times needed for the experiments (ms).

Ontology   Baseline    Av. pinpoint size   Provenance,      Av. module size    Provenance, date       Av. module size
                       per unsat. class    date dimension   per unsat. class   and trust dimensions   per unsat. class
13 BIO     1,273,281   3                   7514             33                 7306                   33
14 OBI     24,474      3                    660             150                 717                   150

The bold values highlight the best result for each row.

Table 8
Average number of relevant axioms for deriving provenance vs. the average module size.

Ontology                Average axioms   Average module size   Precision
1  People random        4                15                    0.3
2  MiniTambis random    7                22                    0.3
3  University random    4                18                    0.2
4  Economy random       3                57                    0.05
5  Chemical random      7                126                   0.06
6  Transport random     5                70                    0.07
7  People cluster       4                15                    0.3
8  MiniTambis cluster   7                22                    0.3
9  University cluster   4                18                    0.2
10 Economy cluster      3                19                    0.16
11 Chemical cluster     7                126                   0.06
12 Transport cluster    5                102                   0.05
13 BIO                  3                33                    0.09
14 OBI                  3                150                   0.02

The bold values highlight the best result for each row.


provenance can be computed in under 8 seconds with our algorithm, which is fast enough for interactive applications. In contrast, the baseline approach takes almost 22 min.

The missing bars for ontology 13 in Fig. 6 are due to the extreme differences in runtime. For the relatively small ontologies used in experiments 1 to 12, existing pinpointing algorithms perform reasonably well. However, for the extremely large ontologies 13 and 14, our optimizations show their full potential.

Using our approach, the time needed to compute provenance mainly scales with the size of the pinpoints, rather than with the size of the ontology. Compared to the size of ontology 13 in particular, our machine had a rather small main memory (most of it was already needed just to classify the ontology using Pellet). We expect even better performance when memory management is less of an issue.

Debugging with many independent dimensions can have a negative performance impact due to the many incomparable labels in the composed dimension. Results for the real-world data from the BIO and OBI ontologies show that in practice this is less of a problem, as modification dates and editors are not independent there. In fact, the lack of independence of the provenance dimensions is an advantage in practice: the results for ontology 13 are even better for the composed dimension than for the atomic dimensions. This confirms the assumption underlying our clustered artificial data.

8.3. Precision and recall

Finally, we have analyzed the precision and recall of our approximation. Precision is defined here as the number of retrieved axioms that are relevant for deriving provenance divided by the total number of retrieved axioms. Recall is defined as the number of retrieved axioms that are relevant for deriving provenance divided by the total number of axioms of the ontology that are relevant for deriving provenance.

The recall is always 1.0, as ensured by the inner loop in lines 5 to 9 of Prov and ProvPartial: we retrieve all axioms of the pinpoints that are relevant and necessary for computing the provenance label (see Fig. 5). However, we have to tolerate a certain percentage of false positives (low precision), since not all retrieved axioms are relevant for deriving provenance.
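The two measures defined above reduce to a few lines; the function name and the set-based axiom representation are illustrative assumptions:

```python
def precision_recall(module, relevant):
    """Precision and recall of a retrieved module against the axioms
    that are actually relevant for deriving provenance (the union of
    the relevant pinpoints).  Recall is 1.0 whenever the module covers
    all relevant axioms, as guaranteed by the algorithm's inner loop."""
    module, relevant = set(module), set(relevant)
    hits = len(module & relevant)
    return hits / len(module), hits / len(relevant)
```

For example, a module of 5 retrieved axioms of which 2 are relevant (and which covers both relevant axioms) yields precision 0.4 and recall 1.0.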

Table 8 shows that for ontologies 1–3 and 7–9 our approach achieves high precision. However, for ontologies 4–6 and 10–12, as well as for both real-world ontologies 13 and 14, it achieves low precision. The reason for this discrepancy is that the latter are highly axiomatized ontologies with many syntactic dependencies among axioms; using the syntactic relevance function, our algorithm therefore also retrieves many irrelevant axioms. As mentioned before, we can trade the runtime optimization in the inner loop of the algorithm for higher precision of the module. In other words, there is a trade-off between (low) precision and (good) performance.

Table 8 shows the relation between the size of the module retrieved by the algorithm Prov presented in Section 6.2 and the total number of relevant axioms for deriving provenance. We measure the average module size for computing the provenance and the average number of axioms in all relevant pinpoints for an inconsistency. The numbers are averaged over all inconsistent classes of an ontology.

9. Related work

Initial work towards the approach presented in this paper has been published in [31]. We extend the work in [31] with an optimized algorithm for computing provenance and a comprehensive evaluation. Moreover, the composition of multiple provenance dimensions and comparisons to other approaches have been added, and the formalization has been extended correspondingly.

McGuinness and Pinheiro da Silva [26] propose an infrastructure for trust on the Semantic Web comprising portable explanations and service registries. While [26] and our approach can augment each other, ours is significantly different, because we provide a flexible framework for arbitrary provenance dimensions. Furthermore, we do not need to generate accurate proofs to track provenance.

In [6] the authors propose an approach for computing boundaries for reasoning in sub-ontologies to enforce access restrictions. Provenance is used as a criterion for pre-computing, during the development phase, sub-ontologies for which the consequences still follow from pre-determined axioms. Two optimizations for reasoning are proposed: an extension of the hitting set tree (HST) algorithm for general lattices, which is close to state-of-the-art pinpointing algorithms, and binary search for total orders. For large ontologies, [6] uses a modularization step before applying the actual reasoning. The evaluations of [6] and our approach are not directly comparable, as [6] focuses on either large ontologies with low expressivity (EL+) or small ontologies with high expressivity; moreover, the modularization step is not included in the evaluation. [22] extends the approach towards explanations and debugging of access rights. In contrast, our approach is built on modularization at its core, without a need for preprocessing, and generates explanations while computing provenance. We have shown that our approach scales very well even for large and very expressive (SHOIN(D)) ontologies. Besides a single access rights dimension, we discuss the management of the various provenance dimensions relevant for the



tracking of ontology dynamics. The two approaches might be joined in future work by using the HST algorithm from [6] to replace the naïve algorithm used in ProvPartial. Such a combined algorithm would have desirable features of both approaches.

Fokoue et al. [12] introduce a trust framework based on Bayesian description logics that allows computing a degree of inconsistency over a probabilistic knowledge base. They consider pinpoints as possible worlds for an axiom and derive a probability measure for each possible world. The degree of inconsistency of a knowledge base is then computed as the sum of the probabilities associated with the possible worlds that are inconsistent. For scalability reasons, the proposed trust computation model operates on a random sample of justifications. Our optimized algorithm does not deal with the sum over all justifications, and thus we do not need to compute all justifications to determine the degree of inconsistency for a query. We are interested in deriving the provenance labels of the changes which led to the inconsistency.

In [32], the authors propose a framework for selecting the most probable diagnosis (pinpoint) to repair an inconsistency. Essentially, it creates a sequence of queries which is used to reduce the set of diagnoses until the target one is identified. The target diagnosis is the one which contains all entailments of the target ontology. To select the best query to pose next, the algorithm predicts the information gain of a query result, i.e. it chooses the result which minimizes the entropy. Our approach has a different goal when using provenance information for reasoning: since we do not aim to compute the most probable possible world over all possible worlds, but want to select specific axioms based on their provenance, we do not need to compute all pinpoints to track provenance.

Further related work can be grouped into the following categories: (i) extensions of description logics with a particular provenance dimension, especially uncertainty; (ii) general provenance for query answering with algebraic query languages; (iii) extensions of description logics with general provenance; and (iv) provenance for other logical formalisms.

ad (i) Several multi-valued extensions of description logics have been proposed. Lukasiewicz and Straccia [24] propose fuzzy and probabilistic extensions of the DLs underlying the web ontology language OWL. [28] describe an extension towards a possibilistic logic. Another extension towards multi-valued logic is presented by Schenk [30], aiming at trust and paraconsistency instead of uncertainty: OWL-2 is extended to reasoning over logical bilattices, which reflect the desired trust orders and are then used for reasoning. The approach allows for paraconsistent reasoning with OWL taking trust levels into account. Ma et al. [25] provide an extension of reasoning in OWL with paraconsistency by reducing it to classical DL reasoning. On the level of RDF, [11] "color" triples to track information sources. All of these approaches have in common that they modify the character of models in the underlying description logic, e.g. to fuzzy or possibilistic models. In our approach, in contrast, we reason on a meta level: while the underlying model remains unchanged, we compute consequences of annotations on axioms. This meta level reasoning is not possible in the approaches proposed above. Unlike general provenance, these approaches are tailored to a specific need, and most of them do not respect the provenance dimensions needed for tracking ontology dynamics. In [4], the authors propose a temporal extension of description logics which combines the epistemic modal logic S5 with the standard DL ALCQI. The resulting logic S5_ALCQI can represent rigid concepts and roles and allows one to state that concept and role memberships change in time (but without discriminating between changes in the past and future). This approach weakens the temporal dimension to the much simpler S5, but can nevertheless show that adding change (i.e., timestamping on entities, relationships and attributes) pushes the complexity of ALCQI from ExpTime-complete to 2-ExpTime-hard. Likewise, in [3] the authors propose a temporal extension of the description logic DL-Lite_bool that can additionally capture some form of evolution constraints. Both works aim at analysing the satisfiability problem (decidable or undecidable) for temporal extensions of DLs. Since we do not work with any temporal extension of DL, our problem is restricted to the underlying logic.

ad (ii) Provenance for algebraic base languages has been proposed by various authors, for example for the Semantic Web query language SPARQL [10,16,23] and for relational databases [9]. While the actual provenance formalisms are comparable to ours, the underlying languages are of lower expressivity, typically Datalog. Provenance in these languages can directly be evaluated using the tree-shaped algebraic representations of a query, which is not possible in description logics.

ad (iii) Tran et al. [33] propose a provenance extension of OWL, which is also based on annotation properties. Even though provenance can be expressed in ways comparable to ours, it has a rather ad-hoc semantics, which may differ from query to query. In our approach, provenance and classical reasoning take place in parallel. Hence, we can answer queries such as "Give me all results with a confidence degree of at least x". In contrast, reasoning on the ontology and meta level is separated in [33]. As a result, [33] allows for queries such as "Give me all results which are based on axioms with a confidence degree of at least x". Although this difference might seem quite subtle, depending on the provenance dimension these queries may have dramatically different results.

ad (iv) [8] propose an extension of Datalog with weights which are based on c-semirings and can be redefined to reflect various notions of trust and uncertainty. Our provenance dimensions are similar to c-semirings, but additionally allow handling conflicting provenance using a third operator. C-semirings have been investigated in great detail and have some desirable properties, such as the fact that the cartesian product of two c-semirings is again a c-semiring. Although based on a less strictly defined algebraic structure, our composition of provenance dimensions described in Section 5.2 follows a similar idea.

10. Conclusion

When querying and reasoning while integrating knowledge from different sources on the Semantic Web, applications need to track the provenance of axioms in living ontologies. Users need tool support for judging the usability and trustworthiness of ontological data, as well as engineering tools supporting the collaborative development of living ontologies. In dynamic ontologies, ever-changing criteria may be necessary to establish trust or locate bugs.

We have defined provenance as a very flexible and extensible framework for expressing information about evolving ontologies. At the same time, provenance has a clear semantics for reasoning and debugging with such dynamic ontologies. We have shown that this additional information does not only provide value to the end user; it can also be used to significantly speed up debugging processes by rapidly approximating a solution.

In this paper we have presented our framework restricted to ontology diagnosis scenarios. Our work, however, introduces an approach for provenance querying in a variety of scenarios covering many dimensions of provenance, such as restrictions of access rights, knowledge validity when the truth of knowledge changes with time, and the inference of trust values.

In future work, we will apply our approach to provenance to other logical formalisms beyond the DL SRIQ(D).

References

[1] A. Kalyanpur et al., Axiom Pinpointing: Finding (Precise) Justifications for Arbitrary Entailments in OWL-DL, Technical report, UMIACS, 2006.



[2] B.T. Adler, L. de Alfaro, A content-driven reputation system for the Wikipedia, WWW, ACM, 2007, pp. 261–270.

[3] A. Artale, R. Kontchakov, V. Ryzhikov, M. Zakharyaschev, DL-Lite with temporalised concepts, rigid axioms and roles, FroCoS, Springer, 2009, pp. 133–148.

[4] A. Artale, C. Lutz, D. Toman, A description logic of change, IJCAI, Morgan Kaufmann, 2007, pp. 218–223.

[5] B. Smith et al., The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration, Nature Biotech. 25 (11) (2007) 1251–1255.

[6] F. Baader, M. Knechtel, R. Peñaloza, A generic approach for large-scale ontological reasoning in the presence of access restrictions to the ontology's axioms, ISWC, LNCS, Springer, 2009, pp. 49–64.

[7] F. Baader, R. Peñaloza, Axiom pinpointing in general tableaux, J. Logic Comput. 20 (1) (2010) 5–34.

[8] S. Bistarelli, F. Martinelli, F. Santini, A semantic foundation for trust management languages with weights: an application to the RT family, ATC, Springer, 2008, pp. 481–495.

[9] P. Buneman, S. Khanna, W.C. Tan, Why and where: a characterization of data provenance, ICDT, LNCS, Springer, 2001, pp. 316–330.

[10] R. Dividino, S. Sizov, S. Staab, B. Schueler, Querying for provenance, trust, uncertainty and other meta knowledge in RDF, J. Web Semantics 7 (3) (2009) 204–219.

[11] G. Flouris, I. Fundulaki, P. Pediaditis, Y. Theoharis, V. Christophides, Coloring RDF triples to capture provenance, ISWC, LNCS, Springer, 2009, pp. 196–212.

[12] A. Fokoue, M. Srivatsa, R. Young, Assessing trust in uncertain information, ISWC, LNCS, Springer, 2010, pp. 209–224.

[13] J. Golbeck, Inferring reputation on the semantic web, WWW, ACM, 2004.

[14] T.J. Green, G. Karvounarakis, V. Tannen, Provenance semirings, PODS, ACM, 2007, pp. 31–40.

[15] H. Halpin, Provenance: the missing component of the semantic web, SPOT, CEUR, 2009.

[16] O. Hartig, Querying trust in RDF data with tSPARQL, ESWC, LNCS, Springer, 2009, pp. 5–20.

[17] P. Hitzler, M. Krötzsch, B. Parsia, P.F. Patel-Schneider, S. Rudolph (Eds.), OWL 2 Web Ontology Language: Primer, W3C Recommendation, 27 October 2009. Available at <http://www.w3.org/TR/owl2-primer/>.

[18] M. Horridge, B. Parsia, U. Sattler, Laconic and precise justifications in OWL, ISWC, LNCS, Springer, 2008, pp. 323–338.

[19] I. Horrocks, O. Kutz, U. Sattler, The even more irresistible SROIQ, in: P. Doherty, J. Mylopoulos, C.A. Welty (Eds.), KR, AAAI Press, 2006, pp. 57–67.

[20] Q. Ji, G. Qi, P. Haase, A relevance-directed algorithm for finding justifications of DL entailments, ASWC, LNCS, Springer, 2009, pp. 306–320.

[21] A. Kalyanpur, B. Parsia, M. Horridge, E. Sirin, Finding all justifications of OWL DL entailments, ISWC/ASWC, LNCS, Springer, 2007, pp. 267–280.

[22] M. Knechtel, R. Peñaloza, A generic approach for correcting access restrictions to a consequence, ESWC, LNCS, Springer, 2010, pp. 167–182.

[23] N. Lopes, A. Polleres, U. Straccia, A. Zimmermann, AnQL: SPARQLing up annotated RDFS, ISWC, LNCS, Springer, 2010, pp. 518–533.

[24] T. Lukasiewicz, U. Straccia, Managing uncertainty and vagueness in description logics for the semantic web, J. Web Semantics 6 (4) (2008) 291–308.

[25] Y. Ma, P. Hitzler, Z. Lin, Algorithms for paraconsistent reasoning with OWL, ESWC, LNCS, Springer, 2007, pp. 399–413.

[26] D.L. McGuinness, P. Pinheiro da Silva, Infrastructure for web explanations, ISWC, IEEE Computer Society, 2003, pp. 113–129.

[27] B. Motik, Representing and querying validity time in RDF and OWL: a logic-based approach, ISWC, LNCS, Springer, 2010, pp. 550–565.

[28] G. Qi, J.Z. Pan, Q. Ji, Extending description logics with uncertainty reasoning in possibilistic logic, ECSQARU, Springer, 2007, pp. 828–839.

[29] R. Reiter, A theory of diagnosis from first principles, Artificial Intelligence 32 (1) (1987) 57–95.

[30] S. Schenk, On the semantics of trust and caching in the semantic web, ISWC, LNCS 5313, Springer, 2008, pp. 533–549.

[31] S. Schenk, R. Dividino, S. Staab, Reasoning with provenance, trust and all that other meta knowledge in OWL2, SWPM, CEUR, 2009.

[32] K. Shchekotykhin, G. Friedrich, Query strategy for sequential ontology debugging, ISWC, LNCS, Springer, 2010, pp. 696–712.

[33] D.T. Tran, P. Haase, B. Motik, B. Cuenca Grau, I. Horrocks, Metalevel information in ontology-based applications, AAAI, AAAI Press, 2008, pp. 1237–1242.

