
Giorgos Flouris Open Data Tutorials, May 2013 1

Data and Knowledge Evolution

Slides available at: http://www.ics.forth.gr/~fgeo/Publications/WOD13.ppt

Giorgos Flouris

Open Data Tutorials, May 2013


World Wide Web

WWW (and HTML) focuses on human readability

Page presentation (fonts, colors, images, …)
Human understanding
Presentation, not semantical content
Content is not formally described (for a machine to understand)

WWW contains documents, not data


Problems with the Current Web

Search and access become difficult

Software is ignorant of the semantical content of a web page
Keyword search
High recall, low precision

Terminological issues

Synonyms (heart disease = cardiac disease)
Hyponyms/hypernyms (parliament members are politicians)

Queries on the semantical content cannot be made

Fetch articles that support B. Obama’s foreign policy
Fetch the home pages of all members of the Greek Parliament


Semantic Web

"The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation" [BLHL01]

"The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries" (http://www.w3.org/2001/sw/)

"[Semantic Web] is a collaborative effort led by W3C with participation from a large number of researchers and industrial partners" (http://www.w3.org/2001/sw/)


Semantic Web in Practice

Web of data, rather than documents

HTML for presentation
Semantical languages for semantical content
Readable and understandable by humans and machines

Semantic Web languages, protocols, etc.

Web page annotation (metadata descriptions etc.)
Publication of data on the Internet
Efficient communication and manipulation of data over the Internet

Different applications

Efficient searching
Sharing of data (e-science, e-government, remote learning, …)
Linked Open Data (more on that later)


Ontologies and Data (Datasets)

An ontology is an explicit specification of a shared conceptualization of a domain [Gru93]

Precise, logical account of the intended meaning of terms
Common (shared) interpretation of terms
Formal vocabulary for information exchange (humans/machines)

Ontologies (vocabularies) allow the description of data

Terminology:

Ontology = vocabulary = schema
Data = instances
Dataset = data and the related ontology (i.e., a dataset may contain schema and/or data)


Dataset Dynamics

Datasets change constantly

World changes (dynamic models)
View on the world changes (new knowledge, measurements, etc.)
Perspective and usage change

Examples:

Gene Ontology (information about gene products): daily versions
DBpedia: 1.4 updates/second (http://live.dbpedia.org/LiveStats/) [MLA+12]

Need methodologies to cope with the problems related to dynamicity

Evolution (modify a dataset in response to a change)
Versioning (keep track of versions and their relations)
Debugging, cleaning, repairing, quality (maintain consistency and quality in a dynamic environment)
Change monitoring, detection and propagation (identify changes and use them to synchronize remote datasets)
…


Linked (Open) Data

Datasets can be interlinked

Sharing knowledge
Reusing knowledge
Modular development
Reuse of schemas

Linked Open Data (LOD) movement

Constantly growing
31 billion triples and 295 datasets as of September 2011


Linked Open Data Cloud Diagram

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/


Linked Open Data Challenges

Both a blessing and a curse

Added-value benefits
Discovery of unknown correlations, connections, relationships
Vast amount of interrelated knowledge

No central control; everyone can publish and relate to others
Quality of datasets depends on different providers
A change in one dataset affects all related ones

Several new problems related to dynamics

Propagation of changes among interrelated datasets
Maintaining the quality of local datasets
Co-evolution


Scope: Dynamic Linked Datasets

[Figure: Venn diagram of Dynamic Datasets and Linked Datasets, with a "You are here" marker on their overlap.]


Purpose of This Talk

To survey different research areas related to dynamic LOD

Remote Change Management
Repair
Data and Knowledge Evolution

Categorize and classify works in each field

Broad but shallow description
Several references for more in-depth study
No claims of completeness (references are just indicative)
Two relevant surveys: [FMK+08, ZAA+13]

Emphasis on some related work done in FORTH

Will avoid technical discussion
References will be given for further details


Defining Remote Change Management

Managing the effects of remote changes on interlinked datasets

Remote changes have profound effects on local datasets
Good practices are important
- Proper versioning, change logging, adaptation to remote changes, …

Attention exploded after the success of the LOD paradigm

Related research questions

How should I version my data?
How can I efficiently monitor changes in my dataset?
How can I detect changes in remote datasets?
How does the evolution of remote datasets affect my data?
How can I efficiently propagate changes from one dataset to another?


Remote Change Management: Visualization

[Figure: at the remote site, dataset RD0 evolves into RD1 (versioning, change monitoring); at the local site, change detection and change propagation turn LD0 into LD1.]


Remote Change Management: Structure

Three subfields

Versioning
Change monitoring and detection
Change propagation

Structure

Introduction, definition of subfields
Literature review
An approach for change detection [PFF+13]


Defining Repair

Assessing and improving the quality and the semantical or structural integrity of the data

Maintaining consistency, coherency, validity
Restoring consistency, coherency, validity, when violated
Assessing and improving quality
Preserving quality/integrity in the face of remote changes

Related research questions

How can I preserve the integrity and quality of my data in a dynamic and interlinked environment?

How can I guarantee consistency and validity?
How can I restore consistency and validity, if violated?


Repair: Visualization

[Figure: an assessment module (diagnosis, quality assessment) and a repair process (cleaning, debugging, repairing, quality enhancement) turn dataset D0 into D1.]


Repair: Structure

Four subfields

Cleaning
Debugging
Validity repair
Quality enhancement

Structure

Introduction, definition of subfields
Literature review
An approach for validity repair [RFC11]


Defining Evolution

Modifying a dataset in response to a change in the domain or its conceptualization

Identify the result of applying new information to the dataset
Determine the result of change propagation from remote datasets
Understand the process of change

Related research questions

What is the semantics of evolution and change?
How can I efficiently compute the ideal evolution result?


Evolution: Visualization

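[Figure: changes in the real world are reflected in the dataset; an evolution algorithm applies change operations such as Delete_Class(…), Pull_Up_Class(…) and Rename_Class(…) to turn D0 into D1.]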


Evolution: Summary

Evolution topics

Understanding the evolution challenges
Understanding the process of change
- Balancing between philosophical and practical considerations

Cross-fertilization with belief change

Structure

Introduction, connection with belief change
Understanding the process of change
Literature review


General Structure of this Talk

A. Introduction to RDF/S, DLs, OWL

B. Remote change management
1. Introduction, definition of subfields
2. Literature review
3. An approach for change detection [PFF+13]

C. Repair
1. Introduction, definition of subfields
2. Literature review
3. An approach for validity repair [RFC11]

D. Data and Knowledge Evolution
1. Introduction, connection with belief change
2. Understanding the process of change
3. Literature review

The final few slides contain citations for the references in this talk

Part I (2 hours)

Part II (1 hour)


Talk Structure (A): Introduction to RDF/S, DLs, OWL


Datasets

Basic structures

Classes (or concepts): collections of objects (e.g., Actor, Politician)

Properties (or roles): binary relationships between objects (e.g., started_on, member_of)

Instances (or individuals): objects (e.g., Giorgos, B. Obama)

Relations between them

Subsumption (Parliament_Member subclass of Politician), instantiation (B. Obama instance of Politician), …

The allowed relations and their semantics depend on the language

Different representation languages for LOD

RDF/S, OWL


Visualization, Triples, Serialization

[Figure (visualization): a sample dataset with classes Period, Event, Onset, Birth, Existing, Stuff and Actor, properties started_on and participants, and the individual G_Birth linked to Giorgos via participants; arrows denote instantiation and subsumption.]

Triple representation:
Define classes: [Period type Class]
Define properties: [participants type Property], [participants domain Onset], [participants range Actor]
Instantiate/define individuals: [G_Birth type Birth], [Giorgos type Actor], [G_Birth participants Giorgos]
Define hierarchies: [Event subClass Period]

Serialization (RDF/XML):
<rdfs:Class rdf:ID="Period"/>
<rdf:Property rdf:ID="participants">
  <rdfs:domain rdf:resource="#Onset"/>
  <rdfs:range rdf:resource="#Actor"/>
</rdf:Property>
<Birth rdf:ID="G_Birth">
  <participants>
    <Actor rdf:ID="Giorgos"/>
  </participants>
</Birth>
<rdfs:Class rdf:ID="Event">
  <rdfs:subClassOf rdf:resource="#Period"/>
</rdfs:Class>


RDF and RDFS

An RDF dataset consists of triples

RDFS adds semantics

Subsumption hierarchies (classes and properties)
- Transitive

Instantiation
- Inheritance, implicit instantiation
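The two RDFS inferences listed above (transitive subsumption and inherited instantiation) can be sketched as a naive fixpoint computation. This is a minimal illustration, assuming triples are plain string tuples and using the short names `type` and `subClassOf` in place of the full `rdf:type` and `rdfs:subClassOf` URIs; it covers only these two rules, not the complete RDFS entailment regime.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def rdfs_closure(triples: Set[Triple]) -> Set[Triple]:
    """Add the triples implied by subClassOf transitivity and by
    inheritance of instantiation along subClassOf (naive fixpoint)."""
    closure = set(triples)
    while True:
        sub = [(s, o) for (s, p, o) in closure if p == "subClassOf"]
        implied = set()
        for (a, b) in sub:
            for (c, d) in sub:
                if b == c:  # A subClassOf B, B subClassOf D => A subClassOf D
                    implied.add((a, "subClassOf", d))
        for (x, p, a) in closure:
            if p == "type":
                for (c, d) in sub:
                    if a == c:  # x type A, A subClassOf D => x type D
                        implied.add((x, "type", d))
        if implied <= closure:  # nothing new: fixpoint reached
            return closure
        closure |= implied


v1 = {
    ("Birth", "subClassOf", "Onset"),
    ("Onset", "subClassOf", "Event"),
    ("Event", "subClassOf", "Period"),
    ("G_Birth", "type", "Birth"),
}
inferred = rdfs_closure(v1) - v1
print(sorted(inferred))
```

Here G_Birth, explicitly a Birth, is implicitly also an Onset, an Event and a Period.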

Sometimes more than subsumption/instantiation is needed

Combining concepts and roles to form more complex relations
- Concept definitions: a mother is a female who has a child
- Other knowledge: all items stored in warehouse X are flammable

Constraints on data
- Each person must have one mother


Extensions of RDF/S: DLs (1/2)

Description Logics (DLs)

http://dl.kr.org/
Formal underpinning of web representation languages
Family of logical formalisms
- Well-defined semantics
- Model-theoretic reasoning based on interpretations

Formally studied
- Expressiveness, reasoning tools, computational complexity, …

Components

Individuals: specific objects (instances), e.g., Giorgos
Concepts: sets of individuals (classes), e.g., Parent
Roles: sets of pairs of individuals (properties), e.g., has_child

Operators: ⊓, {.}, ⊤, …

Connectives: ⊑, ≡, …


Extensions of RDF/S: DLs (2/2)

Definitions, partial definitions, constraints, subsumptions, …

A mother is a female who has a child
- Mother ≡ ∃has_child.⊤ ⊓ Female

Each person must have one mother
- Person ⊑ ∃has_child⁻.Mother
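The model-theoretic reading of the two axioms (Mother ≡ ∃has_child.⊤ ⊓ Female and Person ⊑ ∃has_child⁻.Mother) can be checked on one small finite interpretation, where concepts denote sets of individuals and roles denote sets of pairs. The sketch below assumes such an interpretation; the individuals anna, maria and nikos are made up, and a real DL reasoner reasons symbolically over all models rather than evaluating a single one. Note that the ∃ form expresses "at least one mother"; "exactly one" would need a number restriction.

```python
from typing import Set, Tuple

Role = Set[Tuple[str, str]]


def exists(role: Role, filler: Set[str]) -> Set[str]:
    """Extension of ∃role.filler: individuals with some role-successor in filler."""
    return {x for (x, y) in role if y in filler}


def inverse(role: Role) -> Role:
    """Extension of the inverse role (role⁻)."""
    return {(y, x) for (x, y) in role}


# A hypothetical finite interpretation.
domain = {"anna", "maria", "nikos"}  # extension of ⊤
female = {"anna", "maria"}
person = {"maria", "nikos"}
has_child = {("anna", "maria"), ("maria", "nikos")}

# Mother ≡ ∃has_child.⊤ ⊓ Female: the right-hand side gives Mother's extension.
mother = exists(has_child, domain) & female

# Person ⊑ ∃has_child⁻.Mother holds iff every person is someone's child with a mother.
constraint_holds = person <= exists(inverse(has_child), mother)
print(mother, constraint_holds)
```

Adding anna to Person would violate the constraint, since nobody in this interpretation has anna as a child.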

A great variety of DLs (trade-off involved)

Different properties
Different expressive power
Different reasoning complexity


Extensions of RDF/S: OWL

OWL (Web Ontology Language)

http://www.w3.org/2004/OWL/
General-purpose representation language
Compatible with the architecture of the Semantic Web

A family of languages

Flavors: OWL Lite, OWL DL, OWL Full
Profiles: OWL 2 EL, OWL 2 QL, OWL 2 RL
Different expressiveness (and complexity)

Each (except OWL Full) corresponds to a specific DL

Useful from a modeling perspective
Expressive but not too complex
Appealing computationally


Representation Languages in LOD

Mostly RDF

With RDFS semantics
- Instantiations
- Class subsumption
- Property subsumption is rare

Some OWL

Mostly OWL Lite
Extensive use of owl:sameAs
- Often abusing it [HHM+10]
OWL 2 profiles are gaining ground


Talk Structure (B1): Remote change management, introduction and definition of subfields


Motivation for Remote Change Management

Crucial problem for dynamic linked datasets

Linking: datasets linked to other datasets (e.g., vocabularies)
Dynamics: changes cause problems for linked datasets
No central curation or control
- No control over (or knowledge of) other datasets’ evolution process
Curators don’t bother annotating and logging changes
- Temporal and versioning information is usually missing [RPH+12]

Remote change management seeks solutions to allow:

Keeping track of versions
Restoring previous versions
Assessing compatibility of versions
Monitoring and detecting changes
Tracing back the evolution history (of datasets, concepts, …)
- For visualization and understanding
Propagating changes to synchronize linked datasets

[Figure: a local dataset DL uses a remote dataset DR.]


Subfields of Remote Change Management

Remote Change Management

Versioning
- Keep track of versions

Change monitoring and detection
- Monitoring: record changes as they happen
- Detection: identify changes after they happen

Change propagation
- Propagate changes across linked datasets for synchronization purposes


Versioning

Keep track of versions
Identify different versions of a dataset
Enable transparent access to the “correct” version (smooth interoperation)

Issues involved

Identification
- Determine which versions to store and how to identify them
- Manually or automatically (syntactical, semantical considerations)
- Packaging of changes

Relation between versions
- A sequence or a tree

Compatibility information
- Backwards/forwards compatibility and how to determine it (often manually)
- Dataset-wide compatibility or fine-grained compatibility (e.g., at resource level)
- Metadata on the different versions

Transparent access
- Relate versions with (compatible) data sources, applications etc.


Change Monitoring and Detection

Change monitoring

Record changes as they happen
- Manual (error-prone and often incorrect)
- Automatic (not used in practice)

Depends on the good will of the dataset owner
Sometimes change logs are inaccessible

Change detection

Identify changes after they happen
Based on the previous and current versions

In both cases, a change language is required

Supported set of changes, along with their semantics
Can be low-level or high-level

[Figure: a local dataset DL uses a remote dataset DR.]


Change Propagation

Change propagation

Communicate changes to linked datasets for synchronization

Push-based or pull-based propagation

Push-based: locally initiated, via “registration” or via monitoring and versioning

Pull-based: consumer-initiated

Communication based on deltas (rather than versions)

Reduce communication overhead
Reduce storage requirements
On average, 2-3% of a dataset changes between versions [OK02]

Deltas are based on a language of changes
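A minimal sketch of delta-based synchronization, assuming the delta uses only low-level Add/Delete operations and that datasets are sets of (subject, predicate, object) string tuples. The function name `apply_delta` and the sample triples are illustrative, not taken from any of the cited systems.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def apply_delta(local: Set[Triple], added: Set[Triple], deleted: Set[Triple]) -> Set[Triple]:
    """Bring a local copy up to date by applying a low-level delta
    instead of transferring the whole new version.
    Deletions are applied first, then additions; the input set is not modified."""
    return (local - deleted) | added


local_copy = {
    ("Event", "type", "Class"),
    ("Stuff", "subClassOf", "Existing"),
    ("started_on", "domain", "Existing"),
}
deleted = {("started_on", "domain", "Existing")}
added = {("started_on", "domain", "Persistent")}

updated = apply_delta(local_copy, added, deleted)
# Only the delta (two triples here) would need to be transmitted.
print(len(added) + len(deleted), "changes for", len(updated), "triples")
```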


Talk Structure (B2): Remote change management, literature review


Versioning Approaches (1/3)

Capture different aspects of versioning, such as:

Detecting versions
Storing versions efficiently
Allowing cross-snapshot queries
- Find gene products whose functions have not changed in the last 50 versions
- Determine the price fluctuation of x across different versions of the product catalog
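The first cross-snapshot query above can be sketched as follows, assuming each version is available in memory as a mapping from gene product identifiers to their sets of functions. The identifiers and function labels are invented, and a real system would evaluate such queries over a versioned triple store instead.

```python
from typing import Dict, FrozenSet, List, Set

Version = Dict[str, FrozenSet[str]]


def unchanged_products(versions: List[Version], n: int) -> Set[str]:
    """Gene products present in each of the last n versions with identical functions."""
    if n < 1:
        raise ValueError("n must be at least 1")
    recent = versions[-n:]
    first = recent[0]
    return {
        product
        for product, functions in first.items()
        if all(v.get(product) == functions for v in recent[1:])
    }


# Three hypothetical versions, oldest first.
history = [
    {"GP1": frozenset({"binding"}), "GP2": frozenset({"transport"})},
    {"GP1": frozenset({"binding"}), "GP2": frozenset({"transport", "signaling"})},
    {"GP1": frozenset({"binding"}), "GP2": frozenset({"signaling"}), "GP3": frozenset({"binding"})},
]
print(unchanged_products(history, 3))
```

For the query on the slide, `n` would be 50.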

Early versioning approaches inspired by SVN

Good for files, not directly adaptable to semantical languages

SHOE language [HH00]

Machine-readable version information (e.g., compatibility)
Provided by curator as SHOE statements

Memento [SSN+10]

Fine-grained versioning at URI level (resources, web pages)
Machine-readable version information, in the HTTP header
- Timestamps, traversal information (prior/current versions), etc.
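To illustrate the HTTP-level mechanism, the sketch below builds (but does not send) a request asking a Memento TimeGate for the state of a resource at a given date, using the `Accept-Datetime` request header of the Memento protocol (RFC 7089). The TimeGate URL is a hypothetical placeholder; a real TimeGate would redirect to the matching archived version, whose response carries a `Memento-Datetime` header and `Link` headers pointing to the original resource and related versions.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request


def memento_request(timegate_url: str, when: datetime) -> Request:
    """Build an HTTP request for the version of a resource valid at `when`."""
    # Memento expects an HTTP-date (RFC 1123 format, in GMT) in Accept-Datetime.
    accept = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return Request(timegate_url, headers={"Accept-Datetime": accept})


# Hypothetical TimeGate URL; nothing is sent over the network here.
req = memento_request(
    "http://timegate.example.org/http://example.org/resource/X",
    datetime(2013, 5, 14, tzinfo=timezone.utc),
)
# urllib normalizes stored header names with str.capitalize().
print(req.get_header("Accept-datetime"))
```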


Versioning Approaches (2/3)

Theoretical foundations for versioning [HP04]

Formal definitions to capture notions such as:
- Compatibility (between versions)
- Commitment (resources committing to a certain ontology)
- Ontology perspectives (the part of the web committing to an ontology)

Temporal approaches [HS05, PTC05, KLGE07]

For capturing temporal relations between versions
For allowing cross-snapshot queries

Versioning in multi-editor environments [RSDT08]

Via change monitoring


Versioning Approaches (3/3)

Automatically detecting version relationships [AAM09]

Using heuristics based on URIs

Study of “relatedness” between versions [CQ13]

A model of “relatedness” between vocabularies from various sources

Similar to links in web pages

POI: Partial Order Index [TTA08]

Efficient method for storing versions and their differences
Stores several versions, exploiting their common triples for efficient storage
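POI exploits the triples that versions have in common. The sketch below illustrates only that general idea, using a much simpler scheme assumed for this example: a shared dictionary that stores each distinct triple once, plus a set of triple identifiers per version. It is not the POI data structure of [TTA08], and the class name is hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Triple = Tuple[str, str, str]


class SharedTripleStore:
    """Each distinct triple is stored once; a version is a set of triple ids."""

    def __init__(self) -> None:
        self._ids: Dict[Triple, int] = {}
        self._triples: List[Triple] = []
        self.versions: Dict[str, FrozenSet[int]] = {}

    def add_version(self, name: str, triples: Set[Triple]) -> None:
        ids = set()
        for t in triples:
            if t not in self._ids:  # first time this triple is seen: store it
                self._ids[t] = len(self._triples)
                self._triples.append(t)
            ids.add(self._ids[t])
        self.versions[name] = frozenset(ids)

    def get(self, name: str) -> Set[Triple]:
        return {self._triples[i] for i in self.versions[name]}

    def stored_triples(self) -> int:
        return len(self._triples)


store = SharedTripleStore()
store.add_version("V1", {("Event", "subClassOf", "Period"), ("Stuff", "subClassOf", "Existing")})
store.add_version("V2", {("Event", "subClassOf", "Period"), ("Stuff", "subClassOf", "Persistent")})
print(store.stored_triples())  # 3 distinct triples rather than 4
```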


Change Languages (1/2)

Change languages are necessary for monitoring, detection and propagation

Granularity

Low-level (or atomic, or elementary)
- Simple add/remove operations
- Add(s,p,o), Delete(s,p,o)
- Simple to detect and define
- Focus on machine-readability: determinism, well-defined semantics

High-level (or complex, or composite)
- More coarse-grained, compact, closer to the editor’s perception and intuition
- Generalize_Domain(P,A), Delete_Class(A)
- More interesting; harder to detect and define
- Focus on human-understandability: often unclear and/or informal semantics
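Low-level deltas are simple enough to compute directly by set difference between two versions. A minimal sketch, assuming versions are sets of string triples and that changes are represented as tuples such as `("Add", s, p, o)`; the function name is illustrative.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def low_level_delta(old: Set[Triple], new: Set[Triple]) -> List[Tuple[str, str, str, str]]:
    """Delete(s,p,o) for triples only in old, Add(s,p,o) for triples only in new."""
    deletes = [("Delete",) + t for t in sorted(old - new)]
    adds = [("Add",) + t for t in sorted(new - old)]
    return deletes + adds


v1 = {("Event", "subClassOf", "Period"), ("participants", "domain", "Onset")}
v2 = {("Event", "subClassOf", "Period"), ("participants", "domain", "Event")}
print(low_level_delta(v1, v2))
```

The two low-level changes produced here correspond to what a high-level language could report as a single Generalize_Domain change, which is why high-level deltas are more compact.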


Change Languages (2/2)

Many different high-level languages (no standard)

[HGR12, JAP09, PFF+13, SK03, AH06, DA09, PTC07, …]
Some are domain-specific (e.g., [HGR12])
Some are dynamic (e.g., [AH06, DA09, PTC07])
- Allow custom, user-defined changes

Some allow terminological changes (e.g., [PFF+13])
- Rename, merge, split
- Common, but tough to detect (easily confused with add/delete)


Representation Issues

Deltas are just sets of changes from the change language

Changes usually represented using a change ontology

Ontology represents changes
A specific change is an instance of such an ontology
Deltas are associated with sets of such instances
Different proposals [NCLM06, KFKO02, KN03, PT05]
Allows the manipulation and communication of deltas/changes using standard Semantic Web technologies
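The idea of representing changes as ontology instances can be sketched by turning each change into a few triples about a change node. The `ex:` namespace and its property names below are hypothetical, chosen only for illustration; they are not the vocabulary of any of the cited change ontologies.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def change_as_triples(change_id: str, change_type: str, **params: str) -> List[Triple]:
    """Describe one change as an instance of a (hypothetical) change ontology class."""
    triples = [(change_id, "rdf:type", "ex:" + change_type)]
    for name in sorted(params):  # one triple per change parameter
        triples.append((change_id, "ex:" + name, params[name]))
    return triples


# A delta is then just the union of the triples describing its changes.
delta = change_as_triples("ex:c1", "Delete_Class", cls="Period") + change_as_triples(
    "ex:c2", "Generalize_Domain", property="participants", new_domain="Event"
)
for t in delta:
    print(*t)
```

Because the delta is itself a set of triples, it can be stored, queried and exchanged with the same tools as the data.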


Change Monitoring Approaches

Using a version log [PT05]

Logging actions on the dataset
Use it for change detection, as well as proper versioning
Good quality, high-level change monitoring
Based on a dynamic language of changes

Using migration specifications [ZZL+03]

Similar to logs, but with a more formal structure

DBpedia change monitoring [MLA+12]

http://live.dbpedia.org/
Live versions, as opposed to “standard” versions
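Change monitoring differs from detection in that each change is recorded at the moment it is made. A minimal sketch of a version log, assuming all edits go through a wrapper object (the class name is hypothetical); unlike [PT05], it logs only low-level Add/Delete operations, not high-level changes.

```python
from datetime import datetime, timezone
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


class MonitoredDataset:
    """Wrap a set of triples and log every effective change as it happens."""

    def __init__(self, triples: Set[Triple]) -> None:
        self.triples = set(triples)
        self.log: List[Tuple[str, str, Triple]] = []  # (timestamp, operation, triple)

    def _record(self, op: str, triple: Triple) -> None:
        self.log.append((datetime.now(timezone.utc).isoformat(), op, triple))

    def add(self, triple: Triple) -> None:
        if triple not in self.triples:  # no-op edits are not logged
            self.triples.add(triple)
            self._record("Add", triple)

    def delete(self, triple: Triple) -> None:
        if triple in self.triples:
            self.triples.remove(triple)
            self._record("Delete", triple)


ds = MonitoredDataset({("started_on", "domain", "Existing")})
ds.delete(("started_on", "domain", "Existing"))
ds.add(("started_on", "domain", "Persistent"))
ds.add(("started_on", "domain", "Persistent"))  # already present: not logged
for _, op, triple in ds.log:
    print(op, triple)
```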


Low-Level Change Detection (1/2)

SemVersion [VWS+05]

Developed in Karlsruhe (FZI, AIFB)
Low-level change detection tool for RDF
Also provides versioning functionality
Allows cross-snapshot queries

For RDF [ILK12]

Low-level change detection based on set difference
Aggregating and compressing deltas
Also dealing with versioning issues

For RDF/S [ZTC11]

Takes into account semantics (RDFS inference)
Four different methods to compute deltas (all based on set difference)
Formal analysis of these methods’ properties and semantics
Extension: effect of blank nodes on change detection [TLZ12]
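Why inference matters for RDF/S change detection can be seen on a tiny example: an explicitly added triple may already have been implied by the old version. The sketch compares a plain set difference with a difference of closures (here only subClassOf transitivity is used); it is a simplified illustration, not a reproduction of the four delta functions defined in [ZTC11].

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def subclass_closure(triples: Set[Triple]) -> Set[Triple]:
    """Close a set of triples under transitivity of subClassOf."""
    closure = set(triples)
    while True:
        implied = {
            (a, "subClassOf", d)
            for (a, p, b) in closure if p == "subClassOf"
            for (c, q, d) in closure if q == "subClassOf" and b == c
        }
        if implied <= closure:
            return closure
        closure |= implied


def explicit_delta(old: Set[Triple], new: Set[Triple]):
    """(added, deleted) computed on the explicit triples only."""
    return new - old, old - new


def closure_delta(old: Set[Triple], new: Set[Triple]):
    """(added, deleted) computed on the inferred closures."""
    c_old, c_new = subclass_closure(old), subclass_closure(new)
    return c_new - c_old, c_old - c_new


v1 = {("Birth", "subClassOf", "Onset"), ("Onset", "subClassOf", "Event")}
v2 = v1 | {("Birth", "subClassOf", "Event")}  # already implied in v1

print(explicit_delta(v1, v2))  # one added triple
print(closure_delta(v1, v2))   # no semantic change
```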


Low-Level Change Detection (2/2)

Bubastis (http://www.ebi.ac.uk/fgpt/sw/bubastis/index.html)

Simple diff tool (triple-based comparison)
Basically RDF, but also supports OWL

For DL-Lite [KWZ08]

Formal, semantical approach

For EL [KWW08]

Uses a concept-based description of changes

For propositional knowledge bases [FMV10]

Propositional, but generic; it can be applied to DLs
Formal analysis of the problem
Also dealing with propagation semantics


High-Level Change Detection (1/2)

For OWL: PromptDiff [NKKM04], OntoView [KFKO02]

Employ heuristics and probabilistic methods
Evaluation using precision/recall metrics against a gold standard
Integrated into tools that also provide versioning functionalities

For RDF/S [PFF+13]

Dealing with both machine-readability and human-understandability

Also dealing with propagation (applying changes)
To be discussed in detail later

COnto-Diff [HGR12]

Rule-based approach
Also dealing with propagation
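As a rough illustration of how a high-level change can be recognized in a low-level delta, the sketch below groups a deleted and an added domain triple of the same property into one Generalize_Domain change when the old domain is a subclass of the new one. The rule and its output tuple format are assumptions made for this example; PromptDiff, OntoView, [PFF+13] and COnto-Diff each use their own, more elaborate, detection methods.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def is_subclass(triples: Set[Triple], a: str, b: str) -> bool:
    """True if a reaches b through subClassOf triples (reflexive-transitive)."""
    stack, seen = [a], {a}
    while stack:
        x = stack.pop()
        if x == b:
            return True
        for (s, p, o) in triples:
            if s == x and p == "subClassOf" and o not in seen:
                seen.add(o)
                stack.append(o)
    return False


def detect_generalize_domain(old: Set[Triple], new: Set[Triple]) -> List[Tuple[str, str, str, str]]:
    """Pair Delete(P domain A) with Add(P domain B) when A is a subclass of B in old."""
    added, deleted = new - old, old - new
    changes = []
    for (prop, pred, dom_old) in sorted(deleted):
        if pred != "domain":
            continue
        for (prop2, pred2, dom_new) in sorted(added):
            if pred2 == "domain" and prop2 == prop and is_subclass(old, dom_old, dom_new):
                changes.append(("Generalize_Domain", prop, dom_old, dom_new))
    return changes


v1 = {
    ("participants", "domain", "Onset"),
    ("Onset", "subClassOf", "Event"),
    ("Event", "subClassOf", "Period"),
}
v2 = (v1 - {("participants", "domain", "Onset")}) | {("participants", "domain", "Event")}
print(detect_generalize_domain(v1, v2))
```

Run in the opposite direction (v2 to v1), the same pair of low-level changes is a specialization and is not reported by this rule.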


Change Propagation Approaches

Usually part of other tools [SMMS02, MMS+03]

Versioning, monitoring tools (push-based propagation)
Detection tools (pull-based propagation)
Evolution and repair tools (pull-based propagation)
- Adapt your data to be “compatible” with the new remote version

SparqlPush [PM10]

Push-based propagation of changes on SPARQL “views”

PRISM, PRISM++ [CMZ08, CMDZ10]

High-level language of schema changes for relational data
- Also supports changes on the integrity constraints

Identifies and propagates the changes required in the data to abide by the new schema

Query and update rewriting
- For applications that try to access the old schema


Other Change Management Approaches

Complete approach for XML [SP10]

Representing changes inline with the data using a graph (“evograph”)

Supports different change representation languages (both low-level and high-level)

Timestamps changes
Monitoring: evograph can be used to log the changes
Propagation: changes can be accessed and propagated
Versioning: timestamps in changes can be used to generate snapshots (versions) at different times
Allows cross-snapshot queries
Fairly generic, can be adapted for RDF


Talk Structure (B3): Remote change management, an approach for change detection [PFF+13]

Our Approach on Change Detection

Purpose of this work: change detection [PFF+13]

Detect a posteriori the differences (delta or diff) between versions in a concise, intuitive and correct way

Main design choices

Change detection based on a general-purpose high-level language

Human-understandable, but also machine-readable

Clear, formal semantics

Provable formal properties and functionality guarantees

Detection and application (propagation) semantics

[Figure: version chain V1 → V2 → V3 → V4 → V5, where delta Ci leads from Vi to Vi+1]

Sample Evolution

[Figure: Version 1 (V1) and Version 2 (V2) of a small RDF/S schema with the classes Period, Event, Onset, Birth, Existing (Persistent in V2), Stuff and Actor, the properties participants and started_on, and the instance G_Birth linked to Giorgos via participants; edges denote subsumption and instantiation]

Analyzing the Evolution (Using Triples)

Triples in V1 (partial list)

[Event type Class]

[Period type Class]

[Event subclass Period]

[participants type Property]

[participants domain Onset]

[participants range Actor]

[Giorgos type Actor]

[Existing type Class]

[Stuff subclass Existing]

[started_on domain Existing]

[Onset subclass Event]

[Birth subclass Onset]

Triples in V2 (partial list)

[Event type Class]

[participants type Property]

[participants domain Event]

[participants range Actor]

[Giorgos type Actor]

[Persistent type Class]

[Stuff subclass Persistent]

[started_on domain Persistent]

[Onset subclass Event]

[Birth subclass Event]

Low-Level Delta

Triples in V2 but not in V1

(added triples)

[participants domain Event]

[Persistent type Class]

[Stuff subclass Persistent]

[started_on domain Persistent]

[Birth subclass Event]

Triples in V1 but not in V2

(deleted triples)

[Period type Class]

[Event subclass Period]

[participants domain Onset]

[Existing type Class]

[Stuff subclass Existing]

[started_on domain Existing]

[Birth subclass Onset]

[Figure: the V1 → V2 diagram of the sample evolution]

Low-level delta:

Add([participants domain Event])

Add([Persistent type Class]) …

Del([Period type Class]) …
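The low-level delta is simply a set difference between the two versions. A minimal Python sketch, assuming each version is loaded in memory as a set of (subject, predicate, object) tuples and using the partial triple lists of this example (no RDF library, no inference, no blank-node handling):

```python
# Partial triple lists of the example; a real version would be a full RDF graph.
V1 = {
    ("Event", "type", "Class"), ("Period", "type", "Class"),
    ("Event", "subclass", "Period"), ("participants", "type", "Property"),
    ("participants", "domain", "Onset"), ("participants", "range", "Actor"),
    ("Giorgos", "type", "Actor"), ("Existing", "type", "Class"),
    ("Stuff", "subclass", "Existing"), ("started_on", "domain", "Existing"),
    ("Onset", "subclass", "Event"), ("Birth", "subclass", "Onset"),
}
V2 = {
    ("Event", "type", "Class"), ("participants", "type", "Property"),
    ("participants", "domain", "Event"), ("participants", "range", "Actor"),
    ("Giorgos", "type", "Actor"), ("Persistent", "type", "Class"),
    ("Stuff", "subclass", "Persistent"), ("started_on", "domain", "Persistent"),
    ("Onset", "subclass", "Event"), ("Birth", "subclass", "Event"),
}


def low_level_delta(v1, v2):
    """Return (added, deleted): triples only in v2 and triples only in v1."""
    return v2 - v1, v1 - v2


added, deleted = low_level_delta(V1, V2)
for s, p, o in sorted(added):
    print(f"Add([{s} {p} {o}])")
for s, p, o in sorted(deleted):
    print(f"Del([{s} {p} {o}])")
```

With these inputs the sketch yields the 5 added and 7 deleted triples listed above.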

Analyzing the Evolution (Visually)

[Figure: the V1 → V2 diagram of the sample evolution]

High-level delta:

Generalize_Domain(participants, Onset, Event)

Pull_Up_Class(Birth, Onset, Event)

Delete_Class(Period, Ø, {Event}, Ø, Ø, Ø, Ø)

Rename_Class(Existing, Persistent)

Comparing the Deltas

[Figure: the V1 → V2 diagram of the sample evolution]

Low-level delta → high-level delta:

Del([participants domain Onset]), Add([participants domain Event]) → Generalize_Domain(participants, Onset, Event)

Del([Birth subclass Onset]), Add([Birth subclass Event]) → Pull_Up_Class(Birth, Onset, Event)

Del([Period type Class]), Del([Event subclass Period]) → Delete_Class(Period, Ø, {Event}, Ø, Ø, Ø, Ø)

Associations (Partitioning)

Low-level changes → associated high-level change:

Del([participants domain Onset]), Add([participants domain Event]) → Generalize_Domain(participants, Onset, Event)

Del([Birth subclass Onset]), Add([Birth subclass Event]) → Pull_Up_Class(Birth, Onset, Event)

Del([Period type Class]), Del([Event subclass Period]) → Delete_Class(Period, Ø, {Event}, Ø, Ø, Ø, Ø)

Del([Existing type Class]), Del([Stuff subclass Existing]), Del([started_on domain Existing]), Add([Persistent type Class]), Add([Stuff subclass Persistent]), Add([started_on domain Persistent]) → Rename_Class(Existing, Persistent)

Challenges for High-Level Languages

High-level deltas are superior

More concise (e.g., Rename_Class)

More intuitive (e.g., Pull_Up_Class)

Carry additional information (e.g., Generalize_Domain)

Challenges for high-level languages

Must be deterministic (exactly one high-level delta)

Must be fine-grained enough to capture subtle changes

Must be coarse-grained enough to be concise

Must be intuitive and close to the editor’s perception of the changes

Compatible detection and application algorithms

Intuitive results

Efficient

Proposed Language L

The formal definition of a change consists of:

Changes required in the low-level delta (added/deleted triples)

Conditions that should hold in V1 and/or V2

Generalize_Domain(P, X, Y)

Del([P domain X]), Add([P domain Y])

P existing property in both V1, V2

X, Y existing classes in both V1, V2

X subclass of Y in both V1, V2

Generalize_Domain(participants, Onset, Event): detectable

Similarly for the other changes in L (132 high-level ones)
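To make the definition concrete, here is a minimal sketch of how Generalize_Domain could be detected from two versions, assuming the same tuple encoding of triples as for the low-level delta and a subset of the example triples. It checks only direct subclass triples (no transitive closure) and treats anything declared `type Class` or used in a `subclass` triple as a class; the helper names are hypothetical and this is not the [PFF+13] implementation.

```python
# A subset of the example triples, as (subject, predicate, object) tuples.
V1 = {
    ("Event", "type", "Class"), ("participants", "type", "Property"),
    ("participants", "domain", "Onset"), ("Onset", "subclass", "Event"),
    ("Birth", "subclass", "Onset"), ("Existing", "type", "Class"),
    ("started_on", "domain", "Existing"),
}
V2 = {
    ("Event", "type", "Class"), ("participants", "type", "Property"),
    ("participants", "domain", "Event"), ("Onset", "subclass", "Event"),
    ("Birth", "subclass", "Event"), ("Persistent", "type", "Class"),
    ("started_on", "domain", "Persistent"),
}


def is_class(v, c):
    # Declared as a class, or used in a subclass triple (RDFS makes both ends classes).
    return (c, "type", "Class") in v or any(
        p == "subclass" and c in (s, o) for s, p, o in v)


def is_property(v, p):
    return (p, "type", "Property") in v


def detect_generalize_domain(v1, v2):
    """Pair Del([P domain X]) with Add([P domain Y]) and check the conditions."""
    added, deleted = v2 - v1, v1 - v2
    found = []
    for p, pred, x in deleted:
        if pred != "domain":
            continue
        for p2, pred2, y in added:
            if pred2 != "domain" or p2 != p:
                continue
            both = (v1, v2)
            if (all(is_property(v, p) for v in both)
                    and all(is_class(v, c) for v in both for c in (x, y))
                    and all((x, "subclass", y) in v for v in both)):
                found.append(("Generalize_Domain", p, x, y))
    return found


print(detect_generalize_domain(V1, V2))
# [('Generalize_Domain', 'participants', 'Onset', 'Event')]
```

The pair Del([started_on domain Existing]) / Add([started_on domain Persistent]) is rejected here, since started_on is not declared a property in this subset and Existing is not a subclass of Persistent; in the full example those triples are covered by Rename_Class instead.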

Types and Number of Defined Changes

Changes (134)

Low-level (2): Add, Del

High-level (132)

Basic (54), e.g., Delete_Subclass, Delete_Domain

Composite (51), e.g., Pull_Up_Class, Change_Domain

Heuristic (27), e.g., Rename_Class, Split_Class

Results on L: Granularity

Granularity problem: solved by defining levels of changes

Basic changes: fine-grained, roughly correspond to low-level changes

Composite changes: coarse-grained, group several basic changes together

Heuristic changes: based on heuristics, necessary for Rename, Merge, Split, etc.; require mappings between URIs

Problems with determinism

One evolution could correspond to different sets of basic/composite changes

Priorities in detection

Heuristic > Composite > Basic

Results on L: Determinism

Each low-level change is associated with exactly one detectable high-level change

Full partitioning of low-level changes into high-level ones

Each pair of versions (V1, V2) is associated with:

Exactly one low-level delta

Exactly one high-level delta

Determinism is necessary

More than one delta would lead to ambiguities

No delta at all would leave some inputs (V1, V2) unresolvable
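The full-partitioning property can be checked mechanically. A small sketch, assuming each high-level change is stored together with the low-level changes associated with it (the association table of the example, encoded here as Python tuples); it verifies that every low-level change is claimed by exactly one high-level change:

```python
from collections import Counter

# Low-level delta of the example: ("Add" | "Del", (subject, predicate, object)).
LOW_LEVEL = {
    ("Del", ("participants", "domain", "Onset")), ("Add", ("participants", "domain", "Event")),
    ("Del", ("Birth", "subclass", "Onset")), ("Add", ("Birth", "subclass", "Event")),
    ("Del", ("Period", "type", "Class")), ("Del", ("Event", "subclass", "Period")),
    ("Del", ("Existing", "type", "Class")), ("Del", ("Stuff", "subclass", "Existing")),
    ("Del", ("started_on", "domain", "Existing")), ("Add", ("Persistent", "type", "Class")),
    ("Add", ("Stuff", "subclass", "Persistent")), ("Add", ("started_on", "domain", "Persistent")),
}

# High-level delta: each change with the low-level changes associated with it.
HIGH_LEVEL = {
    "Generalize_Domain(participants, Onset, Event)": [
        ("Del", ("participants", "domain", "Onset")), ("Add", ("participants", "domain", "Event"))],
    "Pull_Up_Class(Birth, Onset, Event)": [
        ("Del", ("Birth", "subclass", "Onset")), ("Add", ("Birth", "subclass", "Event"))],
    "Delete_Class(Period, Ø, {Event}, Ø, Ø, Ø, Ø)": [
        ("Del", ("Period", "type", "Class")), ("Del", ("Event", "subclass", "Period"))],
    "Rename_Class(Existing, Persistent)": [
        ("Del", ("Existing", "type", "Class")), ("Del", ("Stuff", "subclass", "Existing")),
        ("Del", ("started_on", "domain", "Existing")), ("Add", ("Persistent", "type", "Class")),
        ("Add", ("Stuff", "subclass", "Persistent")), ("Add", ("started_on", "domain", "Persistent"))],
}


def is_partition(low_level, high_level):
    """True iff every low-level change belongs to exactly one high-level change."""
    counts = Counter(c for changes in high_level.values() for c in changes)
    return set(counts) == low_level and all(n == 1 for n in counts.values())


print(is_partition(LOW_LEVEL, HIGH_LEVEL))  # True
```

Leaving a low-level change uncovered, or claiming it under two high-level changes, makes the check fail.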

Results on L: Propagation

[Figure: propagation between the two versions of the sample evolution: Detect C computes the delta C from V1 and V2, Apply C turns V1 into V2, and Apply C⁻¹ (the inverse of C) turns V2 back into V1]
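At the level of triples, applying a delta and its inverse is straightforward. A minimal sketch, assuming a delta is represented by its low-level effect (sets of added and deleted triples) and versions are Python sets of tuples; high-level changes would be applied through the low-level changes associated with them. The three-triple versions are a hypothetical fragment of the example.

```python
def apply_delta(version, added, deleted):
    """Apply a low-level delta: drop the deleted triples, then insert the added ones."""
    return (version - deleted) | added


def invert_delta(added, deleted):
    """The inverse delta C^-1 swaps the roles of added and deleted triples."""
    return deleted, added


# A small fragment of the example (hypothetical subset).
V1 = {("Birth", "subclass", "Onset"), ("Onset", "subclass", "Event"),
      ("Period", "type", "Class")}
V2 = {("Birth", "subclass", "Event"), ("Onset", "subclass", "Event")}

added, deleted = V2 - V1, V1 - V2                              # Detect C
assert apply_delta(V1, added, deleted) == V2                   # Apply C
assert apply_delta(V2, *invert_delta(added, deleted)) == V1    # Apply C^-1
```

The round trip holds because a detected delta only deletes triples present in V1 and only adds triples absent from it.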

Results on L: Deltas Keep Version History

Can reproduce all versions as long as you keep (any) one version and the deltas

Deltas are more concise than the versions themselves

Storage and communication efficiency

[Figure: version chain V1 → V2 → V3 → V4 → V5, where delta Ci leads from Vi to Vi+1]
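A sketch of the storage idea, assuming deltas are stored as (added, deleted) pairs where `deltas[i]` turns version i into version i+1 (0-based here, so `deltas[0]` plays the role of C1) and only one version is kept; the toy versions are sets of placeholder strings rather than real triples, and `reconstruct` is a hypothetical helper name.

```python
def apply_delta(version, delta):
    added, deleted = delta
    return (version - deleted) | added


def invert(delta):
    added, deleted = delta
    return deleted, added


def reconstruct(kept_index, kept_version, deltas, target):
    """Rebuild version `target` from the single kept version and the deltas."""
    v = kept_version
    if target >= kept_index:
        for d in deltas[kept_index:target]:            # walk forward with C
            v = apply_delta(v, d)
    else:
        for d in reversed(deltas[target:kept_index]):  # walk backward with C^-1
            v = apply_delta(v, invert(d))
    return v


# Toy history V1..V5 (placeholder "triples").
versions = [{"a", "b"}, {"a", "c"}, {"c", "d"}, {"d"}, {"d", "e", "f"}]
deltas = [(nxt - cur, cur - nxt) for cur, nxt in zip(versions, versions[1:])]

kept = 2  # keep only V3 plus C1..C4
for i in range(len(versions)):
    assert reconstruct(kept, versions[kept], deltas, i) == versions[i]
```

Any version can serve as the kept one; only the direction of the walk changes.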

Change Detection: Evaluation

Detection and application algorithms implemented for evaluation

Performance

Complexity: O(max{N1,N2,N2})

Performance depends on the detected changes (type, number)

Bottleneck: calculating the low-level delta (>80% of total time)

Intuitiveness

Changes in our language are used in practice

Results confirmed by literature/editor notes (CIDOC, Gene Ontology)

Better than CIDOC’s manually recorded changes (which missed 18 changes)

Conciseness

Basic ≈ Low-level

Basic + Composite + Heuristic << Low-level

Up to 80% reduction, depending on the case

Summary and Conclusions: RCM

Remote change management is at the heart of LOD

Uncontrolled character of LOD makes it critical

Various related fields

Versioning, change monitoring and detection, change propagation

Unfortunately, not used in practice in LOD

Presented a formal approach for change detection [PFF+13]

Other possible directions (related to LOD)

Best practices should be studied and promoted

Automated versioning and monitoring mechanisms embedded in evolution tools/editors

Understand and use temporal and provenance metadata on versions

Improved change monitoring and detection

A standard change language?

Talk Structure (C1)

A. Introduction to RDF/S, DLs, OWL

B. Remote change management

1. Introduction, definition of subfields
2. Literature review
3. An approach for change detection [PFF+13]

C. Repair

1. Introduction, definition of subfields
2. Literature review
3. An approach for validity repair [RFC11]

D. Data and Knowledge Evolution

1. Introduction, connection with belief change
2. Understanding the process of change
3. Literature review

Motivation for Repair

Published data is usually problematic

Several different types of problems in LOD [HHP+10]

Pedantic web initiative (http://pedantic-web.org/): advice for data owners on how to prevent common problems in their data

Causes of Data Problems

Several reasons for data problems

Erroneous data (faulty sensors, human mistakes, etc.)

Different symbolisms and terminology

Modeling errors (e.g., all birds fly)

Requirements (constraints) on the data may change, e.g., when applications’ needs change

Reuse of data by different providers (no quality guarantees)

Quality jeopardized by reuse and open evolution

Integration/merging of datasets

Generic Approaches

Four ways to deal with problems in data [HHH+05]

Prevent it (careful evolution, merging, etc.): can only prevent problems caused by changes in the local dataset

Correct it (repair): actively address the problem after it appears

Ignore it (consistent query answering, non-monotonic reasoning)

CQA: popular in the database community; prevents the user from noticing the problem by rewriting queries (common-denominator approach)

NMR: popular in the AI community; avoids trivialization of reasoning (paraconsistent reasoning, defeasible reasoning, default reasoning, …)

Use versions (versioning): make sure you refer to the correct (compatible) version; applies only when the problem is due to a remote change

Subfields of Repair

Cleaning

Mainly related to literal quality

Terminology, symbols, metric units, etc.

Debugging

Consistency (at least one model)

Coherency (no unsatisfiable classes)

Relevant for DL/OWL only

Validity repair

Satisfaction of custom integrity constraints (e.g., business rules)

Expressed in OWL, DL, Datalog or predicate logic

Quality enhancement

Assessing and improving the quality of data

Different dimensions (timeliness, completeness, reputation, …)

Cleaning

Literals in LOD are often messy, and have to be “cleaned up”

Different formats for names, dates, etc.:

&gf name “Giorgos Flouris” vs. &gf name “Flouris, Giorgos”

&gf birth_date 03/05/76 vs. &gf birth_date 05/03/76

&gf birthplace “Hellas” vs. &gf birthplace “Greece”

Different symbols:

Paris land_area 105,4 vs. Paris land_area 105.4

Paris population 2.234.105 vs. Paris population 2,234,105

Different metric units:

Paris land_area 105,4 (km²) vs. Paris land_area 40,7 (square miles)

&x price 30 vs. &x price 39

Inconsistent values:

&x price 0 vs. &x price “free”

Data is not in the desired form (data transformation):

LIP6 addr “4, P. Jussieu” should become LIP6 street “P. Jussieu” and LIP6 streetno 4
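Some of these clean-up steps reduce to simple string transformations. A hedged Python sketch (not OpenRefine or GREL code; the function names are hypothetical) that normalizes numbers when the source’s decimal separator is known, converts km² to square miles, and splits the LIP6 address:

```python
import re

KM2_PER_SQ_MILE = 1.609344 ** 2  # 1 mile = 1.609344 km (exact by definition)


def parse_number(text, decimal_sep):
    """Parse '105,4' or '2.234.105' given the source's decimal separator."""
    group_sep = "," if decimal_sep == "." else "."
    return float(text.replace(group_sep, "").replace(decimal_sep, "."))


def km2_to_sq_miles(km2):
    return km2 / KM2_PER_SQ_MILE


def split_address(addr):
    """Split '4, P. Jussieu' into (streetno, street)."""
    m = re.fullmatch(r"\s*(\d+)\s*,\s*(.+?)\s*", addr)
    if m is None:
        raise ValueError(f"unrecognized address: {addr!r}")
    return int(m.group(1)), m.group(2)


print(parse_number("105,4", decimal_sep=","))      # 105.4
print(parse_number("2,234,105", decimal_sep="."))  # 2234105.0
print(round(km2_to_sq_miles(105.4), 1))            # 40.7
print(split_address("4, P. Jussieu"))              # (4, 'P. Jussieu')
```

The separator is a parameter because guessing it from the string alone is ambiguous: "2.234" could be a decimal number or a thousands group.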

Debugging

Coherency

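No unsatisfiable classes

Indicates good modeling

Consistency

At least one model

Avoids reasoning triviality

Relevant for DL/OWL only

[Figure: two examples. Inconsistency: Pengo is a Penguin, Penguin is a subclass of Bird, and Bird and Penguin carry conflicting canFly statements. Incoherency: Unicorn is a subclass of Horse, and Horse and Unicorn carry conflicting hasHorns statements, so Unicorn is unsatisfiable.]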

Validity Repair

Validity repair

Satisfaction of custom integrity constraints (e.g., business rules)

Encode context- or application-specific requirements, e.g., the constraints of the PROV data model (PROV-DM): http://www.w3.org/TR/2013/REC-prov-constraints-20130430/

Applications may be useless over invalid data

Expressed in OWL, DL, Datalog, Datalog±, predicate logic, …

Different expressive power

Different semantics (OWA/CWA, UNA) [TSBM10, MHS09]

Various types of constraints

Functional, inverse functional, transitivity, cardinality constraints

Disjointness constraints

Primary key, foreign key, inclusion constraints

Tuple-generating dependencies (tgd), equality-generating dependencies (egd)

Quality Enhancement

Quality is defined as “fitness for use” [Jur74]

Multi-faceted (timeliness, completeness, reputation, …)

Task-dependent

Subjective

Assessing quality

Via assessment functions (e.g., [BC09]) or SPARQL queries (e.g., [FH10])

Some kind of combined scoring over the relevant dimensions

Improving (enhancing) quality

Usually manual

Tries to improve the assessment score

Talk Structure (C2)

A. Introduction to RDF/S, DLs, OWL

B. Remote change management

1. Introduction, definition of subfields
2. Literature review
3. An approach for change detection [PFF+13]

C. Repair

1. Introduction, definition of subfields
2. Literature review
3. An approach for validity repair [RFC11]

D. Data and Knowledge Evolution

1. Introduction, connection with belief change
2. Understanding the process of change
3. Literature review

Cleaning Tool: OpenRefine

Open source

Originally developed by Google (as Google Refine): http://openrefine.org/

Applies to various representations of the input data

CSV/TSV, Excel, JSON, XML, RDF/XML, etc.

RDF extension

Functionalities (related to this talk)

Data exploration and cleaning: both automated and manual (the interface assists in manual cleaning)

Data transformation (format conversion): uses GREL (Google Refine Expression Language) and regular expressions

Cleaning Tool: ODCleanStore

Web application, written in Java

Developed by Charles University (Prague): http://www.ksi.mff.cuni.cz/~knap/odcs/sections/odcs.html

Functionalities (related to this talk)

Cleaning: via “transformers” (policies for cleaning), expressed using SPARQL or regular expressions

Quality assessment: a transformer assigns a score to the data

Validity repair: supports conflict resolution for functional properties

Decides what to drop based on the quality of the data items involved

Supports aggregation functionalities based on “aggregation policies”

Other Cleaning Approaches

Involve users in the loop [KHS12]

Manual requests for improvements (cleaning, quality, …)

Patch Request Ontology (PRO)

Use a GWAP (Game With A Purpose) for identifying data problems

Debugging: Literature Overview

Identify and resolve inconsistency/incoherency

Two phases

Diagnosis: identify inconsistency/incoherency

Repair: remove inconsistency/incoherency

Literature mostly dealing with diagnosis

Repair requires additional user input

Diagnosis is more than reasoning

Pinpoint the causes of inconsistency/incoherency

Repair

User input required (manual or semi-automatic approaches)

Automatic approaches also require user input or domain knowledge (ad-hoc solutions)

Debugging Approaches

Diagnosis using tableau-based algorithms for various DLs

Identify minimal sets of responsible axioms [SC03, MLBP06, PT06, WHR+05]

Identify responsible parts of axioms (more fine-grained) [KPS+06, LPSV06]

Repair

Manual: editors and related tools, e.g., Onion [MWK00], PROMPT [NM00], Chimaera [MFRW00]

Semi-automatic: interactive approach via suggestions, e.g., the ORE tool [LB10]

Automatic: using external information, e.g., for stratified datasets [QP07, MLB05]
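The idea of “minimal sets of responsible axioms” can be illustrated without a tableau reasoner. The sketch below swaps in a black-box approach: a toy consistency check for a tiny hypothetical encoding of the Pengo example (`inst`, `sub`, `has` and `lacks` are made-up axiom kinds, with `lacks` standing for a negated property), plus a brute-force search for subset-minimal inconsistent axiom sets. It is exponential and only meant to show what is being computed.

```python
from itertools import combinations

# Hypothetical encoding: (kind, subject, object).
AXIOMS = [
    ("inst", "Pengo", "Penguin"),    # Pengo is a Penguin
    ("sub", "Penguin", "Bird"),      # every Penguin is a Bird
    ("has", "Bird", "canFly"),       # every Bird canFly
    ("lacks", "Penguin", "canFly"),  # no Penguin canFly
    ("inst", "Tweety", "Bird"),      # not involved in the conflict
]


def classes_of(ind, axioms):
    """All classes of `ind`, following inst and sub axioms transitively."""
    todo = [c for k, a, c in axioms if k == "inst" and a == ind]
    seen = set()
    while todo:
        c = todo.pop()
        if c not in seen:
            seen.add(c)
            todo += [d for k, a, d in axioms if k == "sub" and a == c]
    return seen


def inconsistent(axioms):
    """True if some individual inherits both 'has P' and 'lacks P'."""
    for ind in {a for k, a, _ in axioms if k == "inst"}:
        cs = classes_of(ind, axioms)
        has = {p for k, c, p in axioms if k == "has" and c in cs}
        lacks = {p for k, c, p in axioms if k == "lacks" and c in cs}
        if has & lacks:
            return True
    return False


def minimal_conflict_sets(axioms, is_bad):
    """Subset-minimal axiom sets for which is_bad holds (smallest sizes first)."""
    found = []
    for size in range(1, len(axioms) + 1):
        for subset in combinations(axioms, size):
            s = set(subset)
            if is_bad(s) and not any(f <= s for f in found):
                found.append(s)
    return found


print(minimal_conflict_sets(AXIOMS, inconsistent))
```

The only conflict set consists of the first four axioms; Tweety’s axiom is not involved, so a repair need not touch it.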

Validity Repair: Literature Overview

Identify and resolve invalidity (custom constraints)

Two phases

Diagnosis: identify invalidity

Repair: remove invalidity

Literature mostly dealing with diagnosis

Repair requires additional user input

Diagnosis is more than validation

Pinpoint the causes of invalidity

Repair

User input required (manual or semi-automatic approaches)

Automatic approaches also require user input or domain knowledge (ad-hoc solutions)

Validity Repair Approaches

Not much work in repairing custom constraints in LOD

A large body of related work for the relational setting, for various constraint types and repair methodologies

Existing tools

Stardog (http://www.stardog.com/docs/): commercial RDF database that supports validation of custom constraints

Rondo (relational/XML) [Mel04]: repair based on a fixed “importance” of data items

Declarative repairing based on preferences [RFC11], discussed in detail later

Repairing functional properties ([FRPV+12], Sieve [MMB12])

Data Quality Frameworks (1/4)

Many different quality assessment methodologies and frameworks

Several different quality dimensions

Different works consider different dimensions

Different proposals for their classification and organization

There is no single, generally accepted data quality framework

There cannot be one: different applications have different needs

Data Quality Frameworks (2/4)

Quality dimensions, quality indicators, scoring functions and assessment metrics [BC09]

Different quality dimensions, e.g., timeliness, completeness, reputation, …

Each dimension is associated with different indicators, e.g., for timeliness: last modification date, creation date, …

Each indicator is associated with different scoring functions, e.g., days since last update

Scoring functions from relevant indicators are combined using assessment metrics, e.g., Reputation_value*0.6 + days_since_update*0.4
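The indicator → scoring function → assessment metric chain can be sketched in a few lines. The linear timeliness scoring function, its 365-day horizon and the reputation value 0.8 are hypothetical choices; only the 0.6/0.4 weighting comes from the example above. Here “days since last update” is mapped to a score that decreases with age, so that a higher metric value always means better quality.

```python
from datetime import date


def timeliness_score(last_modified, today, horizon_days=365):
    """Scoring function for the 'last modification date' indicator:
    1.0 if modified today, decreasing linearly to 0.0 after horizon_days."""
    days = (today - last_modified).days
    return max(0.0, 1.0 - days / horizon_days)


def assessment_metric(scores, weights):
    """Combine per-dimension scores with a weighted sum."""
    return sum(weights[dim] * scores[dim] for dim in weights)


scores = {
    "reputation": 0.8,
    "timeliness": timeliness_score(date(2013, 1, 1), date(2013, 5, 1)),
}
print(round(assessment_metric(scores, {"reputation": 0.6, "timeliness": 0.4}), 3))  # 0.748
```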

Data Quality Frameworks (3/4)

[RH09]

Data Quality Frameworks (4/4)

[ADA98]

Talk Structure (C3)

A. Introduction to RDF/S, DLs, OWL

B. Remote change management

1. Introduction, definition of subfields
2. Literature review
3. An approach for change detection [PFF+13]

C. Repair

1. Introduction, definition of subfields
2. Literature review
3. An approach for validity repair [RFC11]

D. Data and Knowledge Evolution

1. Introduction, connection with belief change
2. Understanding the process of change
3. Literature review

Our Approach on Validity Repair

Declarative approach for validity repair [RFC11]

Main design choices

Both diagnosis and repair

Applicable to RDF/S

Adopted relational semantics (CWA) for the constraints

Generality of the supported constraints (DEDs)

Minimal user interaction (all information provided at input)

Automatic diagnosis

Automatic repair using preferences (provided by the user at input)

RDF/S Representation Model

Express RDF/S over an adequate relational schema

Hybrid method:

C_IsA(A,B): A is a subclass of B

C_Inst(x,A): x is an instance of A

Domain(P,A): the domain of P is A

…

Alternatives

Schema-specific: one table/predicate for each class/property (A(x), B(x), P(x,y), …); not amenable to changes (e.g., delete class)

Schema-agnostic (triple store): one table with three columns (s, p, o); harder to define constraints, less intuitive
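A sketch of the hybrid encoding, assuming triples use the shorthand predicates of the change-detection example (`type`, `subclass`, `domain`, `range`) instead of full rdf:/rdfs: URIs, and that any other predicate is a user-defined property. The predicate names follow the slides (Class, Prop, C_IsA, C_Inst, Domain, Range, P_Inst); the Inst relation for individuals is omitted for brevity, and `to_relational` is a hypothetical helper.

```python
from collections import defaultdict


def to_relational(triples):
    """Map (s, p, o) triples to the hybrid relational predicates."""
    db = defaultdict(set)
    for s, p, o in triples:
        if p == "type" and o == "Class":
            db["Class"].add((s,))
        elif p == "type" and o == "Property":
            db["Prop"].add((s,))
        elif p == "type":
            db["C_Inst"].add((s, o))     # s is an instance of class o
        elif p == "subclass":
            db["C_IsA"].add((s, o))      # s is a subclass of o
        elif p == "domain":
            db["Domain"].add((s, o))
        elif p == "range":
            db["Range"].add((s, o))
        else:
            db["P_Inst"].add((s, o, p))  # property instance P_Inst(x, y, p)
    return db


db = to_relational([
    ("Sensor", "type", "Class"), ("geo:location", "type", "Property"),
    ("geo:location", "domain", "Sensor"), ("Item1", "type", "Observation"),
    ("Item1", "geo:location", "ST1"),
])
print(sorted(db["P_Inst"]))  # [('Item1', 'ST1', 'geo:location')]
```

Deleting a class becomes a deletion of rows rather than a change of the relational schema, which is what the schema-specific alternative lacks.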

Allowed Constraints

Considered a very general class of constraints

Disjunctive Embedded Dependencies (DEDs) [Deu09]

Very general class

Functional, inverse functional, transitivity, cardinality constraints

Disjointness constraints

Primary key, foreign key, inclusion constraints

Tuple-generating dependencies (tgd), equality-generating dependencies (egd)

Constraints

Express validity constraints over the aforementioned schema:

Class subsumption must be acyclic: $\forall x,y\; \text{C\_IsA}(x,y) \wedge \text{C\_IsA}(y,x) \rightarrow \bot$

Correct classification in property instances:
$\forall x,y,p,a\; \text{P\_Inst}(x,y,p) \wedge \text{Domain}(p,a) \rightarrow \text{C\_Inst}(x,a)$
$\forall x,y,p,a\; \text{P\_Inst}(x,y,p) \wedge \text{Range}(p,a) \rightarrow \text{C\_Inst}(y,a)$

Closed World Assumption (CWA)

Failure to prove something is taken as proof of its negation

Syntactical manipulations on constraints allow:

Diagnosis: finding violated constraints

Repair: identifying repairing options per violation
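Under the closed-world reading, diagnosis amounts to evaluating each constraint over the stored facts. A minimal sketch for the three constraints above, assuming the database is a dict from predicate name to a set of tuples (the dataset D0 of the repairing example is hard-coded, without its Inst facts); a missing fact simply counts as false. As with the formula itself, cycles longer than two are caught only if C_IsA is stored transitively closed.

```python
def violations(db):
    """Return one record per violated instance of the three constraints."""
    isa = db.get("C_IsA", set())
    c_inst = db.get("C_Inst", set())
    out = []
    for x, y in isa:                                  # acyclic subsumption
        if (y, x) in isa and x <= y:                  # report each cycle once
            out.append(("cyclic_isa", x, y))
    for x, y, p in db.get("P_Inst", set()):
        for q, a in db.get("Domain", set()):          # domain classification
            if q == p and (x, a) not in c_inst:
                out.append(("domain", x, y, p, a))
        for q, a in db.get("Range", set()):           # range classification
            if q == p and (y, a) not in c_inst:
                out.append(("range", x, y, p, a))
    return out


D0 = {
    "Class": {("Sensor",), ("SpatialThing",), ("Observation",)},
    "Prop": {("geo:location",)},
    "Domain": {("geo:location", "Sensor")},
    "Range": {("geo:location", "SpatialThing")},
    "P_Inst": {("Item1", "ST1", "geo:location")},
    "C_Inst": {("Item1", "Observation"), ("ST1", "SpatialThing")},
}
print(violations(D0))  # [('domain', 'Item1', 'ST1', 'geo:location', 'Sensor')]
```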

Repairing Example

Dataset D0: Class(Sensor), Class(SpatialThing), Class(Observation), Prop(geo:location), Domain(geo:location,Sensor), Range(geo:location,SpatialThing), Inst(Item1), Inst(ST1), P_Inst(Item1,ST1,geo:location), C_Inst(Item1,Observation), C_Inst(ST1,SpatialThing)

Constraint (correct classification in property instances): $\forall x,y,p,a\; \text{P\_Inst}(x,y,p) \wedge \text{Domain}(p,a) \rightarrow \text{C\_Inst}(x,a)$

[Figure: schema (classes Sensor, SpatialThing, Observation; property geo:location) and data (Item1 geo:location ST1)]

Violation: Item1 geo:location ST1, and Sensor is the domain of geo:location, but Item1 is not a Sensor:
$\text{P\_Inst(Item1,ST1,geo:location)} \in D_0$, $\text{Domain(geo:location,Sensor)} \in D_0$, $\text{C\_Inst(Item1,Sensor)} \notin D_0$

Repair options:

Remove P_Inst(Item1,ST1,geo:location)

Add C_Inst(Item1,Sensor)

Remove Domain(geo:location,Sensor)
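The three options follow a simple pattern: to fix a violated rule instance whose body facts all hold but whose head fact does not, either remove one body fact or add the head fact. A sketch, restricted (as an assumption) to rules with a single head atom and no existential variables or disjunctions, which full DEDs would also allow; a ⊥ head, as in the acyclicity constraint, leaves only removals. The `resolutions` helper is hypothetical.

```python
def resolutions(body, head):
    """Minimal ways to fix one violated rule instance body_1 ∧ … ∧ body_n → head.
    `head` is None for ⊥ (e.g., the acyclicity constraint)."""
    options = [("remove", fact) for fact in body]
    if head is not None:
        options.append(("add", head))
    return options


# The violation of the domain constraint in D0.
body = [("P_Inst", ("Item1", "ST1", "geo:location")),
        ("Domain", ("geo:location", "Sensor"))]
head = ("C_Inst", ("Item1", "Sensor"))
for option in resolutions(body, head):
    print(option)
```

This prints the same three repair options as listed above (removals first, then the addition).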

Preferences for Repair

Which repairing option is best?

Data owner determines that via preferences

Preferences

Specified beforehand

High-level “specifications” for the ideal repair

Serve as “instructions” to determine the preferred (optimal) solution

Preferences (On Datasets)

[Figure: dataset D0 and three candidate repairs D1, D2, D3; the preference scores each resulting dataset (scores 3, 4 and 6 are shown)]

Preferences (On Deltas)

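[Figure: dataset D0 and three candidate repairs D1, D2, D3, reached through the deltas -P_Inst(Item1,ST1,geo:location), +C_Inst(Item1,Sensor) and -Dom(geo:location,Sensor); here the preference scores each delta (scores 2, 1 and 5 are shown)]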

More Details on Preferences

Preferences on datasets are result-oriented

Consider the quality of the repair result

Ignore the impact of repair

Popular options: prefer newest/trustworthy information, prefer a specific schema structure

Preferences on deltas are impact-oriented

Consider the impact of repair

Ignore the quality of the repair result

Popular options: minimize schema changes, minimize addition/deletion of information, minimize delta size

Properties of preferences

Quality metrics can be used for stating preferences

Metadata on the data can be used (e.g., provenance)

Can be qualitative or quantitative
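As an illustration of an impact-oriented preference, the sketch below ranks the three candidate repairs of the running example lexicographically: first fewest schema changes, then fewest deletions, then smallest delta. This particular ranking and the set of predicates treated as schema are assumptions chosen for the example, not a preference prescribed by [RFC11].

```python
SCHEMA_PREDICATES = {"Class", "Prop", "C_IsA", "Domain", "Range"}


def delta_cost(delta):
    """Lexicographic cost (schema changes, deletions, delta size); lower is better."""
    schema = sum(1 for _, pred, _ in delta if pred in SCHEMA_PREDICATES)
    deletions = sum(1 for op, _, _ in delta if op == "-")
    return schema, deletions, len(delta)


def preferred(candidates):
    """Return all candidate deltas with minimal cost (ties are kept)."""
    best = min(delta_cost(d) for d in candidates)
    return [d for d in candidates if delta_cost(d) == best]


candidates = [
    [("-", "P_Inst", ("Item1", "ST1", "geo:location"))],
    [("+", "C_Inst", ("Item1", "Sensor"))],
    [("-", "Domain", ("geo:location", "Sensor"))],
]
print(preferred(candidates))  # [[('+', 'C_Inst', ('Item1', 'Sensor'))]]
```

A result-oriented preference would instead score the repaired datasets themselves, for example with a quality metric.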

Generalizing the Approach

For one violated constraint

1. Diagnose invalidity

2. Determine minimal ways to resolve it

3. Determine and return preferred solution based on the preference

For many violated constraints

Problem becomes more complicated

More than one resolution step is required

Issues:

1. Resolution order

2. When and how to filter non-optimal solutions?

3. Constraint (and resolution) interdependencies

Constraint Interdependencies

A given resolution may:

Cause other violations (bad)

Resolve other violations (good)

The optimal resolution is not known a priori

Cannot predict a resolution’s ramifications

Exhaustive, recursive search required (resolution tree)

Two ways to create the resolution tree

Globally-optimal (GO) / locally-optimal (LO)

When and how to filter non-optimal solutions?

Resolution Tree Creation (GO)

Globally-optimal (GO): find all minimal resolutions for all the violated constraints, then find the optimal ones

Find all minimal resolutions for one violation

Explore them all

Repeat recursively until valid

Return the optimal leaves

[Figure: full resolution tree; the optimal leaves are returned as the optimal repairs]

Resolution Tree Creation (LO)

– Find the minimal and optimal resolutions for one violated constraint, then repeat for the next

– Locally-optimal (LO)

Find all minimal resolutions for one violation

Explore the optimal one(s)

Repeat recursively until valid

Return all remaining leaves

[Figure: LO resolution tree; the optimal repair is returned]

Comparison (GO versus LO)

Characteristics of GO

Exhaustive

Less efficient: large resolution trees

Always returns optimal repairs

Insensitive to constraint syntax

Deterministic (result does not depend on resolution order)

Characteristics of LO

Greedy

More efficient: small resolution trees

May return sub-optimal repairs

Sensitive to constraint syntax

Non-deterministic (result may depend on resolution order)
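The difference between the two strategies can be reproduced on a toy problem. This sketch is not the implementation from the talk, and it rests on several assumptions:

- Facts are plain strings.
- Each hypothetical constraint "a requires b" is resolved either by deleting a or by adding b.
- The preference is an assumed cost of 3 per addition and 2 per deletion.
- Resolutions that would undo an earlier step are discarded, so that the search terminates.

Deleting p looks cheaper locally, but it triggers a second violation. As a result, LO ends with a costlier repair than GO.

```python
from typing import Callable, FrozenSet, List, Optional, Tuple

Facts = FrozenSet[str]
Resolution = Tuple[Facts, Facts]  # (facts to add, facts to delete)
Constraint = Callable[[Facts], Optional[List[Resolution]]]

ADD_COST, DEL_COST = 3, 2  # assumed preference: an addition costs more than a deletion


def requires(a: str, b: str) -> Constraint:
    """Hypothetical constraint 'a requires b'; if violated, return its minimal resolutions."""
    def check(state: Facts) -> Optional[List[Resolution]]:
        if a in state and b not in state:
            return [(frozenset(), frozenset({a})),  # delete a
                    (frozenset({b}), frozenset())]  # add b
        return None
    return check


def cost(added: Facts, deleted: Facts) -> int:
    return ADD_COST * len(added) + DEL_COST * len(deleted)


def first_violation(state: Facts, constraints: List[Constraint]) -> Optional[List[Resolution]]:
    for check in constraints:
        resolutions = check(state)
        if resolutions is not None:
            return resolutions
    return None


def allowed(resolutions, added, deleted):
    """Drop resolutions that would undo an earlier step (assumed rule; ensures termination)."""
    return [(a, d) for a, d in resolutions if not (a & deleted or d & added)]


def go_repair(state, constraints):
    """Globally-optimal: build the whole resolution tree, return the cheapest valid leaves."""
    leaves = []

    def explore(s, added, deleted):
        resolutions = first_violation(s, constraints)
        if resolutions is None:
            leaves.append((s, cost(added, deleted)))
            return
        for a, d in allowed(resolutions, added, deleted):
            explore((s - d) | a, added | a, deleted | d)

    explore(state, frozenset(), frozenset())
    if not leaves:
        return set()
    best = min(c for _, c in leaves)
    return {s for s, c in leaves if c == best}


def lo_repair(state, constraints):
    """Locally-optimal: at each violation, follow only the cheapest resolution(s)."""
    leaves = set()

    def explore(s, added, deleted):
        resolutions = first_violation(s, constraints)
        if resolutions is None:
            leaves.add(s)
            return
        options = allowed(resolutions, added, deleted)
        if not options:
            return  # dead end: every resolution would undo an earlier step
        best = min(cost(a, d) for a, d in options)
        for a, d in options:
            if cost(a, d) == best:
                explore((s - d) | a, added | a, deleted | d)

    explore(state, frozenset(), frozenset())
    return leaves


constraints = [requires("p", "r"), requires("q", "p")]
start = frozenset({"p", "q"})
print([sorted(s) for s in go_repair(start, constraints)])  # [['p', 'q', 'r']]  (add r, cost 3)
print([sorted(s) for s in lo_repair(start, constraints)])  # [[]]  (delete p, then q, cost 4)
```

Swapping the order of the two constraints does not change the GO result in this example. In general, LO's answer depends on which violation is resolved first.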

Repair: Generality Results

The approach is very general

Thanks to the generality/flexibility of preferences

Repair approaches can be captured using adequately designed preferences

Using either the LO or the GO strategy

All the current approaches that we checked

Practically all future ones

—This has been proved, under some general conditions regarding the behavior of the repair approach

Our model can be viewed as a general approach that subsumes other repair approaches

Repair: Algorithms and Complexity

Implemented both algorithms

Detailed complexity analysis for GO/LO and for various types of constraints and preferences

Inherently difficult problem

Exponential complexity (in general)

Main exception: LO is polynomial (in special cases)

Theoretical complexity is misleading as to the actual performance of the algorithms

Performance in Practice

Performance in practice

Linear with respect to dataset size

Linear with respect to tree size

—Types of violated constraints (tree width)

—Number of violations (tree height) – causes the exponential blowup

—Constraint interdependencies (tree height)

—Preference (for LO): affects pruning (tree width)

Further performance improvement

Use optimizations

Use LO with a restrictive preference

Currently considering a redesign for further improvement
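The following is a rough idealization, not a bound stated in the talk. Suppose every violation offers w minimal resolutions (tree width) and a branch needs h resolution steps (tree height). Suppose also that, for LO, the preference is restrictive enough to keep one resolution per step. The two trees then compare roughly as follows:

```latex
\begin{aligned}
\text{GO:}\quad & \#\text{leaves} \le w^{h},
  \qquad \text{e.g. } w = 3,\ h = 10 \;\Rightarrow\; 3^{10} = 59\,049 \\
\text{LO:}\quad & h \text{ expansions, about } w \cdot h \text{ candidate resolutions evaluated},
  \qquad \text{e.g. } 3 \cdot 10 = 30
\end{aligned}
```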

Summary and Conclusions: Repair

Data usually problematic

Different types of problems

Repair is done using different approaches depending on the type of the problem

Cleaning, debugging, repairing, quality assessment and enrichment

Presented a formal approach for validity repair [RFC11]

Other possible directions (related to LOD)

Most approaches detect problems, but don’t resolve them

Efficiency problems (for repairing algorithms)

Exploit external knowledge on the cause of the problem (e.g., propagation of invalidity by a linked dataset)

Talk Structure (D1)

A. Introduction to RDF/S, DLs, OWL

B. Remote change management

1. Introduction, definition of subfields
2. Literature review
3. An approach for change detection [PFF+13]

C. Repair

1. Introduction, definition of subfields
2. Literature review
3. An approach for validity repair [RFC11]

D. Data and Knowledge Evolution

1. Introduction, connection with belief change
2. Understanding the process of change
3. Literature review

Motivation for Evolution

Reasons for evolution

New observations or experiments

Change in the viewpoint or usage of the dataset

Newly gained access to information (previously classified, unknown or otherwise unavailable)

Incomplete or inaccurate conceptualization

Changes in the world itself

Repairing

Change propagation (cascading evolution in LOD)

Not an LOD-specific problem

But critical for LOD as well

Definition of Evolution

The process of modifying a dataset in response to a change in the domain or its conceptualization

Dealing with both data and schema changes

[Figure: the original dataset and the new data/knowledge are fed to the evolution algorithm, which produces the modified dataset]

Evolution: Setting the Scope

Evolution is an overloaded term

Phases of evolution

Six phases in [SMMS02], five phases in [PT05]

Detecting the need for evolution, change propagation, logging changes, versioning etc

Scope: apply the change and compute the new dataset

Out of scope: deciding on the change, evaluating the result, managing versions, logging changes etc

Explaining Evolution (1/4)

[Figure: chess dataset diagram with a schema level (ChessPiece, Plastic, Wooden, White, Black, Red) and a data level (King)]

Chess dataset; representation language: RDF

Change: Add([King rdf:type Red])

Explaining Evolution (2/4)

[Figure: chess dataset diagram with a schema level (ChessPiece, Plastic, Wooden, White, Black, Red) and a data level (King)]

Chess dataset; representation language: RDF

Change: Del([King rdf:type Black])

Is the King Wooden?

Explaining Evolution (3/4)

[Figure: chess dataset diagram with a schema level (ChessPiece, Plastic, Wooden, White, Black, Red) and a data level (King)]

Chess dataset; representation language: RDF

Change: Del([King rdf:type Wooden])

Some domain knowledge required (extra-logical considerations)

Explaining Evolution (4/4)

[Figure: chess dataset diagram with a schema level (ChessPiece, Plastic, Wooden, White, Black, Red) and a data level (King); Wooden and Plastic are marked as disjoint]

Chess dataset; representation language: OWL

Wooden and Plastic are disjoint [Wooden owl:disjointWith Plastic]

Change: Add([King rdf:type Plastic])

Is the King Black?

Is the King Wooden?

Side-effects in Evolution

Changes should not undermine the “quality” of the dataset

Side-effects: additional changes that need to be applied along with the original change to maintain knowledge integrity and quality

Consistency, coherency, custom constraints, quality metrics, …

Main challenge in determining the evolution result

Determining side-effects

Determining Side-effects

Challenges in determining side-effects

Evolution result not always obvious (even for humans)

—Understand the process of change

—Various philosophical considerations involved

Selection involved (extra-logical considerations)

—Domain expertise

—Preferences (trust, provenance, axiom “strength” or “entrenchment”)

Early evolution approaches rather naïve in this respect

Ignored such issues or addressed them in an ad-hoc manner

Belief Change

Belief change (often referred to as belief revision)

The process of modifying a knowledge base in the face of new, possibly contradictory knowledge

Mature, well-established field

Focuses on logical formalisms (propositional, first-order logic)

Recent survey on belief change [FH11]

Aims to understand the process of change

The philosophical/logical counterpart of dataset evolution

Can provide solutions and inspiration

Cross-Fertilization with Belief Change

Cross-fertilization beneficial [Flo06, FPA05, FPA06]

Benefits

Similar problems

Differences in the underlying intuitions are minimal

Belief change field more mature

Frame problems and provide inspiration towards a solution

Protect from pitfalls

Avoid “reinventing the wheel”

Problems

Representation languages and formalisms are different

Assumptions regarding the underlying representation language

—These assumptions do not hold for LOD representation languages

Can reuse the ideas, not the results themselves

Talk Structure (D2)

A. Introduction to RDF/S, DLs, OWL

B. Remote change management

1. Introduction, definition of subfields
2. Literature review
3. An approach for change detection [PFF+13]

C. Repair

1. Introduction, definition of subfields
2. Literature review
3. An approach for validity repair [RFC11]

D. Data and Knowledge Evolution

1. Introduction, connection with belief change
2. Understanding the process of change
3. Literature review

Challenges and Considerations

List of challenges and problems related to evolution

As well as some answers from the belief change field

Challenges and the complexity of formalisms

Some of the problems do not appear in simpler formalisms (RDF)

Some of the problems are only relevant in the presence of schema

—Data changes are simpler (on a fixed schema)

Part of the discussion only relevant for DL, OWL

Importance of Implicit Data (Example)

[Figure: chess dataset diagram with a schema level (ChessPiece, Plastic, Wooden, White, Black, Red) and a data level (King)]

Chess dataset; representation language: RDF

Change: Del([King rdf:type Black])

Is the King Wooden?

Importance of Implicit Data

Explicit and implicit data equally important

The coherence viewpoint (King is Wooden)

The closure of the dataset is considered during changes

—Belief set semantics

Implicit data persistent

—Explicit support not necessary for implicit data

No discrimination

—No need to distinguish explicit data from implicit

—Redundant data can be deleted

Explicit data more important than implicit

The foundational viewpoint (King is not Wooden)

Only explicit knowledge is considered during changes

—Belief base semantics

Implicit data volatile

—Retained only as long as there is explicit support

Discrimination

—Explicit data should be explicitly marked as such

—Redundant data should persist
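The two viewpoints can be contrasted with a small Python sketch. For illustration only, it assumes that the King's woodenness is implicit, derived from [King rdf:type Black] and [Black rdfs:subClassOf Wooden]. The closure is naive and uses just two RDFS rules. The deletion functions are simplified and are not taken from any specific system.

```python
def closure(triples):
    """Naive RDFS-style closure using two rules only:
    subClassOf transitivity (rdfs11) and type propagation along subClassOf (rdfs9)."""
    result = set(triples)
    while True:
        new = set()
        for s1, p1, o1 in result:
            for s2, p2, o2 in result:
                if p2 == "rdfs:subClassOf" and o1 == s2 and p1 in ("rdfs:subClassOf", "rdf:type"):
                    new.add((s1, p1, o2))
        if new <= result:
            return result
        result |= new


def delete_coherence(explicit, t):
    """Belief set semantics: remove t from the closure; implicit data persists."""
    result = closure(explicit) - {t}
    if t in closure(result):
        # A full contraction would remove further triples; this sketch just stops.
        raise ValueError("t is still derivable from the remaining triples")
    return result


def delete_foundational(explicit, t):
    """Belief base semantics: remove t from the explicit triples, then recompute."""
    return closure(set(explicit) - {t})


# Hypothetical encoding of the relevant part of the chess dataset.
explicit = {("King", "rdf:type", "Black"), ("Black", "rdfs:subClassOf", "Wooden")}
t = ("King", "rdf:type", "Black")        # Del([King rdf:type Black])
wood = ("King", "rdf:type", "Wooden")    # the implicit triple in question

print(wood in delete_coherence(explicit, t))     # True: the King is still Wooden
print(wood in delete_foundational(explicit, t))  # False: its only support is gone
```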

Redundant Data

[Figure: chess dataset diagram with a schema level (ChessPiece, Plastic, Wooden, White, Black, Red) and a data level (King)]

Chess dataset; representation language: RDF

Change: Add([King rdf:type Black])

The King Is Black

[Figure: chess dataset diagram with a schema level (ChessPiece, Plastic, Wooden, White, Black, Red) and a data level (King)]

Chess dataset; representation language: RDF

Observation: the King is Black

Change: Add([King rdf:type Black])

Is the King Wooden?

Paint It Black

[Figure: chess dataset diagram with a schema level (ChessPiece, Plastic, Wooden, White, Black, Red) and a data level (King)]

Chess dataset; representation language: RDF

Action: King is painted Black

Change: Add([King rdf:type Black])

Is the King Wooden?

Static and Dynamic Worlds

Same dataset, same change, but different expected result

Different semantics between the two cases [KM91]

Different operations

Static world change semantics

The world does not change, but our perception of it changes

Modeling or conceptualization problems, new observation etc

Dynamic world change semantics

The world changes, and we need to keep ourselves up-to-date

No problems with the original conceptualization
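The [KM91] distinction can be illustrated with a classic propositional example instead of the chess dataset. The knowledge base says that exactly one of a book (b) and a magazine (m) is on the table. The new information is that the book is on the table. The sketch uses two standard operator choices among many:

- Dalal-style revision keeps the new-information worlds at minimal Hamming distance from the knowledge base as a whole.
- Forbus-style update moves each knowledge-base world to its nearest new-information worlds.

The atoms and the example are assumptions made for illustration.

```python
from itertools import product

ATOMS = ("b", "m")  # b: "the book is on the table", m: "the magazine is on the table"


def all_worlds():
    """Every truth assignment, represented as the frozenset of atoms that are true."""
    return [frozenset(a for a, v in zip(ATOMS, bits) if v)
            for bits in product([False, True], repeat=len(ATOMS))]


def dist(w1, w2):
    """Hamming distance: number of atoms on which two worlds disagree."""
    return len(w1 ^ w2)


def revise(kb, mu):
    """Static world (Dalal-style revision): keep mu-worlds closest to the KB as a whole."""
    best = min(dist(w, v) for w in kb for v in mu)
    return {v for v in mu if min(dist(w, v) for w in kb) == best}


def update(kb, mu):
    """Dynamic world (Forbus-style update): move each KB world to its closest mu-worlds."""
    result = set()
    for w in kb:
        best = min(dist(w, v) for v in mu)
        result |= {v for v in mu if dist(w, v) == best}
    return result


kb = {frozenset({"b"}), frozenset({"m"})}       # exactly one of b, m holds
mu = {w for w in all_worlds() if "b" in w}      # new information: b

print(sorted(sorted(w) for w in revise(kb, mu)))  # [['b']]
print(sorted(sorted(w) for w in update(kb, mu)))  # [['b'], ['b', 'm']]
```

Revision treats the input as a correction, so the magazine is not on the table. Update treats it as a change in the world: someone may have put the book down, so nothing follows about the magazine.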

Types of Operations

Static world

Revision (add)

Contraction (delete)

Dynamic world

Update (add)

Erasure (delete)

Plus some more (forget, expansion, …)

Less well-studied

Ignored for this talk

Irrelevant for LOD or trivial

            Static        Dynamic
Addition    Revision      Update
Deletion    Contraction   Erasure

Example: Revision and Contraction

[Figure: chess dataset diagram with a schema level (ChessPiece, Plastic, Wooden, White, Black, Red, NotBlack) and a data level (King)]

Chess dataset; representation language: OWL

Change #1: I believe that the King is not Black

Add([King rdf:type NotBlack], [NotBlack owl:complementOf Black])

Change #2: I do not believe that the King is Black

Del([King rdf:type Black])

Expressing the Change

Different paradigms for expressing the change

Modification-based

—“Add([King rdf:type NotBlack], [NotBlack owl:complementOf Black])”

—The exact modifications that should be applied to accommodate the new knowledge

—Must know the conceptualization

—Closer to the ontology expert

Fact-based

—“I believe that the King is not Black”

—A new fact that should be accommodated in the dataset

—Extra layer of abstraction (extra step required to determine modifications)

—Closer to the domain expert

Handling multiple changes

Iterated belief change

Package versus choice semantics (contraction and erasure)

Merging

Evolution Principles (Partial List)

Principle of Success (Primacy of New Information)

New information is unconditionally accepted

Relaxed in non-prioritized belief change

Principle of Validity (Consistency Maintenance)

Belief change: usually logical consistency

LOD evolution: consistency, coherency, custom constraints, …

Principle of Minimal Change

Determine the side-effects that have minimal impact

—But satisfying the other principles

Corresponds to the selection process

Minimality depends on the task, context, user, application, …

Different postulates and intuitions (recovery, relevance etc)

Different metrics (model-based, formula-based, cardinality etc)

Understanding the Principles

[Figure: chess dataset diagram with a schema level (ChessPiece, Plastic, Wooden, White, Black, Red) and a data level (King); Wooden and Plastic are marked as disjoint]

Chess dataset; representation language: OWL

Wooden and Plastic are disjoint [Wooden owl:disjointWith Plastic]

Change: Add([King rdf:type Plastic])

Invalidity (basically, inconsistency): the King is both Wooden and Plastic

Three options (Minimal Change)

Non-obvious Side-effects

[Figure: chess dataset diagram with a schema level (ChessPiece, Plastic, Wooden, White, Black, Red) and a data level (King)]

Chess dataset; representation language: ALC DL

I don’t believe that all White items are Chess_Pieces

Replace subsumptions with:

White ⊓ Chess_Piece ⊑ Plastic

Plastic ⊑ White ⊔ Chess_Piece

[Figure: the resulting hierarchy relating Plastic, Chess_Piece and White]

Talk Structure (D3)

A. Introduction to RDF/S, DLs, OWL

B. Remote change management

1. Introduction, definition of subfields
2. Literature review
3. An approach for change detection [PFF+13]

C. Repair

1. Introduction, definition of subfields
2. Literature review
3. An approach for validity repair [RFC11]

D. Data and Knowledge Evolution

1. Introduction, connection with belief change
2. Understanding the process of change
3. Literature review

Classes of Belief Change Approaches (1/2)

Postulates (one set for each operation)

Formalize the principles, using logical conditions

Essentially define the properties of a rational change operator

—Some principles not considered or given varying semantics

—Principle of Minimal Change is the most controversial

Do not uniquely define an operator

—A class of operators (expected rational results)

—Extra-logical considerations would determine the actual result

—Operator-specific (preferences, axiom strength, hard-coded semantics, …)

Belief change context [AGM85, KM91, Han91]

Evolution context [FKAC13, WWT10, QLB06a, QLB06b]

Classes of Belief Change Approaches (2/2)

Construction methods

Intuitive constructions for a family of operators of a certain type

Representation theorems

—Proof that the constructed family corresponds exactly to the class of operators that satisfy a certain set of postulates

Can be used as “templates” to construct rational change operators

Parameterized selection process

—Preferences, axiom strength, etc

Popular in belief change, not so much in evolution

Explicit algorithms

Implement a specific operator that satisfies some of the postulates

Hard-coded or parameterized selection process

Popular in evolution context, not so much in belief change

Discussion on the Operator Types (1/2)

Connections between the various operators

Static: revision/contraction interdefinable [AGM85]

Dynamic: update/erasure interdefinable [KM91]

Model-theoretic characterization of the connection between static/dynamic worlds (revision-update, contraction-erasure) [KM91]

Postulates critical for establishing those results

Revision and update more useful in practice

Contraction/erasure only used to express agnosticism

Contraction and erasure more interesting from a theoretical perspective

More fundamental operations

Discussion on the Operator Types (2/2)

Revise with φ (in belief change)

Contract ¬φ

—This resolves, a priori, any potential inconsistency problems

Add φ (without side-effects)

Revise with φ (in LOD)

Contract data that could potentially cause problems

—Inconsistency, incoherency, …

Add φ (without side-effects)

Contraction is the basis for revision

Simpler operation

Basically, if you know how to contract, you know how to revise

Most of the focus in belief change and also in LOD evolution

Same for update/erasure
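The slides do not spell out the formulas. In standard AGM notation (K a belief set, * revision, ÷ contraction, + expansion, Cn the consequence operator), the interdefinability of revision and contraction is usually expressed by the Levi and Harper identities. The first of these is the "contract ¬φ, then add φ" pattern described above.

```latex
\begin{aligned}
K * \varphi   &= (K \div \neg\varphi) + \varphi && \text{(Levi identity: revision via contraction)} \\
K \div \varphi &= K \cap (K * \neg\varphi)      && \text{(Harper identity: contraction via revision)} \\
K + \varphi   &= Cn(K \cup \{\varphi\})         && \text{(expansion)}
\end{aligned}
```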

Evolution via Editors

Features

Intuitive interfaces

Easy to add/delete triples (but not facts)

Some help for determining the side-effects of a change

—Embedded reasoners and/or debugging/repair tools to propose side-effects

Additional facilities

—Versioning, monitoring, undo/redo, …

Main problems

User should be both ontology and domain expert

Not applicable in some cases

—Examples: automated agents, time-critical applications, massive streaming input

No formal properties

Examples

Protégé (http://protege.stanford.edu/)

NeOn toolkit (http://neon-toolkit.org/wiki/Main_Page)

OntoStudio (http://www.semafora-systems.com/en/products/ontostudio/)

KAON2 (http://kaon2.semanticweb.org/)

Declarative Approaches

SPARQL Update (http://www.w3.org/TR/sparql11-update/)

For RDF

Fixed semantics, no side-effects

Data and schema operations (also bulk changes)

RUL [MSCK05]

For RDF/S, taking into account RDFS semantics

Fixed semantics, predefined set of side-effects per operation

Only for data operations (also bulk changes)

EvoPat [RHTA10]

Declaratively associate changes with side-effects (using SPARQL)

SPARQL queries determine whether side-effects should be applied

SPARQL update statements represent such side-effects

Tempus fugit [LRV09]

Event-driven, declarative specification of the operators’ semantics

Fixed-Operations Approach

Standard approach in the early days (e.g., [SMMS02])

Set of supported operations (Add_Class, Add_Domain, …)

Identify potential problems and side-effects per operation

—Decision is hard-coded or user-defined (from a set of options)

—Example: when deleting a subsumption, how about implicit subsumptions?

Automatic or semi-automatic

Problems

No consensus on the language of changes

No limit on the number of operations

—What about unknown/unsupported operators?

No exhaustive formal analysis of potential side-effects

No formal properties or other guarantees

Incomplete understanding of the change process

Approaches Inspired by Belief Change (1/2)

Revision in ALU DL [LM04]

Using preferences among axioms

Inspired by “epistemic entrenchment”

Revision in generic DLs [QD09]

Three model-based revision operators for DLs

Emphasis on the Principle of Irrelevance of Syntax

—Semantical, rather than syntactical, considerations should drive the result

Revision in DL-Lite [GQW12]

Using a graph-based algorithm

For data changes only (ABox)

Update and erasure in RDF/S [GHV06, GHV11]

Taking into account RDFS inference

Update is trivial, erasure is challenging (due to RDFS inference)

Approaches Inspired by Belief Change (2/2)

Using the maxi-adjustment algorithm [MLB05, QLB06a, QLB06b]

Used to repair inconsistencies in propositional knowledge bases

Requires a stratification in the knowledge

Adapted for disjunctive DLs

Using kernel operators [Han94]

Kernels: minimal sets of formulas leading to inconsistency

—Minimal Incoherence-Preserving Sub-TBoxes (MIPS) [SC03]

OWL [HWK06]

DLs [QHHP08]

Generic formalisms with no negation (such as RDF) [RW07]
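Kernel contraction [Han94] can be sketched in a toy negation-free logic. There the kernels of a formula α are the minimal subsets of the base that entail α. Everything here is an assumption for illustration:

- Facts are strings, and a pair ("a", "b") is the rule a → b.
- Kernels are found by brute force.
- The incision (the formulas cut so that every kernel loses at least one member) is chosen greedily.

```python
from itertools import combinations


def entails(formulas, goal):
    """Forward chaining over facts (str) and rules (premise, conclusion)."""
    facts = {f for f in formulas if isinstance(f, str)}
    rules = [f for f in formulas if isinstance(f, tuple)]
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            if premise in facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return goal in facts


def kernels(base, goal):
    """All minimal subsets of base that entail goal (brute force, smallest first)."""
    found = []
    items = list(base)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            if entails(candidate, goal) and not any(k <= candidate for k in found):
                found.append(candidate)
    return set(found)


def kernel_contract(base, goal):
    """Remove an incision: at least one formula from every kernel (greedy, hypothetical)."""
    remaining = set(kernels(base, goal))
    cut = set()
    while remaining:
        pool = set().union(*remaining)
        # Prefer the formula occurring in the most kernels not yet hit; repr() breaks ties.
        pick = min(pool, key=lambda f: (-sum(f in k for k in remaining), repr(f)))
        cut.add(pick)
        remaining = {k for k in remaining if pick not in k}
    return set(base) - cut


base = {"a", "b", ("a", "b"), ("b", "c")}
print(len(kernels(base, "c")))                   # 2: {b, b -> c} and {a, a -> b, b -> c}
print(entails(kernel_contract(base, "c"), "c"))  # False: only the rule b -> c was cut
```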

Postulation Approaches in Evolution (1/3)

AGM: dominating paradigm in belief change [AGM85]

The single most influential work in the field of belief change

Contributions

AGM postulates: two sets of 6 basic and 2 supplementary postulates

—One set for each operator (revision and contraction)

Plus various related results

—Partial meet contraction

—Representation theorems

—Connections between operators

Only for classical logics (satisfying certain assumptions)

Propositional, first-order, modal logics, …

Not for LOD formalisms (RDF/S, DLs, OWL)

Postulation Approaches in Evolution (2/3)

AGM contraction postulates adapted for monotonic logics [Flo06, FPA05, FPA06]

Includes all LOD formalisms

But: no satisfying contraction operator exists for many such logics

Cannot find a proper result in certain cases

Necessary and sufficient conditions for the existence of such an operator [FPA06, Flo06]

Negative results for RDF/S, OWL, most DLs [FPA05, RWFA13]

Problem stems from the postulate of recovery [AGM85]

Captures the Principle of Minimal Change

Controversial [Han91]

Postulation Approaches in Evolution (3/3)

Replacing recovery with optimal recovery [FPA06, FHP+06]

Equivalent to recovery for classical logics

But weaker in general

Not particularly successful either

Replacing recovery with relevance [Han91]

An intuitive, well-established alternative to recovery

Equivalent to recovery for classical logics

Applicable under quite general conditions [RWFA13]

—Applicable for all compact logics

—Includes RDF/S, practically all DLs and OWL flavors and profiles

Adequate for expressing the principles of contraction in LOD languages

Connections with recovery established for non-classical logics

Principle of Adequacy of Representation

Principle of Adequacy of Representation

The evolution result should be expressible in the same formalism as the original dataset

Obvious and trivial

Not always compatible with our requirements for the evolution result

Postulates (e.g., AGM postulates)

Specific incarnations of the Principle of Minimal Change

Specific computational methods or classes of operators

Two stages for the computation [CGKZ12]

Find the “optimal” evolution result according to the requirements

Express it in the target language (not always possible)

—Inexpressibility results

Inexpressibility for Classes of Operators

Generic contraction methods [CGKZ12]

Syntactic: remove a minimal set of explicit axioms

Formula-based: remove a minimal set of axioms from the closure

—Three different semantics for minimality

Model-based: modify the model in a minimal manner

—Eight different methods to find the “minimal” distance between models

Existing contraction algorithms can be categorized along these generic classes of methods

Different contraction methods not compatible in general (for DLs)

Model-based and formula-based are compatible in classical logics

Inexpressibility results for DL-Lite and EL (the DLs underlying OWL 2 QL and OWL 2 EL) [CGKZ12]

Proposal: a “hybrid” operator combining ideas from syntactic and formula-based approaches [CGKZ12]

More Inexpressibility Results

DL-Lite evolution [CKNZ10]

Focusing on model-based and formula-based approaches for contraction

Inexpressibility results

Propose a formula-based approach

DL revision [LLMW06]

Model-based approach, limited to the ABox only (data level)

Inexpressibility results

Propose a new DL that supports model-based evolution

Approximations

DL-LiteF [GLPR07, GLPR09]

—Update and erasure approximation algorithms for data-level changes only

—Alternative: extend DL-LiteF to make sure that result is expressible

DL-Lite [WWT10]

—Provide postulates and approximation algorithms for revision

Other Approaches

Evolution using ideas from argumentation frameworks [MRF08]

ALC DL

Inconsistency in a dataset is an “attack” between arguments

Acceptability semantics used to resolve such attacks and eliminate inconsistencies

Useful for both debugging and evolution

Evolution can be reduced to debugging/repair [HHH+05]

Apply the change

Then repair the result to resolve problems (Principle of Validity)

—Making sure the change is not “undone” during repair (Principle of Success)
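The apply-then-repair pattern can be sketched as follows. This is not the algorithm of [HHH+05], and the encoding is hypothetical:

- The King is assumed to be explicitly Wooden.
- [Wooden owl:disjointWith Plastic] is the only constraint.
- The preference (data triples are removed before schema triples) is an invented stand-in for the Principle of Minimal Change.

```python
DISJOINT = "owl:disjointWith"


def violations(triples):
    """Each violation: a disjointness axiom plus the two type triples that clash with it."""
    out = []
    for c, p, d in triples:
        if p != DISJOINT:
            continue
        for x, p1, c1 in triples:
            if p1 == "rdf:type" and c1 == c and (x, "rdf:type", d) in triples:
                out.append(((c, p, d), (x, "rdf:type", c), (x, "rdf:type", d)))
    return out


def rank(t):
    """Assumed preference: schema triples are more entrenched than data triples."""
    return 1 if t[1] == DISJOINT else 0


def add_and_repair(triples, new):
    result = set(triples) | {new}                        # apply the change (Success)
    while True:
        found = violations(result)
        if not found:
            return result                                # no violation left (Validity)
        candidates = [t for t in found[0] if t != new]   # never undo the change itself
        victim = min(candidates, key=lambda t: (rank(t), t))
        result.discard(victim)                           # least entrenched triple goes first


dataset = {("King", "rdf:type", "Wooden"), ("Wooden", DISJOINT, "Plastic")}
change = ("King", "rdf:type", "Plastic")                 # Add([King rdf:type Plastic])
print(sorted(add_and_repair(dataset, change)))
# [('King', 'rdf:type', 'Plastic'), ('Wooden', 'owl:disjointWith', 'Plastic')]
```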

Evolution Under Custom Constraints

Evolution in the presence of custom validity constraints [KFAC07, FKAC13]

Methodology

Apply the change (Principle of Success)

Guarantee satisfaction of constraints (Principle of Validity)

Use a preference to determine minimality (Principle of Minimal Change)

Features

Generic method, applied to RDF/S evolution
A formal expression of the principles for the proposed setting
Exhaustive method to determine all possible side-effects and identify the “best” one (according to the preference)
Constraints on the allowed preferences, for rationality and performance

Based on ideas similar to those of the repairing approach of [RFC11]
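The methodology above can be illustrated with a brute-force Python sketch. It is an assumption-laden toy, not the algorithm of [KFAC07, FKAC13]: triples are plain string tuples, side-effects are restricted to deletions of existing triples, constraints are arbitrary Python predicates, and the preference is a hypothetical cost that makes touching schema triples ten times as expensive as touching instance triples. Enumerating every subset of deletions is exponential, which shows why performance is a concern.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, Optional, Tuple

Triple = Tuple[str, str, str]
KB = FrozenSet[Triple]

# Hypothetical preference: changing schema triples costs 10, instance triples cost 1.
SCHEMA_PREDICATES = {"subClassOf", "disjointWith"}


def schema_heavy_cost(old: KB, new: KB) -> int:
    """Weighted size of the symmetric difference between the old and new KB."""
    return sum(10 if p in SCHEMA_PREDICATES else 1 for (_, p, _) in old ^ new)


def no_disjoint_clash(kb: KB) -> bool:
    """Example constraint: no resource is typed with two classes declared disjoint."""
    disjoint = {(s, o) for (s, p, o) in kb if p == "disjointWith"}
    types = {}
    for (s, p, o) in kb:
        if p == "type":
            types.setdefault(s, set()).add(o)
    return not any(a in cs and b in cs for cs in types.values() for (a, b) in disjoint)


def evolve(kb: KB, change: KB,
           constraints: Iterable[Callable[[KB], bool]],
           cost: Callable[[KB, KB], int]) -> Optional[KB]:
    """Return the cheapest valid KB that contains the change, or None if none exists."""
    checks = list(constraints)
    base = kb | change                    # Principle of Success: the change is always kept
    deletable = sorted(kb - change)       # candidate side-effects: deletions of old triples
    best: Optional[KB] = None
    for k in range(len(deletable) + 1):   # exhaustively try every subset of deletions
        for removed in combinations(deletable, k):
            candidate = base - frozenset(removed)
            if not all(ok(candidate) for ok in checks):              # Principle of Validity
                continue
            if best is None or cost(kb, candidate) < cost(kb, best):  # Minimal Change
                best = candidate
    return best


kb = frozenset({("Cat", "disjointWith", "Dog"), ("tom", "type", "Cat")})
change = frozenset({("tom", "type", "Dog")})
result = evolve(kb, change, [no_disjoint_clash], schema_heavy_cost)
```

Here adding `(tom, type, Dog)` has two single-deletion repairs; the schema-heavy cost prefers dropping `(tom, type, Cat)` over dropping the disjointness triple. A different preference could pick the other repair, which is why the choice of preference matters.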

Summary and Conclusions: Evolution

The problem of evolution is very challenging

Several issues need to be considered
- Not obvious to a newcomer
- Often ignored

Evolution approaches

Direct: manual, based on fixed operators, declarative
Indirect: postulation attempts
Adapted: adapting belief change algorithms or methods

Other possible directions (related to LOD)

Adapt to the “linked” character of LOD
- Evolution during propagation or after change detection
- Extra knowledge that can be exploited for adapting preferences, fine-tuning automated algorithms, etc.

Thank You!

References

[AAM09] C. Allocca, M. d'Aquin, E. Motta. Detecting Different Versions of Ontologies in Large Ontology Repositories. IWOD-09, 2009.

[ADA98] M.L. Abate, K.V. Diegert, H.W. Allen. A Hierarchical Approach to Improving Data Quality. Data Quality Journal, 4(1), 1998.

[AGM85] C. Alchourron, P. Gärdenfors, D. Makinson. On the Logic of Theory Change: Partial Meet Contraction and Revision Functions. Journal of Symbolic Logic, 50:510-530, 1985.

[AH06] S. Auer, H. Herre. A Versioning and Evolution Framework for RDF Knowledge Bases. PSI-06, Revised Papers, 2006.

[BC09] C. Bizer, R. Cyganiak. Quality-driven Information Filtering Using the WIQA Policy Framework. Journal of Web Semantics, 7:1–10, 2009.

[BLHL01] T. Berners-Lee, J. Hendler, O. Lassila. The Semantic Web. Scientific American, 2001.

[CGKZ12] B. Cuenca Grau, E. Kharlamov, D. Zheleznyakov. Ontology Contraction: Beyond the Propositional Paradise. AMW-12, 2012.

[CKNZ10] D. Calvanese, E. Kharlamov, W. Nutt, D. Zheleznyakov. Evolution of DL-Lite Knowledge Bases. ISWC-10, 2010.

[CMDZ10] C.A. Curino, H.J. Moon, A. Deutsch, C. Zaniolo. Update Rewriting and Integrity Constraint Maintenance in a Schema Evolution Support System: PRISM++. PVLDB 4(2):117-128, 2010.

[CMZ08] C.A. Curino, H.J. Moon, C. Zaniolo. Graceful Database Schema Evolution: The PRISM Workbench. PVLDB 1(1):761-772, 2008.

[CQ13] G. Cheng, Y. Qu. Relatedness Between Vocabularies on the Web of Data: A Taxonomy and an Empirical Study. Web Semantics: Science, Services and Agents on the World Wide Web, 2013. Available at: http://dx.doi.org/10.1016/j.websem.2013.02.001

[Deu09] A. Deutsch. FOL Modeling of Integrity Constraints (Dependencies). Encyclopedia of Database Systems, 2009.

[DA09] R. Djedidi, M. Aufaure. Change Management Patterns (CMP) for Ontology Evolution Process. IWOD-09, 2009.

[FH10] C. Furber, M. Hepp. Using Semantic Web Resources for Data Quality Management. EKAW-10, 2010.

[FH11] E. Ferme, S.O. Hansson. AGM 25 Years: Twenty-five Years of Research in Belief Change. Journal of Philosophical Logic 40:295-331, 2011.

[FHP+06] G. Flouris, Z. Huang, J.Z. Pan, D. Plexousakis, H. Wache. Inconsistencies, Negations and Changes in Ontologies. AAAI-06, 2006.

[FKAC13] G. Flouris, G. Konstantinidis, G. Antoniou, V. Christophides. Formal Foundations for RDF/S KB Evolution. International Journal on Knowledge and Information Systems, 35(1):153-191, 2013.

[Flo06] G. Flouris. On Belief Change and Ontology Evolution. Ph.D. thesis, University of Crete, 2006.

[FPA05] G. Flouris, D. Plexousakis, G. Antoniou. On Applying the AGM Theory to DLs and OWL. ISWC-05, 2005.

[FPA06] G. Flouris, D. Plexousakis, G. Antoniou. On Generalizing the AGM Postulates. STAIRS-06, 2006.

[FMK+08] G. Flouris, D. Manakanatas, H. Kondylakis, D. Plexousakis, G. Antoniou. Ontology Change: Classification and Survey. Knowledge Engineering Review, 23(2):117-152, 2008.

[FMV10] E. Franconi, T. Meyer, I. Varzinczak. Semantic Diff as the Basis for Knowledge Base Versioning. NMR-10, 2010.

[FRPV+12] G. Flouris, Y. Roussakis, M. Poveda-Villalon, P.N. Mendes, I. Fundulaki. Using Provenance for Quality Assessment and Repair in Linked Open Data. EvoDyn-12, 2012.

[GHV06] C. Gutierrez, C. Hurtado, A. Vaisman. The Meaning of Erasing in RDF Under the Katsuno-Mendelzon Approach. WebDB-06, 2006.

[GHV11] C. Gutierrez, C. Hurtado, A. Vaisman. RDFS Update: From Theory to Practice. ESWC-11, 2011.

[GLPR07] G. De Giacomo, M. Lenzerini, A. Poggi, R. Rosati. On the Approximation of Instance Level Update and Erasure in Description Logics. AAAI-07, 2007.

[GLPR09] G. De Giacomo, M. Lenzerini, A. Poggi, R. Rosati. On Instance-level Update and Erasure in Description Logic Ontologies. Journal of Logic and Computation 19(5):745-770, 2009.

[GQW12] S. Gao, G. Qi, H. Wang. A New Operator for ABox Revision in DL-Lite. AAAI-12, 2012.

[Gru93] T.R. Gruber. A Translation Approach to Portable Ontology Specifications. Knowledge Acquisition, 5(2), 1993.

[Han91] S.O. Hansson. Belief Contraction Without Recovery. Studia Logica 50(2):251-260, 1991.

[Han94] S.O. Hansson. Kernel Contraction. Journal of Symbolic Logic, 59(3):845-859, 1994.

[HGR12] M. Hartung, A. Gross, E. Rahm. COnto-diff: Generation of Complex Evolution Mappings for Life Science Ontologies. Journal of Biomedical Informatics, 2012.

[HH00] J. Heflin, J. Hendler. Dynamic Ontologies on the Web. AAAI-00, 2000.

[HHM+10] H. Halpin, P.J. Hayes, J.P. McCusker, D.L. McGuinness, H.S. Thompson. When owl:sameAs Isn’t the Same: An Analysis of Identity in Linked Data. ISWC-10, 2010.

[HHH+05] P. Haase, F. van Harmelen, Z. Huang, H. Stuckenschmidt, Y. Sure. A Framework for Handling Inconsistency in Changing Ontologies. ISWC-05, 2005.

[HHP+10] A. Hogan, A. Harth, A. Passant, S. Decker, A. Polleres. Weaving the Pedantic Web. LDOW-10, 2010.

[HP04] J. Heflin, J.Z. Pan. A Model Theoretic Semantics for Ontology Versioning. ISWC-04, 2004.

[HS05] Z. Huang, H. Stuckenschmidt. Reasoning with Multi-version Ontologies: A Temporal Logic Approach. ISWC-05, 2005.

[HWK06] C. Halaschek-Wiener, Y. Katz. Belief Base Revision for Expressive Description Logics. OWLED-06, 2006.

[ILK12] D.H. Im, S.W. Lee, H.J. Kim. A Version Management Framework for RDF Triple Stores. International Journal of Software Engineering and Knowledge Engineering, 22(1):85-106, 2012.

[JAP09] M. Javed, Y. Abgaz, C. Pahl. A Pattern-based Framework of Change Operators for Ontology Evolution. OTM-09, 2009.

[Jur74] J.M. Juran. The Quality Control Handbook. McGraw-Hill, New York, 1974.

[KFAC07] G. Konstantinidis, G. Flouris, G. Antoniou, V. Christophides. Ontology Evolution: A Framework and its Application to RDF. SWDB-ODBIS-07, 2007.

[KFKO02] M. Klein, D. Fensel, A. Kiryakov, D. Ognyanov. Ontology Versioning and Change Detection on the Web. EKAW-02, 2002.

[KHS12] M. Knuth, J. Hercher, H. Sack. Collaboratively Patching Linked Data. USEWOD-12, 2012.

[KLGE07] N. Keberle, Y. Litvinenko, Y. Gordeyev, V. Ermolayev. Ontology Evolution Analysis with OWL-MeT. IWOD-07, 2007.

[KM91] H. Katsuno, A.O. Mendelzon. On the Difference Between Updating a Knowledge Base and Revising It. KR-91, 1991.

[KN03] M. Klein, N. Noy. A Component-based Framework for Ontology Evolution. IJCAI-03 Workshop on Ontologies and Distributed Systems, CEUR-WS, vol. 71, 2003.

[KPS+06] A. Kalyanpur, B. Parsia, E. Sirin, B. Cuenca Grau. Repairing Unsatisfiable Concepts in OWL Ontologies. ESWC-06, 2006.

[KWW08] B. Konev, D. Walther, F. Wolter. The Logical Difference Problem for Description Logic Terminologies. IJCAR-08, 2008.

[KWZ08] R. Kontchakov, F. Wolter, M. Zakharyaschev. Can you Tell the Difference Between DL-Lite Ontologies? KR-08, 2008.

[LB10] J. Lehmann, L. Buhmann. ORE - A Tool for Repairing and Enriching Knowledge Bases. ISWC-10, 2010.

[LLMW06] H. Liu, C. Lutz, M. Milicic, F. Wolter. Updating Description Logic ABoxes. KR-06, 2006.

[LM04] K. Lee, T. Meyer. A Classification of Ontology Modification. AI-04, 2004.

[LPSV06] S.C. Lam, J. Pan, D. Sleeman, W. Vasconcelos. A Fine-grained Approach to Resolving Unsatisfiable Ontologies. WI-06, 2006.

[LRV09] U. Lusch, S. Rudolph, D. Vrandecic. Tempus Fugit: Towards an Ontology Update Language. ESWC-09, 2009.

[Mel04] S. Melnik. Generic Model Management: Concepts and Algorithms. Springer, 2004.

[MFRW00] D.L. McGuinness, R. Fikes, J. Rice, S. Wilder. An Environment for Merging and Testing Large Ontologies. KR-00, 2000.

[MHS09] B. Motik, I. Horrocks, U. Sattler. Bridging the Gap Between OWL and Relational Databases. Journal of Web Semantics, 7(2):74-89, 2009.

[MLA+12] M. Morsey, J. Lehmann, S. Auer, C. Stadler, S. Hellmann. DBpedia and the Live Extraction of Structured Data from Wikipedia. Program: Electronic Library and Information Systems, 46(2):157-181, 2012.

[MLB05] T. Meyer, K. Lee, R. Booth. Knowledge Integration for Description Logics. AAAI-05, 2005.

[MLBP06] T. Meyer, K. Lee, R. Booth, J.Z. Pan. Finding Maximally Satisfiable Terminologies for the Description Logic ALC. AAAI-06, 2006.

[MMB12] P. Mendes, H. Muhleisen, C. Bizer. Sieve: Linked Data Quality Assessment and Fusion. LWDM-12, 2012.

[MMS+03] A. Maedche, B. Motik, L. Stojanovic, R. Studer, R. Volz. An Infrastructure for Searching, Reusing and Evolving Distributed Ontologies. WWW-03, 2003.

[MRF08] M. Moguillansky, N. Rotstein, M. Falappa. A Theoretical Model to Handle Ontology Debugging and Change through Argumentation. IWOD-08, 2008.

[MSCK05] M. Magiridou, S. Sahtouris, V. Christophides, M. Koubarakis. RUL: A Declarative Update Language for RDF. ISWC-05, 2005.

[MWK00] P. Mitra, G. Wiederhold, M.L. Kersten. A Graph-oriented Model for Articulation of Ontology Interdependencies. EDBT-00, 2000.

[NCLM06] N. Noy, A. Chugh, W. Liu, M. Musen. A Framework for Ontology Evolution in Collaborative Environments. ISWC-06, 2006.

[NKKM04] N. Noy, S. Kunnatur, M. Klein, M. Musen. Tracking Changes During Ontology Evolution. ISWC-04, 2004.

[NM00] N.F. Noy, M.A. Musen. Prompt: Algorithm and Tool for Automated Ontology Merging and Alignment. AAAI/IAAI-00, 2000.

[OK02] D. Ognyanov, A. Kiryakov. Tracking Changes in RDF(S) Repositories. EKAW-02, 2002.

[PFF+13] V. Papavassiliou, G. Flouris, I. Fundulaki, D. Kotzinos, V. Christophides. High-Level Change Detection in RDF(S) KBs. Transactions on Database Systems (TODS), 38(1), 2013.

[PM10] A. Passant, P.N. Mendes. SparqlPuSH: Proactive Notification of Data Updates in RDF Stores Using PubSubHubbub. SFSW-10, 2010.

[PT05] P. Plessers, O. de Troyer. Ontology Change Detection Using a Version Log. ISWC-05, 2005.

[PT06] P. Plessers, O. de Troyer. Resolving Inconsistencies in Evolving Ontologies. ESWC-06, 2006.

[PTC05] P. Plessers, O. de Troyer, S. Casteleyn. Event-based Modeling of Evolution for Semantic-driven Systems. CAiSE-05, 2005.

[PTC07] P. Plessers, O. de Troyer, S. Casteleyn. Understanding Ontology Evolution: A Change Detection Approach. Web Semantics: Science, Services and Agents on the WWW, 2007.

[QD09] G. Qi, J. Du. Model-based Revision Operators for Terminologies in Description Logics. IJCAI-09, 2009.

[QHHP08] G. Qi, P. Haase, Z. Huang, J.Z. Pan. A Kernel Revision Operator for Terminologies. DL-08, 2008.

[QLB06a] G. Qi, W. Liu, D. Bell. Knowledge Base Revision in Description Logics. JELIA-06, 2006.

[QLB06b] G. Qi, W. Liu, D. Bell. A Revision-based Approach for Handling Inconsistency in Description Logics. NMR-06, 2006.

[QP07] G. Qi, J. Pan. A Stratification-based Approach for Inconsistency Handling in Description Logics. IWOD-07, 2007.

[RFC11] Y. Roussakis, G. Flouris, V. Christophides. Declarative Repairing Policies for Curated KBs. HDMS-11, 2011.

[RH09] T. Ravn, M. Hoedbolt. How to Measure and Monitor the Quality of Master Data. 2009. Available at: http://www.information-management.com/issues/2007_58/master_data_management_mdm_quality-10015358-1.html

[RHTA10] C. Riess, N. Heino, S. Tramp, S. Auer. EvoPat - Pattern-based Evolution and Refactoring of RDF Knowledge Bases. ISWC-10, 2010.

[RPH+12] A. Rula, M. Palmonari, A. Harth, S. Stadtmüller, A. Maurino. On the Diversity and Availability of Temporal Information in Linked Open Data. ISWC-12, 2012.

[RSDT08] T. Redmond, M. Smith, N. Drummond, T. Tudorache. Managing Change: An Ontology Version Control System. OWLED-08, 2008.

[RW07] M.M. Ribeiro, R. Wassermann. Base Revision in Description Logics – Preliminary Results. IWOD-07, 2007.

[RWFA13] M.M. Ribeiro, R. Wassermann, G. Flouris, G. Antoniou. Minimal Change: Relevance and Recovery Revisited. AI Journal (to appear), 2013.

[SC03] S. Schlobach, R. Cornet. Non-Standard Reasoning Services for the Debugging of Description Logic Terminologies. IJCAI-03, 2003.

[SMMS02] L. Stojanovic, A. Maedche, B. Motik, N. Stojanovic. User-driven Ontology Evolution Management. EKAW-02, 2002.

[SK03] H. Stuckenschmidt, M. Klein. Integrity and Change in Modular Ontologies. IJCAI-03, 2003.

[SP10] Y. Stavrakas, G. Papastefanatos. Supporting Complex Changes in Evolving Interrelated Web Databanks. CoopIS-10, 2010.

[SSN+10] H. Van de Sompel, R. Sanderson, M.L. Nelson, L.L. Balakireva, H. Shankar, S. Ainsworth. An HTTP-Based Versioning Mechanism for Linked Data. LDOW-10, 2010.

[TSBM10] J. Tao, E. Sirin, J. Bao, D.L. McGuinness. Integrity Constraints in OWL. AAAI-10, 2010.

[TTA08] Y. Tzitzikas, Y. Theoharis, D. Andreou. On Storage Policies for Semantic Web Repositories that Support Versioning. ESWC-08, 2008.

[TLZ12] Y. Tzitzikas, C. Lantzaki, D. Zeginis. Blank Node Matching and RDF/S Comparison Functions. ISWC-12, 2012.

[VWS+05] M. Volkel, W. Winkler, Y. Sure, S. Kruk, M. Synak. SemVersion: A Versioning System for RDF and Ontologies. ESWC-05, 2005.

[WHR+05] H. Wang, M. Horridge, A. Rector, N. Drummond, J. Seidenberg. Debugging OWL-DL Ontologies: A Heuristic Approach. ISWC-05, 2005.

[WWT10] Z. Wang, K. Wang, R. Topor. A New Approach to Knowledge Base Revision in DL-Lite. AAAI-10, 2010.

[ZAA+13] F. Zablith, G. Antoniou, M. d’Aquin, G. Flouris, H. Kondylakis, E. Motta, D. Plexousakis, M. Sabou. Ontology Evolution: A Process Centric Survey. Knowledge Engineering Review (to appear).

[ZTC11] D. Zeginis, Y. Tzitzikas, V. Christophides. On Computing Deltas of RDF/S Knowledge Bases. ACM Transactions on the Web (TWEB) 5(3), 2011.

[ZZL+03] Z. Zhang, L. Zhang, C.X. Lin, Y. Zhao, Y. Yu. Data Migration for Ontology Evolution. Poster ISWC-03, 2003.

