A Two-Fold Quality Assurance Approach for Dynamic Knowledge Bases : The 3cixty Use Case


A Two-Fold Quality Assurance Approach for Dynamic Knowledge Bases: The 3cixty Use Case

31st of May, 2016

1st International Workshop on Completing and Debugging the Semantic Web at the 13th Extended Semantic Web Conference

Nandana Mihindukulasooriya (1), Giuseppe Rizzo (2), Raphaël Troncy (3), Oscar Corcho (1), and Raúl García-Castro (1)

(1) Ontology Engineering Group, UPM, Spain
(2) ISMB, Italy
(3) EURECOM, France

Acknowledgments: FPI grant (BES-2014-068449), Innovation activity 3cixty (14523) of EIT Digital, and 4V (TIN2013-46238-C4-2-R), Juan Carlos Ballesteros (Localidata)

Outline

Ontology Engineering Group, Universidad Politécnica de Madrid

• 3cixty use case
• Motivation
• Techniques and tools
• Results

3cixty knowledge base

A Semantic Web platform for building real-world, comprehensive knowledge bases in the domain of culture and tourism for cities, using public information about places and events.

The 3cixty architecture

Motivation

• Data with 4Vs
  • Volume, Variety, Velocity, Veracity
• Evolving schema
• Plenty of tools involved in the process
• Multiple geographically dispersed teams
• Dependent applications

Many chances for potential errors

The need for a good quality assurance approach

Can we adapt some lessons learnt from Software Engineering for knowledge base generation?

Continuous Integration is essential

Cost of defects Vs. Time

[Figure: cost of defects plotted against time; axes: Time and Cost]

Agile testing quadrants

[Figure: agile testing quadrants, contrasting tests that check for expected outputs with tests that analyze the undefined, unknown, and unexpected]

A Two-Fold Quality Assurance Approach

• Two techniques
  • Scripted fine-grained analysis
    • checking for expected results
  • Exploratory testing
    • analyzing the unexpected results
• The two techniques are complementary
  • Exploratory testing can provide heuristics for fine-grained analysis
• Supported by two tools
  • SPARQL Interceptor
  • Loupe

Exploratory Testing

simultaneous learning, test design and test execution

minimal planning and maximum test execution

Loupe – Linked Data Inspector

• Web application for exploring and inspecting datasets
  • Class explorer
  • Property explorer
  • Triple pattern explorer
  • Named graph explorer
• Starts from high-level statistics and allows users to "zoom in" through several levels of detail
• Analysis of different datatypes
  • most common and least common values
  • numeric: min, max, mode, std. dev.
  • string: string length, URI-like strings
• Avoids the need for boilerplate SPARQL queries
• Ability to view the relevant data directly
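The datatype statistics listed above can be sketched in a few lines of Python. This is only an illustration of the kind of summary involved, not Loupe's implementation: the function names, the `URI_LIKE` heuristic, and the sample values are all assumptions made for this example.

```python
import re
import statistics
from collections import Counter

# Hypothetical heuristic for "URI-like" strings; Loupe's real rule is not given in the slides.
URI_LIKE = re.compile(r"^(https?|ftp|urn|mailto):", re.IGNORECASE)


def summarize_numeric(values):
    """Return min, max, mode, and population standard deviation."""
    return {
        "min": min(values),
        "max": max(values),
        "mode": statistics.mode(values),
        "std_dev": statistics.pstdev(values),
    }


def summarize_strings(values):
    """Return the most and least common values, length range, and URI-like strings."""
    counts = Counter(values).most_common()  # sorted from most to least frequent
    lengths = [len(v) for v in values]
    return {
        "most_common": counts[0],
        "least_common": counts[-1],
        "min_length": min(lengths),
        "max_length": max(lengths),
        "uri_like": sorted({v for v in values if URI_LIKE.match(v)}),
    }


# Made-up sample values, for illustration only.
numeric_summary = summarize_numeric([3, 5, 5, 8, 4])
string_summary = summarize_strings(
    ["Museum", "Museum", "Park", "http://example.org/place/1"]
)
```

A rare value or a URI-like string surfacing in such a summary is the kind of hint that exploratory testing can then follow up by inspecting the underlying triples.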

Loupe Architecture

http://loupe.linkeddata.es/

Loupe UI

Fine-grained analysis

• A set of user-defined SPARQL queries (as unit tests)
• Knowledge-base specific

[Figure: Test SPARQL Queries, with inputs labeled System Requirements, Schema Constraints, Conventions and other restrictions, and Inputs from Exploratory Testing]
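One way to picture "SPARQL queries as unit tests" is a query that selects the offending resources, with the test passing when the query returns nothing. The sketch below assumes that convention; `QueryTest`, `run_tests`, and `fake_run_query` are hypothetical names, and the use of `rdfs:label` as the title property is an assumption rather than something taken from the 3cixty schema.

```python
from dataclasses import dataclass
from typing import Callable, List

# Illustrative query: the title property (rdfs:label) is an assumption,
# not taken from the 3cixty schema.
PLACES_WITHOUT_TITLE = """
PREFIX dul:  <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?place WHERE {
  ?place a dul:Place .
  FILTER NOT EXISTS { ?place rdfs:label ?title }
}
"""


@dataclass
class QueryTest:
    name: str
    description: str
    query: str  # SELECT query that returns the offending resources


def run_tests(tests: List[QueryTest], run_query: Callable[[str], list]) -> dict:
    """Run each query; a test passes when its query returns no rows."""
    results = {}
    for test in tests:
        violations = run_query(test.query)
        results[test.name] = {"passed": not violations, "violations": violations}
    return results


def fake_run_query(query: str) -> list:
    """Stand-in for a real SPARQL client; returns canned rows."""
    return ["http://example.org/place/42"] if "dul:Place" in query else []


tests = [
    QueryTest("place-has-title", "Each dul:Place must have a title", PLACES_WITHOUT_TITLE),
]
report = run_tests(tests, fake_run_query)
```

Passing the query runner in as a parameter keeps the sketch independent of any particular SPARQL client; in practice it would be replaced by a function that sends the query to the knowledge base's endpoint.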

SPARQL Interceptor

• seamless integration with the Jenkins continuous integration system
• executes automatically for each build
• provides
  • summary reports
  • configurable email notifications
• for each failed test
  • the reason for the failure
  • a description of the query
  • a link to the failed data using a SPARQL endpoint
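The per-test details listed above (reason, description, link to the failed data) could be assembled roughly as follows. This is a sketch under stated assumptions, not SPARQL Interceptor's code: the endpoint URL, the function names, and the message layout are hypothetical. The link relies only on the standard SPARQL protocol convention of passing the query in a `query` GET parameter.

```python
from urllib.parse import urlencode

# Hypothetical endpoint URL; replace with the knowledge base's real SPARQL endpoint.
ENDPOINT = "http://example.org/sparql"


def failed_data_link(query: str, endpoint: str = ENDPOINT) -> str:
    """Build a GET link that re-runs the failing query on the endpoint."""
    return endpoint + "?" + urlencode({"query": query})


def failure_message(name: str, description: str, reason: str, query: str) -> str:
    """Format the details of one failed test as a plain-text notification."""
    return "\n".join([
        f"FAILED: {name}",
        f"Reason: {reason}",
        f"Description: {description}",
        f"Failed data: {failed_data_link(query)}",
    ])


message = failure_message(
    name="place-has-title",
    description="Each dul:Place must have a title",
    reason="query returned 3 rows, expected 0",
    query="SELECT ?s WHERE { ?s ?p ?o } LIMIT 10",
)
```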

SPARQL Interceptor

Designed and implemented by Localidata.

Defects found in exploratory testing

• Inconsistencies in using vocabularies
  • locn:hasAddress vs. schema:streetAddress
  • http://xmlns.com/foaf/0.1/ and http://xmlns.com/foaf/spec/
• URIs as strings
  • "http://....."
• Outliers
• Typos
  • class names with small letters
• Inconsistencies with the schema
  • domain, range
• Value patterns
  • codes with 5 letters, URIs with a given prefix
• Date time format inconsistencies
• Violation of modeling decisions
  • no blank nodes for certain types
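Several of the defect types above lend themselves to simple heuristics once a pattern has been spotted during exploration. The sketch below works on a small in-memory list of triples; the data, the `ex:` property names, and the regular expressions are made up for illustration and are not the checks used in 3cixty.

```python
import re

# Triples as (subject, predicate, object, object_is_literal); sample data is made up.
TRIPLES = [
    ("ex:p1", "rdf:type", "dul:Place", False),
    ("ex:p2", "rdf:type", "ex:hotel", False),                # lowercase class name
    ("ex:p1", "ex:website", "http://example.org/p1", True),  # URI stored as a string
    ("ex:p1", "ex:code", "ABCDE", True),
    ("ex:p2", "ex:code", "AB1", True),                       # breaks the 5-letter pattern
]

URI_LIKE = re.compile(r"^https?://", re.IGNORECASE)
FIVE_LETTERS = re.compile(r"^[A-Za-z]{5}$")


def uris_as_strings(triples):
    """Literal objects that look like URIs."""
    return [(s, p, o) for s, p, o, lit in triples if lit and URI_LIKE.match(o)]


def lowercase_class_names(triples):
    """Classes whose local name starts with a lowercase letter."""
    classes = {o for _, p, o, _ in triples if p == "rdf:type"}
    return sorted(c for c in classes if c.split(":")[-1][:1].islower())


def bad_codes(triples, predicate="ex:code"):
    """Values of a (hypothetical) code property that are not exactly 5 letters."""
    return [(s, o) for s, p, o, _ in triples if p == predicate and not FIVE_LETTERS.match(o)]
```

Heuristics like these are candidates for promotion into scripted fine-grained tests once they prove useful.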

Defects found in fine-grained analysis

• Property cardinality issues
  • missing properties
    • each dul:Place or lode:Event must have a title
  • presence of duplicated properties
    • each dul:Place or lode:Event must have exactly one geo location
• Missing language labels
  • one label per language
• Out-of-bound values for fixed upper and lower limits
  • neighboring cells in a grid (3 to 8)
• Datatype syntax errors
  • numeric types
  • datetime types
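The cardinality rules above (at least one title, exactly one geo location, 3 to 8 neighboring cells) all reduce to counting values per subject and comparing the count against a range. A minimal sketch, assuming in-memory triples and made-up `ex:` property names rather than the actual 3cixty vocabulary:

```python
from collections import defaultdict

# Triples as (subject, predicate, object); data and the ex: property names are made up.
TRIPLES = [
    ("ex:p1", "rdf:type", "dul:Place"),
    ("ex:p1", "ex:title", "Duomo"),
    ("ex:p1", "ex:geo", "ex:g1"),
    ("ex:e1", "rdf:type", "lode:Event"),
    ("ex:e1", "ex:geo", "ex:g2"),
    ("ex:e1", "ex:geo", "ex:g3"),       # duplicated geo location, and no title
    ("ex:c1", "ex:neighbor", "ex:c2"),  # a cell with a single neighbor
]
TARGET_CLASSES = {"dul:Place", "lode:Event"}


def count_values(triples, predicate):
    """Map each subject to the number of distinct values it has for predicate."""
    values = defaultdict(set)
    for s, p, o in triples:
        if p == predicate:
            values[s].add(o)
    return {s: len(v) for s, v in values.items()}


def cardinality_violations(triples, predicate, low, high, subjects):
    """Subjects whose value count for predicate falls outside [low, high]."""
    counts = count_values(triples, predicate)
    return sorted(s for s in subjects if not low <= counts.get(s, 0) <= high)


instances = {s for s, p, o in TRIPLES if p == "rdf:type" and o in TARGET_CLASSES}
missing_title = cardinality_violations(TRIPLES, "ex:title", 1, float("inf"), instances)
bad_geo = cardinality_violations(TRIPLES, "ex:geo", 1, 1, instances)
bad_cells = cardinality_violations(TRIPLES, "ex:neighbor", 3, 8, {"ex:c1"})
```

In a real setup the same counting would more likely be done inside a SPARQL query with `GROUP BY` and `HAVING`; the Python version only makes the logic explicit.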

Defects found in fine-grained analysis

• Constraints on value ranges
  • geo:lat and geo:long must be within the city's bounding box area
• Triples not associated with producer graphs
  • each triple belongs to a producer graph
• Presence of unsolicited instances
  • home locations are removed from the knowledge base
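The bounding-box and producer-graph constraints above can be illustrated with two small checks. The bounding box coordinates, graph names, and sample data below are hypothetical placeholders, not real 3cixty values.

```python
# Hypothetical bounding box (south, west, north, east); not the real 3cixty values.
BBOX = (45.0, 9.0, 46.0, 10.0)
# Hypothetical names of the named graphs written by data producers.
PRODUCER_GRAPHS = {"ex:graph/events", "ex:graph/places"}


def outside_bbox(points, bbox=BBOX):
    """Return ids of (id, lat, long) points that fall outside the bounding box."""
    south, west, north, east = bbox
    return [
        pid for pid, lat, lon in points
        if not (south <= lat <= north and west <= lon <= east)
    ]


def orphan_quads(quads, producer_graphs=PRODUCER_GRAPHS):
    """Return quads (s, p, o, graph) whose graph is not a producer graph."""
    return [q for q in quads if q[3] not in producer_graphs]


# Made-up sample data.
points = [("ex:p1", 45.5, 9.2), ("ex:p2", 48.9, 2.3)]
quads = [
    ("ex:p1", "rdf:type", "dul:Place", "ex:graph/places"),
    ("ex:p2", "rdf:type", "dul:Place", None),  # default graph, no producer
]
```

Here `outside_bbox(points)` flags `ex:p2`, and `orphan_quads(quads)` returns the quad stored outside any producer graph.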

Conclusions and future work

• Dynamic knowledge bases require good quality assurance approaches

• Knowledge-base publishers can learn from / adapt practices from software engineering

• Supporting tools improve quality assurance

• In the future
  • Integration with outlier detection algorithms
  • Generation of constraints in Loupe
  • Integration of SPARQL Interceptor with W3C SHACL