
A Two-Fold Quality Assurance Approach for Dynamic Knowledge Bases : The 3cixty Use Case

A Two-Fold Quality Assurance Approach for Dynamic Knowledge Bases: The 3cixty Use Case 31st of May, 2016 1st International Workshop on Completing and Debugging the Semantic Web at the 13th Extended Semantic Web Conference Nandana Mihindukulasooriya 1 , Giuseppe Rizzo 2 , Raphaël Troncy 3 , Oscar Corcho 1 , and Raúl Garcı́a-Castro 1 1 Ontology Engineering Group, UPM, Spain. 2 ISMB, Italy. 3 EURECOM, France. Acknowledgments: FPI grant (BES-2014-068449), Innovation activity 3cixty (14523) of EIT Digital, and 4V (TIN2013-46238-C4-2-R), Juan Carlos Ballesteros (Localidata)
Transcript
Page 1: A Two-Fold Quality Assurance Approach for Dynamic Knowledge Bases : The 3cixty Use Case

A Two-Fold Quality Assurance Approach for Dynamic Knowledge Bases: The 3cixty Use Case

31st of May, 2016

1st International Workshop on Completing and Debugging the Semantic Web at the 13th Extended Semantic Web Conference

Nandana Mihindukulasooriya 1, Giuseppe Rizzo 2, Raphaël Troncy 3, Oscar Corcho 1, and Raúl García-Castro 1

1 Ontology Engineering Group, UPM, Spain.
2 ISMB, Italy.
3 EURECOM, France.

Acknowledgments: FPI grant (BES-2014-068449), Innovation activity 3cixty (14523) of EIT Digital, and 4V (TIN2013-46238-C4-2-R), Juan Carlos Ballesteros (Localidata)

Page 2: A Two-Fold Quality Assurance Approach for Dynamic Knowledge Bases : The 3cixty Use Case

Outline

Ontology Engineering Group, Universidad Politécnica de Madrid

• 3cixty use case
• Motivation
• Techniques and tools
• Results

Page 3: A Two-Fold Quality Assurance Approach for Dynamic Knowledge Bases : The 3cixty Use Case

3cixty knowledge base

A semantic web platform for building real-world, comprehensive knowledge bases in the domain of culture and tourism for cities, using public information about places and events.

Page 4: A Two-Fold Quality Assurance Approach for Dynamic Knowledge Bases : The 3cixty Use Case

The 3cixty architecture

Page 5: A Two-Fold Quality Assurance Approach for Dynamic Knowledge Bases : The 3cixty Use Case

Motivation

• Data with 4Vs
  • Volume, Variety, Velocity, Veracity
• Evolving schema
• Plenty of tools involved in the process
• Multiple geographically dispersed teams
• Dependent applications

Many chances for potential errors

The need for a good quality assurance approach

Page 6: A Two-Fold Quality Assurance Approach for Dynamic Knowledge Bases : The 3cixty Use Case

Can we adapt some lessons learnt from Software Engineering for knowledge base generation?

Page 7: A Two-Fold Quality Assurance Approach for Dynamic Knowledge Bases : The 3cixty Use Case

Continuous Integration is essential

Page 8: A Two-Fold Quality Assurance Approach for Dynamic Knowledge Bases : The 3cixty Use Case

Cost of defects vs. time

[Figure: plot with axes labeled Time and Cost]

Page 9: A Two-Fold Quality Assurance Approach for Dynamic Knowledge Bases : The 3cixty Use Case

Agile testing quadrants

check for expected outputs

analyze undefined, unknown, & unexpected

Page 10: A Two-Fold Quality Assurance Approach for Dynamic Knowledge Bases : The 3cixty Use Case

A Two-Fold Quality Assurance Approach

• Two techniques
  • Scripted fine-grained analysis
    • checking for expected results
  • Exploratory testing
    • analyzing the unexpected results
• The two techniques are complementary
  • Exploratory testing can provide heuristics for fine-grained analysis
• Supported by two tools
  • SPARQL Interceptor
  • Loupe

Page 11: A Two-Fold Quality Assurance Approach for Dynamic Knowledge Bases : The 3cixty Use Case

Exploratory Testing

simultaneous learning, test design and test execution

minimal planning and maximum test execution
Page 12: A Two-Fold Quality Assurance Approach for Dynamic Knowledge Bases : The 3cixty Use Case

Loupe – Linked Data Inspector

• Web application for exploring and inspecting datasets
  • Class explorer
  • Property explorer
  • Triple pattern explorer
  • Named graph explorer
• Starts from high-level statistics and allows users to "zoom in" to several levels of detail
• Analysis of different datatypes
  • most common and least common values
  • numeric: min, max, mode, std. dev.
  • string: string length, URI-like strings
• Avoids the need for boilerplate SPARQL queries
• Ability to view the relevant data directly
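The datatype statistics listed for Loupe can be illustrated with a small sketch. This is not Loupe's implementation; the function names, the URI-like pattern, and the sample values are assumptions made for illustration only.

```python
import re
from collections import Counter
from statistics import mode, pstdev

# Assumption: "URI-like" means a string literal starting with a common scheme.
URI_LIKE = re.compile(r"^(https?|ftp)://\S+$")


def profile_numeric(values):
    """Summarize numeric literals: min, max, mode, population std. dev."""
    return {
        "min": min(values),
        "max": max(values),
        "mode": mode(values),
        "std_dev": pstdev(values),
    }


def profile_strings(values):
    """Summarize string literals: length range, value frequency, URI-like values."""
    ranked = Counter(values).most_common()  # most frequent value first
    lengths = [len(v) for v in values]
    return {
        "min_length": min(lengths),
        "max_length": max(lengths),
        "most_common": ranked[0][0],
        "least_common": ranked[-1][0],
        "uri_like": sorted({v for v in values if URI_LIKE.match(v)}),
    }


# Hypothetical sample literals, as might be collected for one property.
numeric_profile = profile_numeric([2, 4, 4, 4, 5, 5, 7, 9])
string_profile = profile_strings(["Nice", "Nice", "London", "http://example.org/x"])
```

A URI-like value showing up in a string profile is the kind of finding that exploratory testing can hand over to the scripted checks.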

Page 13: A Two-Fold Quality Assurance Approach for Dynamic Knowledge Bases : The 3cixty Use Case

Loupe Architecture

http://loupe.linkeddata.es/

Page 14: A Two-Fold Quality Assurance Approach for Dynamic Knowledge Bases : The 3cixty Use Case

Loupe UI

Page 15: A Two-Fold Quality Assurance Approach for Dynamic Knowledge Bases : The 3cixty Use Case

Fine-grained analysis

• a set of user-defined SPARQL queries (as unit tests)
• Knowledge-base specific

[Figure: sources of the test SPARQL queries: System Requirements, Schema Constraints, Conventions and other restrictions, Inputs from Exploratory Testing]
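One way to read "SPARQL queries as unit tests" is that each test is a SELECT query returning one row per violation, so an empty result means the test passes. The sketch below assumes that convention. The `SparqlTest` class, the `run_test` helper, the use of `rdfs:label`, and the stand-in endpoint are all hypothetical; a real setup would send the query to the knowledge base's SPARQL endpoint.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, str]


@dataclass
class SparqlTest:
    name: str
    description: str
    query: str  # SELECT query that returns one row per violation


def run_test(test: SparqlTest, execute: Callable[[str], List[Row]]) -> List[Row]:
    """Run the query through `execute`; an empty list means the test passed."""
    return execute(test.query)


# Assumption: rdfs:label stands in for whatever property the schema requires.
PLACES_HAVE_LABEL = SparqlTest(
    name="places-have-label",
    description="Every dul:Place should have at least one label.",
    query="""
PREFIX dul: <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?place WHERE {
  ?place a dul:Place .
  FILTER NOT EXISTS { ?place rdfs:label ?label }
}
""",
)


def fake_endpoint(query: str) -> List[Row]:
    """Stand-in for a real SPARQL endpoint: pretends one place lacks a label."""
    return [{"place": "http://example.org/place/42"}]


violations = run_test(PLACES_HAVE_LABEL, fake_endpoint)
passed = not violations
```

Injecting the executor keeps the test definitions independent of any particular endpoint or client library.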

Page 16: A Two-Fold Quality Assurance Approach for Dynamic Knowledge Bases : The 3cixty Use Case

SPARQL Interceptor

• seamless integration with the Jenkins continuous integration system
• executes automatically for each build
• provides
  • summary reports
  • configurable email notifications
• for each failed test
  • the reason for the failure
  • a description of the query
  • a link to the failed data using a SPARQL endpoint
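The three items reported for a failed test can be sketched as a small record. This is not SPARQL Interceptor's code; the `failure_report` function, the field names, and the endpoint URL are assumptions. The link relies only on the standard `query` parameter of the SPARQL protocol over HTTP GET.

```python
from urllib.parse import urlencode


def failure_report(test_name, description, reason, query, endpoint):
    """Bundle what a failure notification needs, including a link that
    re-runs the failing query against the endpoint."""
    link = f"{endpoint}?{urlencode({'query': query})}"
    return {
        "test": test_name,
        "reason": reason,
        "description": description,
        "link": link,
    }


# Hypothetical endpoint and query, for illustration only.
QUERY = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10"
report = failure_report(
    test_name="example-test",
    description="Lists a few subjects.",
    reason="expected 0 rows, got 10",
    query=QUERY,
    endpoint="http://example.org/sparql",
)
```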

Page 17: A Two-Fold Quality Assurance Approach for Dynamic Knowledge Bases : The 3cixty Use Case

SPARQL Interceptor

Designed and implemented by Localidata.

Page 18: A Two-Fold Quality Assurance Approach for Dynamic Knowledge Bases : The 3cixty Use Case

Defects found in exploratory testing

• Inconsistencies in using vocabularies
  • locn:hasAddress vs. schema:streetAddress
  • http://xmlns.com/foaf/0.1/ and http://xmlns.com/foaf/spec/
• URIs as strings
  • "http://....."
• Outliers
• Typos
  • class names with small letters
• Inconsistencies with the schema
  • domain, range
• Value patterns
  • codes with 5 letters, URIs with given prefix
• Date time format inconsistencies
• Violation of modeling decisions
  • no blank nodes for certain types

Page 19: A Two-Fold Quality Assurance Approach for Dynamic Knowledge Bases : The 3cixty Use Case

Defects found in fine-grained analysis

• property cardinality related issues
  • missing properties
    • Each dul:Place or lode:Event must have a title
  • presence of duplicated properties
    • dul:Place or lode:Event must have exactly one geo location
  • missing language labels
    • one label per language
• Out-of-bound values for fixed upper and lower limits
  • Neighboring cells in a grid (3 to 8)
• Datatype syntax errors
  • numeric types
  • Datetime types
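The cardinality rules above (at least one title, exactly one geo location) can be sketched over an in-memory list of triples. In the approach described these checks run as SPARQL queries; the plain-Python version below only illustrates the logic. The function name, the prefixed property names `rdfs:label` and `geo:location`, and the sample data are assumptions.

```python
from collections import Counter


def cardinality_violations(triples, class_name, predicate, min_count, max_count=None):
    """Return {instance: count} for instances of `class_name` whose number of
    `predicate` values is below `min_count` or above `max_count`."""
    instances = [s for s, p, o in triples if p == "rdf:type" and o == class_name]
    counts = Counter(s for s, p, o in triples if p == predicate)
    violations = {}
    for s in instances:
        n = counts[s]  # 0 when the property is missing
        if n < min_count or (max_count is not None and n > max_count):
            violations[s] = n
    return violations


# Hypothetical data: place/2 has no label and two geo locations.
TRIPLES = [
    ("place/1", "rdf:type", "dul:Place"),
    ("place/1", "rdfs:label", "Museum"),
    ("place/1", "geo:location", "loc/1"),
    ("place/2", "rdf:type", "dul:Place"),
    ("place/2", "geo:location", "loc/2"),
    ("place/2", "geo:location", "loc/3"),
]

missing_title = cardinality_violations(TRIPLES, "dul:Place", "rdfs:label", 1)
not_exactly_one_geo = cardinality_violations(TRIPLES, "dul:Place", "geo:location", 1, 1)
```

The same function covers both "missing" (minimum 1) and "duplicated" (maximum 1) cases by changing only the bounds.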

Page 20: A Two-Fold Quality Assurance Approach for Dynamic Knowledge Bases : The 3cixty Use Case

Defects found in fine-grained analysis

• Constraints on value ranges
  • geo:lat and geo:long must be within the city's bounding box area
• triples not associated with producer graphs
  • each triple belongs to a producer graph
• presence of unsolicited instances
  • home locations are removed from the knowledge base
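The bounding-box constraint on geo:lat and geo:long can be illustrated with a minimal sketch. The `BoundingBox` class, the helper name, and the coordinates are made up for illustration and do not describe any city in the 3cixty knowledge base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat: float, lon: float) -> bool:
        # Simple rectangle test; does not handle boxes crossing the antimeridian.
        return self.south <= lat <= self.north and self.west <= lon <= self.east


def outside_bounding_box(points, box):
    """Return the ids of points whose (lat, long) falls outside the box."""
    return sorted(pid for pid, (lat, lon) in points.items() if not box.contains(lat, lon))


# Hypothetical box and points; place/2 has latitude and longitude swapped.
CITY_BOX = BoundingBox(south=45.0, west=9.0, north=46.0, east=10.0)
POINTS = {
    "place/1": (45.5, 9.2),
    "place/2": (9.2, 45.5),
}

offenders = outside_bounding_box(POINTS, CITY_BOX)
```

Swapped coordinates are a common way for a point to end up outside the box, which is why the sample includes one.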

Page 21: A Two-Fold Quality Assurance Approach for Dynamic Knowledge Bases : The 3cixty Use Case

Conclusions and future work

• Dynamic knowledge bases require good quality assurance approaches
• Knowledge-base publishers can learn from / adapt practices from software engineering
• Supporting tools improve quality assurance
• In the future,
  • Integration with outlier detection algorithms
  • Generation of constraints in Loupe
  • Integration of SPARQL Interceptor with W3C SHACL

