A Two-Fold Quality Assurance Approach for Dynamic Knowledge Bases: The 3cixty Use Case
31st of May, 2016
1st International Workshop on Completing and Debugging the Semantic Web at the 13th Extended Semantic Web Conference
Nandana Mihindukulasooriya (1), Giuseppe Rizzo (2), Raphaël Troncy (3), Oscar Corcho (1), and Raúl García-Castro (1)
(1) Ontology Engineering Group, UPM, Spain. (2) ISMB, Italy. (3) EURECOM, France.
Acknowledgments: FPI grant (BES-2014-068449), Innovation activity 3cixty (14523) of EIT Digital,
and 4V (TIN2013-46238-C4-2-R), Juan Carlos Ballesteros (Localidata)
Outline
Ontology Engineering Group, Universidad Politécnica de Madrid
• 3cixty use case
• Motivation
• Techniques and tools
• Results
3cixty knowledge base
A Semantic Web platform for building real-world, comprehensive knowledge bases in the domain of culture and tourism for cities, using public information about places and events.
The 3cixty architecture
Motivation
• Data with the 4 Vs
  • Volume, Variety, Velocity, Veracity
• Evolving schema
• Many tools involved in the process
• Multiple geographically dispersed teams
• Dependent applications
Many opportunities for errors
Hence the need for a good quality assurance approach
Can we adapt some lessons learnt from software engineering to knowledge base generation?
Continuous Integration is essential
Cost of defects vs. time
(Figure: cost of defects (y-axis) plotted against time (x-axis))
Agile testing quadrants
(Figure: the agile testing quadrants, annotated "check for expected outputs" and "analyze the undefined, unknown, and unexpected")
A Two-Fold Quality Assurance Approach
• Two techniques
  • Scripted fine-grained analysis
    • checking for expected results
  • Exploratory testing
    • analyzing unexpected results
• The two techniques are complementary
  • Exploratory testing can provide heuristics for fine-grained analysis
• Supported by two tools
  • SPARQL Interceptor
  • Loupe
Exploratory Testing
Simultaneous learning, test design, and test execution
Minimal planning and maximum test execution
Loupe – Linked Data Inspector
• Web application for exploring and inspecting datasets
  • Class explorer
  • Property explorer
  • Triple pattern explorer
  • Named graph explorer
• Starts from high-level statistics and lets the user “zoom in” through several levels of detail
• Analysis of different datatypes
  • most common and least common values
  • numeric: min, max, mode, standard deviation
  • string: string length, URI-like strings
• Avoids the need for boilerplate SPARQL queries
• Shows the relevant data directly
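The per-property summaries listed above can be approximated in a few lines of Python. This is a minimal sketch, not Loupe's implementation: Loupe computes its statistics with SPARQL queries against an endpoint, whereas here the data is assumed to be an in-memory list of (subject, predicate, object) tuples, and the function name `property_profile` is hypothetical.

```python
import re
import statistics
from collections import Counter

# Literal objects that look like absolute HTTP(S) URIs.
URI_LIKE = re.compile(r"^https?://\S+$")


def property_profile(triples, prop, top_n=3):
    """Summarize the object values of one property (Loupe-style sketch).

    triples: iterable of (subject, predicate, object); literal objects are
    plain Python values (int/float for numbers, str for strings).
    """
    values = [o for (_, p, o) in triples if p == prop]
    ranked = Counter(values).most_common()  # most frequent first
    profile = {
        "count": len(values),
        "most_common": ranked[:top_n],
        "least_common": ranked[::-1][:top_n],  # least frequent first
    }
    numbers = [v for v in values
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if numbers:
        profile["numeric"] = {
            "min": min(numbers),
            "max": max(numbers),
            "mode": statistics.mode(numbers),
            "std_dev": statistics.pstdev(numbers),  # population std. dev.
        }
    strings = [v for v in values if isinstance(v, str)]
    if strings:
        lengths = [len(s) for s in strings]
        profile["string"] = {
            "min_length": min(lengths),
            "max_length": max(lengths),
            # string literals that look like URIs often signal a modeling error
            "uri_like": [s for s in strings if URI_LIKE.match(s)],
        }
    return profile
```

Starting from the frequency ranking and only then drilling into numeric or string details mirrors the "zoom in" style of exploration: the coarse view decides which property deserves a closer look.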
Loupe Architecture
http://loupe.linkeddata.es/
Loupe UI
Fine-grained analysis
• A set of user-defined SPARQL queries (as unit tests)
  • Knowledge-base specific
(Figure: test SPARQL queries derived from system requirements, schema constraints, conventions and other restrictions, and inputs from exploratory testing)
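To make the unit-test analogy concrete, the sketch below pairs a test with the SPARQL query it documents and a local Python check that mirrors that query over in-memory triples. The `KBTest` structure, the `run_suite` helper, and the use of `dc:title` as the title property are assumptions for illustration; this is not SPARQL Interceptor's test format.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Triple = Tuple[str, str, object]

TITLE = "dc:title"  # assumed title property; the slides do not name it
TARGET_TYPES = {"dul:Place", "lode:Event"}


@dataclass
class KBTest:
    """A fine-grained test: the SPARQL query it stands for plus a local check."""
    name: str
    description: str
    sparql: str                                 # documentation only, not executed here
    check: Callable[[List[Triple]], List[str]]  # returns offending resources


def missing_title(triples):
    """Local equivalent of the query below: typed resources without a title."""
    typed = {s for s, p, o in triples if p == "rdf:type" and o in TARGET_TYPES}
    titled = {s for s, p, _ in triples if p == TITLE}
    return sorted(typed - titled)


TITLE_TEST = KBTest(
    name="place-or-event-has-title",
    description="Each dul:Place or lode:Event must have a title",
    sparql="""
        SELECT ?x WHERE {
          VALUES ?type { dul:Place lode:Event }
          ?x a ?type .
          FILTER NOT EXISTS { ?x dc:title ?title }
        }""",  # PREFIX declarations omitted
    check=missing_title,
)


def run_suite(triples, tests):
    """A test passes when its check (like its query) returns no results."""
    report = {}
    for test in tests:
        offenders = test.check(triples)
        report[test.name] = {"passed": not offenders, "offenders": offenders}
    return report
```

Writing each query so that it selects violations, and treating an empty result as a pass, keeps the offending resources at hand for the failure report.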
SPARQL Interceptor
• Seamless integration with the Jenkins continuous integration system
• Executes automatically for each build
• Provides
  • summary reports
  • configurable email notifications
  • for each failed test
    • the reason for the failure
    • a description of the query
    • a link to the failed data through a SPARQL endpoint
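The per-failure details listed above (reason, query description, link to the failed data) could be assembled as in the following sketch. It assumes a hypothetical endpoint URL and builds the link as a SPARQL 1.1 Protocol query-via-GET URL (the `query` parameter); the entry layout and the `summarize` helper are illustrative, not SPARQL Interceptor's actual report or its Jenkins integration.

```python
from urllib.parse import urlencode

ENDPOINT = "http://example.org/sparql"  # hypothetical SPARQL endpoint


def failure_link(query, endpoint=ENDPOINT):
    """Link that re-runs the failing query (SPARQL 1.1 Protocol, query via GET)."""
    return endpoint + "?" + urlencode({"query": query})


def format_failure(name, description, reason, query, endpoint=ENDPOINT):
    """One report entry per failed test: reason, description, and data link."""
    return "\n".join([
        f"FAILED: {name}",
        f"  reason: {reason}",
        f"  description: {description}",
        f"  failed data: {failure_link(query, endpoint)}",
    ])


def summarize(results):
    """results: list of (test_name, passed) pairs -> one-line build summary."""
    failed = [name for name, passed in results if not passed]
    line = f"{len(results) - len(failed)}/{len(results)} tests passed"
    return line + (f"; failed: {', '.join(failed)}" if failed else "")
```

Because the link simply re-runs the violation query, whoever receives the notification sees exactly the offending resources without writing a query themselves.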
SPARQL Interceptor
Designed and implemented by Localidata.
Defects found in exploratory testing
• Inconsistencies in vocabulary usage
  • locn:hasAddress vs. schema:streetAddress
  • http://xmlns.com/foaf/0.1/ and http://xmlns.com/foaf/spec/
• URIs as strings
  • "http://....."
• Outliers
• Typos
  • class names starting with lowercase letters
• Inconsistencies with the schema
  • domain, range
• Value patterns
  • codes with 5 letters, URIs with a given prefix
• Date-time format inconsistencies
• Violations of modeling decisions
  • no blank nodes for certain types
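Once exploratory testing has exposed a defect, it can often be turned into a repeatable heuristic. The sketch below encodes four of the defects above (URIs stored as string literals, lowercase class names, two FOAF namespace variants, and mixed date-time formats) as Python checks over in-memory triples. The `Lit` type, the two date-time shapes, and the function names are assumptions for illustration.

```python
import re
from typing import NamedTuple


class Lit(NamedTuple):
    """A literal object; any plain str in object position is treated as an IRI."""
    value: str
    datatype: str = "xsd:string"


URI_LIKE = re.compile(r"^https?://")
FOAF_VARIANTS = ("http://xmlns.com/foaf/0.1/", "http://xmlns.com/foaf/spec/")
# Two example date-time shapes; in practice, list the shapes seen while exploring.
DATETIME_SHAPES = {
    "iso": re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}"),
    "dd/mm/yyyy": re.compile(r"^\d{2}/\d{2}/\d{4}$"),
}


def local_name(iri):
    """Last segment after '#', '/' or ':' (works for full IRIs and prefixed names)."""
    return re.split(r"[#/:]", iri)[-1]


def uris_as_strings(triples):
    """(subject, predicate) pairs whose literal value looks like a URI."""
    return [(s, p) for s, p, o in triples
            if isinstance(o, Lit) and URI_LIKE.match(o.value)]


def lowercase_classes(triples):
    """rdf:type objects whose local name starts with a lowercase letter."""
    return sorted({o for _, p, o in triples
                   if p == "rdf:type" and isinstance(o, str)
                   and local_name(o)[:1].islower()})


def namespace_variants_used(triples, variants=FOAF_VARIANTS):
    """Which of the alternative namespaces appear in predicates or IRI objects."""
    iris = {p for _, p, _ in triples} | {o for _, _, o in triples if isinstance(o, str)}
    return sorted({v for v in variants for iri in iris if iri.startswith(v)})


def datetime_shapes_used(triples):
    """Lexical shapes found among xsd:dateTime literals."""
    shapes = set()
    for _, _, o in triples:
        if isinstance(o, Lit) and o.datatype == "xsd:dateTime":
            matched = [name for name, rx in DATETIME_SHAPES.items() if rx.match(o.value)]
            shapes.update(matched or ["unrecognized"])
    return sorted(shapes)
```

For the last two checks, more than one namespace variant or more than one date-time shape is the signal worth investigating, and that signal is exactly what a fine-grained test can later assert against.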
Defects found in fine-grained analysis
• Property cardinality issues
  • missing properties
    • each dul:Place or lode:Event must have a title
  • duplicated properties
    • each dul:Place or lode:Event must have exactly one geo location
• Missing language labels
  • one label per language
• Values outside fixed upper and lower limits
  • neighboring cells in a grid (3 to 8)
• Datatype syntax errors
  • numeric types
  • datetime types
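Several of these cardinality, range, and syntax constraints are sketched below as Python checks over in-memory triples, mirroring what the corresponding SPARQL tests would select. The class names come from the slides; the property names (`geo:location`, `rdfs:label`, `ex:neighbor`), the `ex:Cell` type, the default languages, and the regular expressions (which approximate XSD lexical forms without checking value ranges) are assumptions.

```python
import re
from collections import Counter

# Hypothetical property names: the slides name the classes (dul:Place,
# lode:Event) but not these properties.
TARGET_TYPES = {"dul:Place", "lode:Event"}
GEO, LABEL, NEIGHBOR = "geo:location", "rdfs:label", "ex:neighbor"

# Approximate XSD lexical forms (syntax only; value ranges are not checked).
LEXICAL = {
    "xsd:integer": re.compile(r"^[+-]?\d+$"),
    "xsd:decimal": re.compile(r"^[+-]?(\d+(\.\d*)?|\.\d+)$"),
    "xsd:dateTime": re.compile(
        r"^-?\d{4,}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?$"),
}


def of_type(triples, types):
    return {s for s, p, o in triples if p == "rdf:type" and o in types}


def wrong_geo_cardinality(triples):
    """Places/events that do not have exactly one geo location."""
    n = Counter(s for s, p, _ in triples if p == GEO)
    return sorted(s for s in of_type(triples, TARGET_TYPES) if n[s] != 1)


def label_language_issues(triples, languages=("en", "it")):
    """Label objects are (text, lang) pairs; expect exactly one per language."""
    n = Counter((s, o[1]) for s, p, o in triples if p == LABEL)
    issues = {}
    for s in sorted(of_type(triples, TARGET_TYPES)):
        bad = [lang for lang in languages if n[(s, lang)] != 1]
        if bad:
            issues[s] = bad
    return issues


def neighbor_count_out_of_range(triples, cell_type="ex:Cell", low=3, high=8):
    """Grid cells whose neighbor count falls outside [low, high]."""
    n = Counter(s for s, p, _ in triples if p == NEIGHBOR)
    return sorted(c for c in of_type(triples, {cell_type}) if not low <= n[c] <= high)


def datatype_syntax_errors(literals):
    """literals: (lexical_form, datatype) pairs; returns those failing their pattern."""
    return [(v, dt) for v, dt in literals if dt in LEXICAL and not LEXICAL[dt].match(v)]
```

The 3 to 8 bounds fit a square grid where each cell touches its horizontal, vertical, and diagonal neighbors: a corner cell has 3 of them and an interior cell has 8.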
Defects found in fine-grained analysis
• Constraints on value ranges
  • geo:lat and geo:long must be within the city’s bounding box
• Triples not associated with producer graphs
  • each triple must belong to a producer graph
• Presence of unsolicited instances
  • home locations must be removed from the knowledge base
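The remaining constraints need graph information, so the sketch below works on quads (subject, predicate, object, graph). The bounding box values, graph names, and the `ex:HomeLocation` type are hypothetical placeholders; a real deployment would express these checks as SPARQL queries over named graphs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """A city's bounding box in decimal degrees (values supplied by the caller)."""
    min_lat: float
    max_lat: float
    min_long: float
    max_long: float

    def contains(self, lat, long):
        return self.min_lat <= lat <= self.max_lat and self.min_long <= long <= self.max_long


def outside_bbox(quads, bbox):
    """Subjects whose geo:lat/geo:long pair falls outside the bounding box.

    Subjects with only one of the two coordinates are skipped here; that is a
    cardinality problem for a separate test.
    """
    lat = {s: float(o) for s, p, o, _ in quads if p == "geo:lat"}
    long_ = {s: float(o) for s, p, o, _ in quads if p == "geo:long"}
    return sorted(s for s in lat.keys() & long_.keys()
                  if not bbox.contains(lat[s], long_[s]))


def outside_producer_graphs(quads, producer_graphs):
    """Quads whose graph is not a known producer graph (None = default graph)."""
    return [q for q in quads if q[3] not in producer_graphs]


def unsolicited_instances(quads, forbidden_types):
    """Resources typed with a class that must not appear in the knowledge base."""
    return sorted({s for s, p, o, _ in quads if p == "rdf:type" and o in forbidden_types})
```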
Conclusions and future work
• Dynamic knowledge bases require good quality assurance approaches
• Knowledge base publishers can learn from and adapt software engineering practices
• Supporting tools improve quality assurance
• Future work
  • Integration with outlier detection algorithms
  • Generation of constraints in Loupe
  • Integration of SPARQL Interceptor with W3C SHACL