Post on 31-May-2015
TripleCheckMate: A Tool for Crowdsourcing the Quality Assessment of Linked Data
Dimitris Kontokostas, Amrapali Zaveri, Sören Auer and Jens Lehmann
KESW 2013 Oct 08, 2013
Outline
❏ Data Quality
❏ Data Quality Assessment Methodology
❏ Evaluation Methodology - Manual
  ❏ Phase I: Quality Problem Taxonomy
  ❏ Phase II: Crowdsourcing Quality Assessment
❏ TripleCheckMate
  ❏ Architecture
  ❏ Demo
❏ Conclusion & Future Work
Data Quality
● Data Quality (DQ) is defined as:
  ○ fitness for a certain use case*
● On the Data Web - varying quality of information covering various domains
● High quality datasets
  ○ curated over decades - life science domain
  ○ crowdsourcing process - extracted from unstructured and semi-structured information, e.g. DBpedia
* J. Juran. The Quality Control Handbook. McGraw-Hill, New York, 1974.
Data Quality Assessment Methodology
4 Step Methodology:
❏ Step 1: Resource selection
  ❏ Per Class
  ❏ Completely random
  ❏ Manual
❏ Step 2: Evaluation mode selection
  ❏ Manual
  ❏ Semi-automatic
  ❏ Automatic
❏ Step 3: Resource evaluation
❏ Step 4: DQ improvement
  ❏ Direct
  ❏ Indirect
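The three resource selection modes of Step 1 can be sketched as follows. This is a minimal illustration in Python, not the tool's implementation: the function name `select_resources`, its parameters, and the in-memory mapping from resource to class are all assumptions made for the example.

```python
import random


def select_resources(resources, mode, *, cls=None, manual=None, k=1, seed=None):
    """Pick resources to evaluate (hypothetical helper, not from the slides).

    resources: dict mapping a resource identifier to its class name.
    mode: "per_class", "random", or "manual".
    """
    rng = random.Random(seed)
    if mode == "manual":
        # Manual: the evaluator names the resources; keep only the known ones.
        return [r for r in (manual or []) if r in resources]
    if mode == "per_class":
        # Per class: sample only among resources of the requested class.
        pool = sorted(r for r, c in resources.items() if c == cls)
    elif mode == "random":
        # Completely random: sample among all resources.
        pool = sorted(resources)
    else:
        raise ValueError("unknown selection mode: %r" % mode)
    return rng.sample(pool, min(k, len(pool)))
```

Passing a `seed` makes a selection reproducible, which can help when several evaluators should be shown the same sample.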
Evaluation Methodology - Manual
❏Phase I: Creation of quality problem taxonomy
❏Phase II: Crowdsourcing quality assessment
Phase I: Quality Problem Taxonomy
A. Zaveri, A. Rula, A. Maurino, R. Pietrobon, J. Lehmann, and S. Auer. Quality assessment methodologies for Linked Open Data: A Review. Under review, available at http://www.semantic-webjournal.net/content/quality-assessment-methodologieslinked-open-data.
Phase II: Crowdsourcing Quality Assessment
              Crowdsourcing                             Our Approach
Type          Human Intelligence Tasks (HITs)           Contest-based
Participants  Labor market                              Linked Data (LD) experts
Task          Detect quality issues in triples          Detect & classify quality issues in resources
Reward        Per task/triple                           Highest number of resources evaluated
Tool          Amazon Mechanical Turk, CrowdFlower etc.  TripleCheckMate
TripleCheckMate - Architecture (1/2)
TripleCheckMate - Architecture (2/2)
● Built on Java / GWT
  ○ GWT compiles to native cross-browser HTML/JS
● Tomcat / Jetty & MySQL as minimal backend
  ○ store/retrieve evaluation data only
● Application logic is built on the client
  ○ SPARQL executed on client
  ○ Portable
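To illustrate what "SPARQL executed on client" can involve, the sketch below builds a query for all triples of one resource and parses a response in the standard SPARQL 1.1 JSON results format. The tool itself is written in Java/GWT; Python is used here only for illustration, and the query shape and the helper names `build_resource_query` and `parse_triples` are assumptions, not taken from the slides.

```python
import json


def build_resource_query(resource_iri):
    """Return a SPARQL query for every (predicate, object) of one resource."""
    # Reject characters that are not allowed inside a SPARQL IRI reference.
    unsafe = '<>"{}|^`\\'
    if any(ch in unsafe or ch.isspace() for ch in resource_iri):
        raise ValueError("not a safe IRI: %r" % resource_iri)
    return "SELECT ?p ?o WHERE { <%s> ?p ?o }" % resource_iri


def parse_triples(resource_iri, results_json):
    """Turn a SPARQL JSON results document into (s, p, o) tuples."""
    data = json.loads(results_json)
    triples = []
    for binding in data["results"]["bindings"]:
        # Each binding maps a variable name to {"type": ..., "value": ...}.
        triples.append((resource_iri, binding["p"]["value"], binding["o"]["value"]))
    return triples
```

The query string would be sent to the dataset's SPARQL endpoint by the client, so the backend never needs to handle the evaluated data itself; only the evaluation results are stored server-side.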
Evaluation storage schema
● Designed to support multiple campaigns and different ontologies
● Quality taxonomy is stored in the database which makes it easy to adapt
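One way such a schema could look is sketched below. The actual TripleCheckMate schema is not given in the slides, so every table and column name here is hypothetical, and SQLite (in memory) stands in for MySQL to keep the example self-contained. The point it shows is that campaigns and taxonomy entries are rows, so adding a campaign or changing the taxonomy needs no code change.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not the tool's real tables.
SCHEMA = """
CREATE TABLE campaign (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    sparql_endpoint TEXT NOT NULL
);
CREATE TABLE error_category (
    id INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES error_category(id),  -- NULL for top level
    label TEXT NOT NULL
);
CREATE TABLE evaluated_triple (
    id INTEGER PRIMARY KEY,
    campaign_id INTEGER NOT NULL REFERENCES campaign(id),
    user_name TEXT NOT NULL,
    subject TEXT NOT NULL,
    predicate TEXT NOT NULL,
    object TEXT NOT NULL,
    error_category_id INTEGER REFERENCES error_category(id)  -- NULL = no issue
);
"""


def open_db():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_category(conn, label, parent_id=None):
    """Insert one taxonomy entry and return its id."""
    cur = conn.execute(
        "INSERT INTO error_category (label, parent_id) VALUES (?, ?)",
        (label, parent_id),
    )
    return cur.lastrowid
```

Because the taxonomy is a self-referencing table, a different quality taxonomy can be loaded for another dataset simply by inserting a different set of rows.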
TripleCheckMate - Demo
http://tinyurl.com/TCM-Demo
http://tinyurl.com/TCM-Screencast
Conclusion & Future Work
● TripleCheckMate
  ○ Tool for crowdsourcing quality assessment
  ○ Linked Data quality assessment
  ○ Supports inter-rater agreement
  ○ Can be used with any Linked Dataset
● Future Work
  ○ Directly integrating semi-automatic methods
  ○ Improve efficiency of quality assessment
  ○ Include support for Patch Ontology* as output format
* M. Knuth, J. Hercher, and H. Sack. Collaboratively patching linked data. CoRR, 2012.
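As an illustration of what inter-rater agreement means when two users evaluate the same triples, the sketch below computes Cohen's kappa. The slides do not say which agreement statistic the tool uses, so Cohen's kappa is an assumption chosen for the example, as are the function name and the label values.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected agreement by chance, from each rater's label frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)
```

For example, with ratings `["ok", "ok", "error", "error"]` and `["ok", "error", "error", "error"]` the observed agreement is 0.75, the chance agreement is 0.5, and kappa is 0.5.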
Thank You
Questions?
http://nl.dbpedia.org:8080/TripleCheckMate-Demo/
https://github.com/AKSW/TripleCheckMate
http://aksw.org/AmrapaliZaveri
zaveri@informatik.uni-leipzig.de
Twitter: @amrapaliz