Programmatic Access to Crowdsourced Human Computation for Designing and Enhancing Interlinking

Date post: 04-Aug-2015
Institute for Web Science and Technologies · University of Koblenz-Landau, Germany

Programmatic Access to Crowdsourced Human Computation for Designing and Enhancing Interlinking

Cristina Sarasua

ESWC2015 Developers Workshop

CROWDKI · Cristina Sarasua

The Problem

Current LOD

Mostly identity links, a few prominent interlinking hubs with a high number of in-links, and still 44% of the analyzed datasets do not contain out-links [Schmachtenberg et al., 2014]

To improve the heterogeneity and quantity of links we need:

– To overcome the computational limitations of automatic link discovery methods

– Methods that assist data publishers in deciding which datasets to target and how to define the interlinks

CROWDKI: Crowd-Powered Knowledge Integration

Crowdsourced Human Computation for Interlinking

Humans are systematically involved in processing data for interlinking

Human input is collected via microtask crowdsourcing:

– Online marketplaces (e.g. Clickworker)

– Anyone registered can take part (all around the world)

– Economic reward

– Work is divided into simple tasks (e.g. review a particular link between two resources)

– Large and dedicated workforce → fast completion time (hours / days)

Software to manage the generation and completion of microtasks related to interlinking: https://github.com/criscod/CROWDKI

UC1: Assessing the relevance of different interlinking possibilities

Input: list of interlinking possibilities and a particular context

– Car wasDesigned by Person

– Car wasDriven by Person

– Car wasRecommended by Person

Output: contextual relevance assessment by the crowd
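
One way the crowd's output in UC1 could be turned into a ranking is to average the ratings each interlinking possibility received. The sketch below is illustrative only: the class name, the method name and the 1 to 5 rating scale are assumptions, not CROWDKI's actual aggregation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: ranks interlinking possibilities by mean crowd rating. */
final class RelevanceRanking {

    /**
     * @param ratings interlinking possibility -> ratings from individual workers
     *                (a 1..5 scale is assumed here for illustration)
     * @return possibilities ordered from highest to lowest mean rating;
     *         ties keep their input order because List.sort is stable
     */
    static List<String> rankByMeanRating(Map<String, List<Integer>> ratings) {
        Map<String, Double> means = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> e : ratings.entrySet()) {
            // A possibility without any rating gets a mean of 0.0.
            double mean = e.getValue().stream()
                    .mapToInt(Integer::intValue)
                    .average()
                    .orElse(0.0);
            means.put(e.getKey(), mean);
        }
        List<String> ranked = new ArrayList<>(means.keySet());
        ranked.sort(Comparator.comparingDouble((String p) -> means.get(p)).reversed());
        return ranked;
    }
}
```

Called with the three possibilities from the slide as keys, the method returns them ordered by how relevant the workers judged each one in the given context.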

UC2: Validating and enhancing automatically computed links

Input: set of candidate links, RDF data

Output: set of final links (extended / post-processed)
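
The validation step in UC2 could, for instance, keep a candidate link only when most workers judged it correct. This is a minimal sketch under that assumption; majority voting, the class name and the string representation of a link are illustrative choices, not necessarily what CROWDKI does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: filters candidate links by majority vote of crowd workers. */
final class LinkValidation {

    /**
     * @param votes candidate link (any string identifier) -> worker judgments,
     *              where true means "this link is correct"
     * @return the links accepted by strictly more than half of their judges,
     *         in the iteration order of the input map
     */
    static List<String> acceptByMajority(Map<String, List<Boolean>> votes) {
        List<String> accepted = new ArrayList<>();
        for (Map.Entry<String, List<Boolean>> e : votes.entrySet()) {
            long yes = e.getValue().stream().filter(Boolean::booleanValue).count();
            // Ties and links without any judgment are rejected.
            if (2 * yes > e.getValue().size()) {
                accepted.add(e.getKey());
            }
        }
        return accepted;
    }
}
```

Rejecting ties is the conservative choice here; a real setup might instead request additional judgments for tied links.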

Architecture

CrowdFlower REST API

Microtask templates & config

Java

Jena, SPARQL

JSON

Guava IO
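
To illustrate how the JSON component of the architecture might fit in, the sketch below serializes one candidate link into a JSON object that a microtask template could consume. The field names `source`, `predicate` and `target` are assumptions made for this example; they are not taken from CROWDKI or from the CrowdFlower API, and a real implementation would more likely use a JSON library than hand-written escaping.

```java
/** Hypothetical helper: turns one candidate link into a JSON row for a microtask. */
final class MicrotaskRow {

    /** Escapes the characters that may not appear raw inside a JSON string. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the \\uXXXX form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Builds a flat JSON object; the three field names are illustrative only. */
    static String toJson(String sourceUri, String predicate, String targetUri) {
        return "{\"source\":\"" + escape(sourceUri)
                + "\",\"predicate\":\"" + escape(predicate)
                + "\",\"target\":\"" + escape(targetUri) + "\"}";
    }
}
```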

Starting CROWDKI

Lessons Learned

Communication is key: data with typos may be interpreted wrongly during processing

Communities of crowd workers are emerging

Not all interlinking scenarios require human computation (e.g. country ISO codes); the challenge is to decide automatically when it is really worthwhile

Drawbacks: no real-time crowdsourcing, and crowd workers cannot be selected accurately

CrowdFlower (the crowdsourcing platform used) provides access to more features via the UI than via the API
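
The slides leave open how to decide automatically when crowd input is worthwhile. One simple heuristic, shown here purely as an assumption and not as the approach taken in CROWDKI, is to route each candidate link by the confidence score of the automatic matcher and to ask the crowd only about the uncertain middle band. The thresholds and all names are hypothetical.

```java
/** Hypothetical routing rule: only uncertain candidate links are sent to the crowd. */
final class CrowdRouting {

    enum Decision { AUTO_ACCEPT, ASK_CROWD, AUTO_REJECT }

    /**
     * @param score       confidence of the automatic matcher, assumed to lie in [0, 1]
     * @param rejectBelow scores strictly below this value are rejected automatically
     * @param acceptFrom  scores at or above this value are accepted automatically
     */
    static Decision route(double score, double rejectBelow, double acceptFrom) {
        if (rejectBelow > acceptFrom) {
            throw new IllegalArgumentException("rejectBelow must not exceed acceptFrom");
        }
        if (score >= acceptFrom) {
            return Decision.AUTO_ACCEPT;
        }
        if (score < rejectBelow) {
            return Decision.AUTO_REJECT;
        }
        return Decision.ASK_CROWD;
    }
}
```

Under this rule, exact matches such as identical country ISO codes would typically score high enough to bypass the crowd, which keeps paid microtasks for the cases where automatic methods are unsure.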

Conclusions

Hybrid approaches (automatic + crowd interlinking) can achieve better precision and recall (P, R) than purely automatic interlinking methods

CROWDKI could be used in combination with dataset recommendation methods that analyze the way data is already interlinked.
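
For reference, the precision and recall (P, R) used to compare hybrid and purely automatic interlinking can be computed against a gold standard of correct links as follows. The class and method names are illustrative, and links are represented as plain strings for simplicity.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative helper: precision and recall of a link set against a gold standard. */
final class LinkQuality {

    /** Fraction of the found links that are correct; 0.0 if nothing was found. */
    static double precision(Set<String> found, Set<String> gold) {
        if (found.isEmpty()) {
            return 0.0;
        }
        return (double) countCorrect(found, gold) / found.size();
    }

    /** Fraction of the gold-standard links that were found; 0.0 if gold is empty. */
    static double recall(Set<String> found, Set<String> gold) {
        if (gold.isEmpty()) {
            return 0.0;
        }
        return (double) countCorrect(found, gold) / gold.size();
    }

    /** Size of the intersection, computed on a copy so the inputs stay unchanged. */
    private static int countCorrect(Set<String> found, Set<String> gold) {
        Set<String> correct = new HashSet<>(found);
        correct.retainAll(gold);
        return correct.size();
    }
}
```

Running both functions once on the automatically computed links and once on the crowd-validated links, against the same gold standard, gives the two (P, R) pairs to compare.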

Questions for the audience

Which challenges did you face with state-of-the-art link discovery tools, and what went wrong?

How do you think human computation can further help in interlinking?

What are, in your opinion, the pros and cons of this approach?

References

Max Schmachtenberg, Christian Bizer, Heiko Paulheim: Adoption of the Linked Data Best Practices in Different Topical Domains. 13th International Semantic Web Conference (ISWC2014) - RDB Track, Riva del Garda, Italy, October 2014

