Page 1: Microtask Crowdsourcing Applications for Linked Data

Microtask Crowdsourcing

Applications for Linked Data



Architecture of Linked Data Applications

[Figure: architecture of a Linked Data application, organized into a Presentation Tier, a Logic Tier and a Data Tier. Labels in the Data Tier: Data Access Component (SPARQL Wr., R2R Transf., LD Wrapper, Physical Wrapper), Data Integration Component (Vocabulary Mapping, Interlinking, Cleansing), Integrated Dataset, Republication Component. Data sources: SPARQL Endpoints, Web Data accessed via APIs, Linked Data, Relational Data.]

EUCLID – Microtask crowdsourcing applications for Linked Data



CH 2

Data Integration Component

• Consolidates the data retrieved from heterogeneous sources.

• This component may operate at:

– Schema level: Performs vocabulary mappings in order to translate data into a single unified schema. Links correspond to RDFS properties or OWL property and class axioms.

– Instance level: Performs entity linking, e.g., entity resolution via owl:sameAs links.

CH 3

[Figure: Data Tier, showing the Data Access Component and the Data Integration Component (Vocabulary Mapping, Interlinking, Cleansing)]



Data Integration Component

The data integration component can be enhanced by including microtask crowdsourcing approaches:

• Cleansing or data assessments: Assessment of DBpedia triples

• Vocabulary mapping: CrowdMAP

• Interlinking: ZenCrowd

[Figure: Data Tier (2), showing the Data Access Component and the Data Integration Component (Vocabulary Mapping, Interlinking, Cleansing)]



Other Crowdsourcing-based Solutions for Linked Data Tasks

• Query understanding: CrowdQ

• Ontology population: OntoGame

• Linked Data curation: Urbanopoly

• …


Assessing DBpedia Triples

1. Selecting LD quality issues that are caused by erroneous extraction mechanisms and that the crowd can detect

2. Selecting the appropriate crowdsourcing approaches

3. Designing and generating the interfaces to present the data to the crowd

[Figure: triples {s p o .} from the dataset are assessed; triples found to be wrong are labeled "Incorrect" plus the quality issue]


Selecting LD Quality Issues to Crowdsource

Three categories of quality problems occur pervasively in DBpedia [Zaveri2013] and can be crowdsourced:

• Incorrect object. Example: dbpedia:Dave_Dobbyn dbprop:dateOfBirth “3”.

• Incorrect data type. Example: dbpedia:Torishima_Izu_Islands foaf:name “鳥島”@en.

• Incorrect link to “external Web pages”. Example: dbpedia:John-Two-Hawks dbpedia-owl:wikiPageExternalLink


Selecting Appropriate Crowdsourcing Approaches

• Find stage: Contest; LD experts; difficult task; final prize; tool: TripleCheckMate [Kontokostas2013]

• Verify stage: Microtasks; workers; easy task; micropayments; platform: MTurk

Adapted from [Bernstein2010]


Presenting the Data to the Crowd

• Selection of foaf:name or rdfs:label to extract human-readable descriptions

• Real object values extracted automatically from Wikipedia infoboxes

• Link to the Wikipedia article via foaf:isPrimaryTopicOf

• Preview of external pages in an HTML iframe

Microtask interfaces (MTurk tasks):

– Incorrect object

– Incorrect data type

– Incorrect outlink
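The presentation choices listed above can be sketched in a few lines of Python. This is only an illustration, not the interface code used in the study: the function names, the preference order between foaf:name and rdfs:label, the fallback to the local name, and the wording of the answer options are all assumptions.

```python
from typing import Dict, Optional

# Properties tried for a human-readable name. The slide names both;
# the order used here is an assumption.
LABEL_PROPERTIES = ("foaf:name", "rdfs:label")


def readable_name(resource: str, labels: Dict[str, Dict[str, str]]) -> str:
    """Return a human-readable name for a resource.

    Falls back to the local name (prefix stripped, underscores replaced)
    when neither label property is available.
    """
    for prop in LABEL_PROPERTIES:
        value = labels.get(resource, {}).get(prop)
        if value:
            return value
    return resource.split(":", 1)[-1].replace("_", " ")


def render_task(subject: str, predicate: str, obj: str,
                labels: Dict[str, Dict[str, str]],
                wikipedia_url: Optional[str] = None) -> str:
    """Build the plain-text body of a hypothetical 'incorrect object' microtask."""
    lines = [
        f"Subject: {readable_name(subject, labels)}",
        f"Property: {readable_name(predicate, labels)}",
        f"Value in DBpedia: {obj}",
    ]
    if wikipedia_url:
        # Corresponds to the link obtained via foaf:isPrimaryTopicOf.
        lines.append(f"Wikipedia article: {wikipedia_url}")
    # Hypothetical answer options, not taken from the slides.
    lines.append("Is this value correct? [Correct / Incorrect / I don't know]")
    return "\n".join(lines)


if __name__ == "__main__":
    demo_labels = {"dbpedia:Dave_Dobbyn": {"rdfs:label": "Dave Dobbyn"}}
    print(render_task("dbpedia:Dave_Dobbyn", "dbprop:dateOfBirth", "3", demo_labels))
```

A real task would render the same fields as HTML and embed the external page preview; the sketch only shows how a triple is turned into something a non-expert worker can read.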




                          Object values   Data types   Interlinks
Linked Data experts       0.7151          0.8270       0.1525
MTurk (majority voting)   0.8977          0.4752       0.9412

• Both forms of crowdsourcing can be applied to detect certain LD quality issues

• The effort of LD experts must be applied to those tasks that demand domain-specific skills

• The MTurk crowd is exceptionally good at comparing data entries
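The MTurk figures above are aggregated with majority voting. A minimal sketch of that aggregation step follows, assuming each triple was judged by several workers and that ties are left undecided; the slides specify neither the number of assignments per task nor the tie handling, and the function names are illustrative.

```python
from collections import Counter
from typing import Dict, Hashable, List, Optional


def majority_vote(answers: List[Hashable]) -> Optional[Hashable]:
    """Return the answer given by more than half of the workers, else None."""
    if not answers:
        return None
    answer, count = Counter(answers).most_common(1)[0]
    # Require a strict majority so that ties stay undecided.
    return answer if count * 2 > len(answers) else None


def aggregate(votes_by_triple: Dict[str, List[Hashable]]) -> Dict[str, Optional[Hashable]]:
    """Apply majority voting independently to every assessed triple."""
    return {triple: majority_vote(votes) for triple, votes in votes_by_triple.items()}


if __name__ == "__main__":
    votes = {
        "t1": ["incorrect", "incorrect", "correct"],
        "t2": ["correct", "incorrect"],
    }
    print(aggregate(votes))
```

Using an odd number of assignments per task avoids ties for binary questions, which is one reason such setups are common.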



ZenCrowd: Entity Linking by the Crowd

• Combine both algorithmic and manual linking

• Automate manual linking via crowdsourcing

• Dynamically assess human workers with a probabilistic reasoning framework










<p>Facebook is not waiting for its initial public offering to make its first big purchase.</p><p>In its largest acquisition to date, the social network has purchased Instagram, the popular photo-sharing application, for about $1 billion in cash and stock, the company said Monday.</p>

<p><span about="http://dbpedia.org/resource/Facebook"><cite property="rdfs:label">Facebook</cite> is not waiting for its initial public offering to make its first big purchase.</span></p><p><span about="http://dbpedia.org/resource/Instagram">In its largest acquisition to date, the social network has purchased <cite property="rdfs:label">Instagram</cite>, the popular photo-sharing application, for about $1 billion in cash and stock, the company said Monday.</span></p>

RDFa enrichment
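Once an entity link has been decided, the enrichment step shown above is mostly string manipulation. The following sketch reproduces the markup pattern of the example (one linked entity per paragraph, the paragraph wrapped in a span with an about attribute, the mention wrapped in cite). The function name and the simple first-match strategy are assumptions; this is not ZenCrowd's implementation.

```python
import html
import re
from typing import Dict


def annotate_paragraph(text: str, links: Dict[str, str]) -> str:
    """Wrap a paragraph in RDFa for the first known entity mention it contains.

    `links` maps a surface form (e.g. "Facebook") to the URI chosen for it.
    Paragraphs without any known mention are returned as plain <p> elements.
    """
    for mention, uri in links.items():
        match = re.search(r"\b" + re.escape(mention) + r"\b", text)
        if match is None:
            continue
        before = html.escape(text[:match.start()])
        after = html.escape(text[match.end():])
        cite = f'<cite property="rdfs:label">{html.escape(mention)}</cite>'
        return f'<p><span about="{html.escape(uri)}">{before}{cite}{after}</span></p>'
    return f"<p>{html.escape(text)}</p>"


if __name__ == "__main__":
    entity_links = {
        "Facebook": "http://dbpedia.org/resource/Facebook",
        "Instagram": "http://dbpedia.org/resource/Instagram",
    }
    print(annotate_paragraph("Facebook is not waiting.", entity_links))
```

The hard part, deciding which URI a mention refers to, is exactly what the crowd and the probabilistic model contribute; the sketch takes that decision as input.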




ZenCrowd Architecture

Gianluca Demartini, Djellel Eddine Difallah, and Philippe Cudré-Mauroux. ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking. In: 21st International Conference on World Wide Web (WWW 2012).



Entity Factor Graphs

• Graph components

– Workers, links, clicks

– Prior probabilities

– Link Factors

– Constraints

• Probabilistic Inference

– Select all links with posterior prob > τ

[Figure: example factor graph with 2 workers, 6 clicks, 3 candidate links, and link priors]
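The rule "select all links with posterior prob > τ" can be illustrated with a much simpler model than the factor graph itself. The sketch below is a naive Bayes simplification: each worker has a single reliability value, clicks are treated as independent, and constraints between links are ignored. All names are hypothetical, and it does not reproduce the inference described in the ZenCrowd paper.

```python
from typing import Dict, List, Tuple

# One click: (worker reliability in [0, 1], whether the worker marked the link as valid).
Click = Tuple[float, bool]


def link_posterior(prior: float, clicks: List[Click]) -> float:
    """Posterior probability that a candidate link is valid.

    Assumes independent clicks, where a worker with reliability r gives the
    right answer with probability r.
    """
    valid = prior
    invalid = 1.0 - prior
    for reliability, says_valid in clicks:
        valid *= reliability if says_valid else (1.0 - reliability)
        invalid *= (1.0 - reliability) if says_valid else reliability
    total = valid + invalid
    # Contradictory evidence from "perfect" workers: fall back to the prior.
    return valid / total if total > 0 else prior


def select_links(candidates: Dict[str, Tuple[float, List[Click]]], tau: float) -> List[str]:
    """Keep the candidate links whose posterior exceeds the threshold tau."""
    return [
        link for link, (prior, clicks) in candidates.items()
        if link_posterior(prior, clicks) > tau
    ]


if __name__ == "__main__":
    demo = {
        "linkA": (0.5, [(0.8, True), (0.8, True)]),
        "linkB": (0.3, [(0.8, False), (0.6, True)]),
    }
    print(select_links(demo, tau=0.5))
```

In this simplified view the prior plays the role of the algorithmic matcher's confidence and the reliabilities stand in for the dynamically assessed worker quality.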








Lessons Learnt

• Crowdsourcing + Prob reasoning works!

• But

– Different worker communities perform differently

– Many low quality workers

– Completion time may vary (based on reward)

• Need to find the right workers for your task (see WWW13 paper)



ZenCrowd Summary

• ZenCrowd: Probabilistic reasoning over automatic and crowdsourcing methods for entity linking

• Standard crowdsourcing improves 6% over automatic

• 4% - 35% improvement over standard crowdsourcing

• 14% average improvement over automatic approaches

• Follow-up work (VLDBJ):

– Also used for instance matching across datasets

– 3-way blocking with the crowd





• Web Search Engines can answer simple factual queries directly on the result page

• Users with complex information needs are often unsatisfied

• Purely automatic techniques are not enough

• We want to solve it with Crowdsourcing!



CrowdQ

• CrowdQ is the first system that uses crowdsourcing to

– Understand the intended meaning

– Build a structured query template

– Answer the query over Linked Open Data

Gianluca Demartini, Beth Trushkowsky, Tim Kraska, and Michael Franklin. CrowdQ: Crowdsourced Query Understanding. In: 6th Biennial Conference on Innovative Data Systems Research (CIDR 2013).



CrowdQ Architecture

• Off-line: query template generation with the help of the crowd

• On-line: query template matching using NLP and search over open data



Hybrid Human-Machine Pipeline

Q = birthdate of actors of forrest gump

• Query annotation: Noun, Noun, Named entity

• Entity: Is forrest gump this entity in the query?

• Relations: Which is the relation between actors and forrest gump? Answer: starring

• Schema element: Starring <dbpedia-owl:starring>

• Verification: Is the relation between Indiana Jones – Harrison Ford and Back to the Future – Michael J. Fox of the same type as Forrest Gump – actors?



Structured query generation

Q = birthdate of actors of forrest gump [MOVIE]

SELECT ?y ?x
WHERE { ?y <dbpedia-owl:birthdate> ?x .
        ?z <dbpedia-owl:starring> ?y .
        ?z <rdfs:label> 'Forrest Gump' }

Results from BTC09: [Figure: query results]
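The query above follows a fixed shape, which is what makes it reusable as a template once the crowd has confirmed the relation and the schema element. The sketch below fills such a template in Python. The function name and parameters are hypothetical, and the property notation is copied from the slide as written; a query meant for a real endpoint would need full IRIs or PREFIX declarations.

```python
def build_query(relation: str, attribute: str, entity_label: str) -> str:
    """Fill the slide's query shape: attribute of things related to a labeled entity.

    relation     -- property linking the entity to the things of interest
    attribute    -- property whose value is requested for each of those things
    entity_label -- label of the entity mentioned in the keyword query
    """
    # Escape backslashes and single quotes so the label stays a valid string literal.
    escaped = entity_label.replace("\\", "\\\\").replace("'", "\\'")
    return (
        "SELECT ?y ?x\n"
        "WHERE { ?y <" + attribute + "> ?x .\n"
        "        ?z <" + relation + "> ?y .\n"
        "        ?z <rdfs:label> '" + escaped + "' }"
    )


if __name__ == "__main__":
    print(build_query("dbpedia-owl:starring", "dbpedia-owl:birthdate", "Forrest Gump"))
```

With the template stored, a later query of the same type (the verification step uses Indiana Jones and Back to the Future as comparable cases) only needs a different label.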





• Experiments using MTurk, CrowdFlower and established benchmarks

• Enhancing the results of automatic techniques

• Fast, accurate, cost-effective [Sarasua, Simperl, Noy,








PRECISION 0.53 0.8 1.0 1.0 0.93 0.73

RECALL 1.0 0.42 0.7 0.75 0.65 1.0
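Precision and recall pull in opposite directions in several of the columns above, so a combined score can make them easier to compare. The sketch below computes the F1 measure (the harmonic mean of precision and recall) for each column. The F1 measure is not reported on the slide, the column labels did not survive extraction, and pairing the values by position is an assumption.

```python
from typing import List


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the two rows above, paired by column position (assumption).
PRECISION: List[float] = [0.53, 0.8, 1.0, 1.0, 0.93, 0.73]
RECALL: List[float] = [1.0, 0.42, 0.7, 0.75, 0.65, 1.0]

F1_SCORES: List[float] = [f1(p, r) for p, r in zip(PRECISION, RECALL)]

if __name__ == "__main__":
    for p, r, score in zip(PRECISION, RECALL, F1_SCORES):
        print(f"precision={p:.2f} recall={r:.2f} F1={score:.3f}")
```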


10.04.2023 28

Taste IT! Try IT!

• Restaurant review Android app developed in the Insemtives project

• Uses DBpedia concepts to generate structured reviews

• Uses mechanism design/gamification to configure incentives

• User study

– 2274 reviews by 180 reviewers referring to 900 restaurants, using 5667 DBpedia concepts








[Figure: chart of number of reviews, number of semantic annotations (type of cuisine), and number of semantic annotations (dishes)]



http://research.zemanta.com/crowds-to-the-rescue/


Ontology Population



Linked Data Curation


Problems and Challenges

• What is feasible and how can tasks be optimally translated into microtasks?

– Examples: data quality assessment for technical and contextual features; subjective vs objective tasks (also in modeling); open-ended questions

• What to show to users

– Natural language descriptions of Linked Data/SPARQL

– How much context

– What form of rendering

– How about links?

• How to combine with automatic tools

– Which results to validate

    • Low precision (no fun for gamers...)

    • Low recall (vs all possible questions)

• How to embed it into an existing application

– Tasks are fine-grained and perceived as an additional burden on top of the actual functionality

• What to do with the resulting data?

– Integration into existing practices

– Vocabularies!



Web site: https://sites.google.com/site/microtasktutorial/

SLIDES and EXERCISES: https://github.com/maribelacosta/crowdsourcing-tutorial

Full-day tutorial at ISWC 2013, Sydney, Australia



For exercises, quiz and further material visit our website:

@euclid_project euclidproject euclidproject


Other channels:

eBook Course
