On the Use of Linked Open Data for Trusting Web Data · 2019-06-14 · Trusting Web data using...

On the Use of Linked Open Data for Trusting Web Data

Davide Ceolin and Valentina Maccatrozzo, VU University Amsterdam

Outline

• Premises

• Introduction

• A Natural History Case Study

• A Cultural Heritage Case Study

• Future directions

• Recap, bibliography, etc.

Premises

• Trust ≈ Reliability.

• We make no assumption about the intentions of the data creator.

• This presentation reflects on past work (see Bibliography) and outlines future directions.

Introduction

• Trust Management: subjective logic (Jøsang, 2001)

• Extends boolean and probabilistic logic.

• Reasoning on “opinions” about propositions based on evidence.

• Accounts for source and uncertainty (inversely proportional to size of evidence set).

Subjective logic: basics

ω^source_proposition = (b, d, u)

• b + d + u = 1

• b ≈ p(proposition)

• u inversely proportional to evidence set.

• operators: boolean, discounting, fusion...
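The basics above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the cited work: the class and function names (`Opinion`, `from_evidence`, `discount`, `fuse`) are hypothetical, the base rate is assumed to be 0.5, and the evidence mapping and operator formulas follow the binomial-opinion definitions in Jøsang (2001).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opinion:
    """Binomial opinion (b, d, u) with base rate a; b + d + u = 1."""
    b: float
    d: float
    u: float
    a: float = 0.5  # assumed default base rate

    def expected(self) -> float:
        # Expected probability of the proposition: E = b + a * u.
        return self.b + self.a * self.u


def from_evidence(pos: float, neg: float, a: float = 0.5) -> Opinion:
    """Map positive/negative evidence counts to an opinion.

    Uncertainty u = 2 / (pos + neg + 2) shrinks as the evidence set grows.
    """
    total = pos + neg + 2.0  # 2 is the weight of the non-informative prior
    return Opinion(pos / total, neg / total, 2.0 / total, a)


def discount(trust_in_source: Opinion, source_opinion: Opinion) -> Opinion:
    """Discounting: weaken a source's opinion by our trust in that source."""
    t, o = trust_in_source, source_opinion
    return Opinion(t.b * o.b, t.b * o.d, t.d + t.u + t.b * o.u, o.a)


def fuse(x: Opinion, y: Opinion) -> Opinion:
    """Consensus (cumulative fusion) of two independent opinions."""
    k = x.u + y.u - x.u * y.u
    if k == 0:
        raise ValueError("both opinions are dogmatic (u = 0)")
    return Opinion(
        (x.b * y.u + y.b * x.u) / k,
        (x.d * y.u + y.d * x.u) / k,
        (x.u * y.u) / k,
        x.a,
    )
```

For example, `from_evidence(8, 2)` yields an opinion whose expected probability is 0.75, and fusing `from_evidence(3, 1)` with `from_evidence(5, 1)` gives the same opinion as pooling the evidence into `from_evidence(8, 2)`.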

Trusting Web data using subjective logic

• We adopt supervised learning algorithms (when a training set is available).

• Trust is subjective.

• We look for different “opinions” about the data. Subjective logic allows us to handle them.

• First we estimate the trustworthiness of the data, then we select the “best” data (based, e.g., on author reputation).

Using LOD to assist evidential reasoning

• LOD provide lots of useful data.

• More evidence.

• The distributions used in subjective logic ≈ the distributions of (at least some) LOD datasets (Ceolin et al., 2011).

Photo: flickr.com/clumsyjim

Museums...

...have a problem.

Photo: flickr.com/grrrl

Photo: flickr.com/anirudhkoul

So they recruit some help...

Trusting Museum Annotations

• Museums manage large collections.

• Several Museums crowdsource annotations.

• The quality and accuracy of annotations are crucial for their business.

• Can they trust crowdsourced annotations?

A Natural History Case Study

specimen 1 user1 aves xx ✓

specimen 2 user2 aves xyz ✗

specimen 3 user1 aves xz1 ✓

specimen 4 user2 aves yy ✓

specimen 5 user3 aves zz ✗

[Figure labels: tax author1, tax author2]

Increased accuracy, from 53% to 82%, on a museum dataset (Ceolin et al., 2010).
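One way to picture the "estimate trustworthiness, then select" step on the toy rows of this case study is sketched below. It is an illustration only and not the method of Ceolin et al. (2010): it assumes that ✓/✗ mark annotations judged correct or incorrect, that a user's reputation is built from those counts alone, and that the function names and the 0.5 acceptance threshold are hypothetical choices.

```python
from collections import defaultdict

# Toy rows from the slide: (specimen, user, annotation, judged correct?).
# Reading the check marks as "judged correct" is an assumption.
ROWS = [
    ("specimen 1", "user1", "aves xx", True),
    ("specimen 2", "user2", "aves xyz", False),
    ("specimen 3", "user1", "aves xz1", True),
    ("specimen 4", "user2", "aves yy", True),
    ("specimen 5", "user3", "aves zz", False),
]


def reputations(rows):
    """Build a (b, d, u) opinion per user from their past annotations."""
    counts = defaultdict(lambda: [0, 0])  # user -> [positive, negative]
    for _specimen, user, _annotation, ok in rows:
        counts[user][0 if ok else 1] += 1
    result = {}
    for user, (pos, neg) in counts.items():
        total = pos + neg + 2  # 2 = weight of the non-informative prior
        result[user] = (pos / total, neg / total, 2 / total)
    return result


def expected(opinion, base_rate=0.5):
    """Expected probability E = b + a * u."""
    b, _d, u = opinion
    return b + base_rate * u


def accept(user, reps, threshold=0.5):
    """Accept a new annotation if its author's expected reliability
    exceeds a (hypothetical) threshold; unknown users get full uncertainty."""
    return expected(reps.get(user, (0.0, 0.0, 1.0))) > threshold
```

With these rows, user1 (two correct annotations, none incorrect) gets the opinion (0.5, 0.0, 0.5) and an expected reliability of 0.75, so a new annotation by user1 would be accepted under the assumed threshold.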

Another Case Study

Semantic similarity for weighing evidence.

[Figure labels: Expertise; Training set; New annotation; Tulip; Flower; Purple; Semantic similarity]

Up to 84% accuracy, 88% precision, and 96% recall on two museum datasets (Ceolin et al., 2013a).
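The idea of weighing evidence by semantic similarity can be sketched as follows. This is a simplified illustration, not the procedure of Ceolin et al. (2013a): the similarity scores, the training items, and the function names are all hypothetical, and a real system would obtain similarities from a lexical or LOD resource rather than a hard-coded table.

```python
# Hypothetical similarity scores in [0, 1] between annotation terms.
SIMILARITY = {
    ("tulip", "flower"): 0.9,
    ("tulip", "purple"): 0.3,
}


def similarity(term_a, term_b):
    """Symmetric lookup in the toy table; identical terms score 1.0."""
    if term_a == term_b:
        return 1.0
    return SIMILARITY.get((term_a, term_b),
                          SIMILARITY.get((term_b, term_a), 0.0))


def weighted_opinion(new_term, training_set):
    """Build a (b, d, u) opinion about a new annotation.

    Each training item (term, judged_correct) counts as evidence with a
    weight equal to its similarity to the new term, so closely related
    past annotations matter more than unrelated ones.
    """
    pos = neg = 0.0
    for term, ok in training_set:
        weight = similarity(new_term, term)
        if ok:
            pos += weight
        else:
            neg += weight
    total = pos + neg + 2.0  # 2 = weight of the non-informative prior
    return pos / total, neg / total, 2.0 / total


# Hypothetical training set for one annotator: (term, judged correct?).
training = [("flower", True), ("flower", True), ("purple", False)]
b, d, u = weighted_opinion("tulip", training)
```

Because the weights are below 1, three past annotations contribute less than three full units of evidence, so the resulting uncertainty stays higher than it would with unweighted counts.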

Future work

• We used similar methods (plus other statistical techniques) for analyzing the reliability of UK Police Open Data (Ceolin et al., 2013b).

• We plan to extend them with LOD, e.g. for:

• geodisambiguation;

• crime type hierarchies.

Recap

• LOD + evidential reasoning (subjective logic) is a powerful combination for trust (reliability) estimation:

• enrichment;

• weighing.

• The more the better, but:

• evidence quality counts;

• data needs to be tracked (W3C PROV) and properly managed.

Bibliography

• Jøsang, A. A Logic for Uncertain Probabilities. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 9(3), pp. 279-311, 2001.

• Ceolin, D., van Hage, W.R., Fokkink, W. A Trust Model to Estimate Quality of Annotations using the Web. In WebSci, Web Science Repository, 2010.

• Ceolin, D., van Hage, W.R., Fokkink, W., Schreiber, G. Estimating Uncertainty of Categorical Web Data. In URSW, CEUR-ws.org, 2011.

• Ceolin, D., Nottamkandath, A., Fokkink, W. Semi-automated Assessment of Annotations Trustworthiness. In PST Conference, IEEE, 2013a.

• Ceolin, D., Moreau, L., O'Hara, K., Schreiber, G., Sackley, A., Fokkink, W., van Hage, W.R., Shadbolt, N. Reliability Analyses of Open Government Data. In URSW, CEUR-ws.org, 2013b.

• Ceolin, D., Nottamkandath, A., Fokkink, W. Efficient Semi-automated Assessment of Annotations Trustworthiness. In Journal of Trust Management, Springer (accepted, 2014).

Thank you!

Any questions? d.ceolin@vu.nl
