On the Use of Linked Open Data for Trusting Web Data
Davide Ceolin and Valentina Maccatrozzo
VU University Amsterdam
Outline
• Premises
• Introduction
• A Natural History Case Study
• A Cultural Heritage Case Study
• Future directions
• Recap, bibliography, etc.
Premises
• Trust ≈ Reliability.
• We make no assumption about the intentions of the data creator.
• This presentation reflects on past work (see the bibliography) and outlines future directions.
Introduction
• Trust Management: subjective logic (Jøsang, 2001)
• Extends boolean and probabilistic logic.
• Reasoning on “opinions” about propositions based on evidence.
• Accounts for source and uncertainty (inversely proportional to size of evidence set).
Subjective logic: basics
$\omega^{\mathrm{source}}_{\mathrm{proposition}} = (b, d, u)$
• b + d + u = 1
• b ≈ p(proposition)
• u inversely proportional to evidence set.
• operators: boolean, discounting, fusion...
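The mapping from evidence to an opinion can be sketched in a few lines. This is a minimal illustration, assuming the standard binomial mapping from Jøsang (2001) with a non-informative prior weight W = 2 and a default base rate of 0.5; the names `Opinion`, `opinion_from_evidence`, and `PRIOR_WEIGHT` are illustrative and do not come from the slides.

```python
from dataclasses import dataclass

# Assumption: non-informative prior weight W = 2, as in the usual binomial mapping.
PRIOR_WEIGHT = 2.0


@dataclass(frozen=True)
class Opinion:
    """A binomial opinion (b, d, u) with b + d + u = 1."""
    belief: float
    disbelief: float
    uncertainty: float
    base_rate: float = 0.5  # assumption: no prior preference for either outcome

    def expected(self) -> float:
        """Expected probability that the proposition holds: b + a * u."""
        return self.belief + self.base_rate * self.uncertainty


def opinion_from_evidence(positive: float, negative: float,
                          base_rate: float = 0.5) -> Opinion:
    """Turn counts of supporting and contradicting observations into an opinion."""
    total = positive + negative + PRIOR_WEIGHT
    return Opinion(positive / total, negative / total,
                   PRIOR_WEIGHT / total, base_rate)


# Same ratio of positive to negative evidence, ten times more observations.
few = opinion_from_evidence(3, 1)
many = opinion_from_evidence(30, 10)
print(few, few.expected())
print(many, many.expected())
```

With the same ratio of positive to negative observations, the larger evidence set yields a smaller u, which is the sense in which uncertainty shrinks as evidence accumulates.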
Trusting Web data using subjective logic
• We adopt supervised learning algorithms (when a training set is available).
• Trust is subjective.
• We look for different “opinions” about the data. Subjective logic allows us to handle them.
• First we estimate the trustworthiness of the data, then we select the “best” data (based, e.g., on author reputation).
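Handling several opinions about the same data can be illustrated with two subjective logic operators. The sketch below assumes the discounting and consensus (cumulative fusion) definitions from Jøsang (2001); the sources "Alice" and "Bob" and all numeric values are hypothetical, and the case where both opinions have u = 0 is deliberately left unhandled.

```python
from typing import NamedTuple


class Opinion(NamedTuple):
    b: float  # belief
    d: float  # disbelief
    u: float  # uncertainty


def discount(trust_in_source: Opinion, source_opinion: Opinion) -> Opinion:
    """Weaken a source's opinion according to how much we trust that source."""
    t, o = trust_in_source, source_opinion
    return Opinion(t.b * o.b, t.b * o.d, t.d + t.u + t.b * o.u)


def fuse(x: Opinion, y: Opinion) -> Opinion:
    """Combine two opinions about the same proposition (consensus operator)."""
    k = x.u + y.u - x.u * y.u
    if k == 0:
        # Both opinions are dogmatic (u = 0); this sketch does not cover that case.
        raise ValueError("fusion of two dogmatic opinions is not handled here")
    return Opinion((x.b * y.u + y.b * x.u) / k,
                   (x.d * y.u + y.d * x.u) / k,
                   (x.u * y.u) / k)


# Hypothetical values: our trust in Alice, and what Alice and Bob say about a datum.
trust_in_alice = Opinion(0.8, 0.1, 0.1)
alice_says = Opinion(0.6, 0.2, 0.2)
bob_says = Opinion(0.3, 0.3, 0.4)

via_alice = discount(trust_in_alice, alice_says)
combined = fuse(via_alice, bob_says)
print(via_alice)
print(combined)
```

Discounting can only increase uncertainty, while fusing two non-dogmatic opinions reduces it, so the combined opinion is less uncertain than either input.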
Using LOD to assist evidential reasoning
• LOD provide lots of useful data.
• More evidence.
• Subjective logic’s distributions ≈ (At least some) LOD datasets distributions (Ceolin et al., 2011).
Photo: flickr.com/clumsyjim
Museums...
...have a problem.
Photo: flickr.com/grrrl
Photo: flickr.com/anirudhkoul
So they recruit some help...
Trusting Museum Annotations
• Museums manage large collections.
• Several Museums crowdsource annotations.
• The quality and accuracy of annotations are crucial for their business.
• Can they trust crowdsourced annotations?
A Natural History Case Study
specimen 1   user1   aves   xx    ✓   tax author1
specimen 2   user2   aves   xyz   ✗   tax author2
specimen 3   user1   aves   xz1   ✓   tax author1
specimen 4   user2   aves   yy    ✓   tax author1
specimen 5   user3   aves   zz    ✗   tax author2
Increased accuracy, from 53% to 82%, on a museum dataset (Ceolin et al., 2010).
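One way to read the table above is as evidence about each annotator. The sketch below turns the ✓/✗ column into a per-user opinion (b, d, u); it assumes a prior weight of 2 and ignores the taxon and tax author columns, so it illustrates evidence counting only and is not the trust model of Ceolin et al. (2010). The function name `reputation` is illustrative.

```python
from collections import defaultdict

# The five validated annotations from the table: (specimen, user, correct?).
validated = [
    ("specimen 1", "user1", True),
    ("specimen 2", "user2", False),
    ("specimen 3", "user1", True),
    ("specimen 4", "user2", True),
    ("specimen 5", "user3", False),
]


def reputation(records, prior_weight=2.0):
    """Return {user: (belief, disbelief, uncertainty)} from validated annotations."""
    counts = defaultdict(lambda: [0, 0])  # user -> [correct, incorrect]
    for _specimen, user, correct in records:
        counts[user][0 if correct else 1] += 1
    opinions = {}
    for user, (pos, neg) in counts.items():
        total = pos + neg + prior_weight  # assumption: prior weight W = 2
        opinions[user] = (pos / total, neg / total, prior_weight / total)
    return opinions


opinions = reputation(validated)
for user, opinion in sorted(opinions.items()):
    print(user, opinion)
```

user3 has a single validated annotation, so the resulting opinion carries more uncertainty than those for user1 and user2, who have two each.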
Another Case Study
Semantic similarity for weighing evidence.
[Figure: an annotator's expertise (training set) is compared with a new annotation via semantic similarity; example terms: Tulip, Flower, Red, Pink, Purple, Rose.]
Up to 84% accuracy, 88% precision, and 96% recall on two museum datasets (Ceolin et al., 2013a).
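Weighing evidence by semantic similarity can be sketched as follows: each past annotation in the training set counts as evidence in proportion to its similarity to the new annotation. The similarity scores and the accepted/rejected labels below are hypothetical toy values, and `similarity` is a stand-in for a real semantic similarity measure; this is not the procedure evaluated in Ceolin et al. (2013a).

```python
# Hypothetical similarity scores in [0, 1] between the new tag and past tags.
TOY_SIMILARITY = {
    frozenset(("rose", "tulip")): 0.8,
    frozenset(("rose", "flower")): 0.9,
    frozenset(("rose", "red")): 0.3,
    frozenset(("rose", "pink")): 0.4,
    frozenset(("rose", "purple")): 0.2,
}


def similarity(a: str, b: str) -> float:
    """Stand-in for a semantic similarity measure; identical tags score 1."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get(frozenset((a, b)), 0.0)


def weighted_opinion(new_tag, training_set, prior_weight=2.0):
    """Opinion (b, d, u) about new_tag from similarity-weighted past annotations."""
    pos = sum(similarity(new_tag, tag) for tag, ok in training_set if ok)
    neg = sum(similarity(new_tag, tag) for tag, ok in training_set if not ok)
    total = pos + neg + prior_weight  # assumption: prior weight W = 2
    return (pos / total, neg / total, prior_weight / total)


# Hypothetical training set: (tag, was the annotation accepted?).
training = [("tulip", True), ("flower", True), ("red", True),
            ("pink", False), ("purple", True)]

opinion = weighted_opinion("rose", training)
print(opinion)
```

Because each observation contributes at most 1 unit of evidence, dissimilar past annotations barely move the opinion, and uncertainty stays high when the annotator has little relevant history.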
Future work
• We used similar methods (plus other statistical techniques) for analyzing the reliability of UK Police Open Data (Ceolin et al., 2013b).
• We plan to extend them with LOD, e.g. for:
• geodisambiguation;
• crime type hierarchies.
Recap
• LOD + evidential reasoning (subjective logic) is a powerful combination for trust (reliability) estimation:
• enrichment;
• weighing.
• The more evidence the better, but:
• evidence quality counts;
• data needs to be tracked (W3C PROV) and properly managed.
Bibliography
• Jøsang, A. A logic for uncertain probabilities. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 9(3), pp. 279-311, 2001.
• Ceolin, D., van Hage, W.R., Fokkink, W. A Trust Model to Estimate Quality of Annotations using the Web. In WebSci, Web Science Repository, 2010.
• Ceolin, D., van Hage, W.R., Fokkink, W., Schreiber, G. Estimating Uncertainty of Categorical Web Data. In URSW, CEUR-ws.org, 2011.
• Ceolin, D., Nottamkandath, A., Fokkink, W. Semi-automated Assessment of Annotations Trustworthiness. In PST Conference, IEEE, 2013a.
• Ceolin, D., Moreau, L., O'Hara, K., Schreiber, G., Sackley, A., Fokkink, W., van Hage, W.R., Shadbolt, N. Reliability Analyses of Open Government Data. In URSW, CEUR-ws.org, 2013b.
• Ceolin, D., Nottamkandath, A., Fokkink, W. Efficient Semi-automated Assessment of Annotations Trustworthiness. In Journal of Trust Management, Springer (accepted, 2014).