Date post: 18-Jan-2016

Linked Data Profiling

Andrejs Abele

National University of Ireland, Galway

Supervisor: Paul Buitelaar

Overview

• Terminology
• Motivation
• My approach
• Evaluation
• Conclusion
• Future work

Terminology

Linked Data is about using the Web to connect related data that was not previously linked.

The Resource Description Framework (RDF) represents data as sets of subject-predicate-object triples, where the elements may be URIs, literals, or blank nodes. For example:

<https://www.insight-centre.org/users/andrejs-ābele> foaf:name "Andrejs Ābele" .

The Linked Open Data (LOD) Cloud is a collection of Linked Data datasets that are openly and freely available.
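To make the triple structure concrete, here is a minimal sketch that writes the example triple above as a single line of N-Triples using only the Python standard library. The `IRI` and `Literal` helper types and the function names are hypothetical. The only outside fact it relies on is that `foaf:name` expands to the full IRI http://xmlns.com/foaf/0.1/name.

```python
from typing import NamedTuple, Union

FOAF = "http://xmlns.com/foaf/0.1/"


class IRI(NamedTuple):
    value: str


class Literal(NamedTuple):
    value: str


Term = Union[IRI, Literal]


def term_to_nt(term: Term) -> str:
    """Serialize one RDF term in N-Triples syntax."""
    if isinstance(term, IRI):
        return f"<{term.value}>"
    # N-Triples string literals must escape backslash, quote, LF and CR
    # (backslash first, so later escapes are not doubled).
    escaped = (term.value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def triple_to_nt(s: IRI, p: IRI, o: Term) -> str:
    """One subject-predicate-object triple as a single N-Triples line."""
    return f"{term_to_nt(s)} {term_to_nt(p)} {term_to_nt(o)} ."


line = triple_to_nt(
    IRI("https://www.insight-centre.org/users/andrejs-ābele"),
    IRI(FOAF + "name"),
    Literal("Andrejs Ābele"),
)
print(line)
# <https://www.insight-centre.org/users/andrejs-ābele> <http://xmlns.com/foaf/0.1/name> "Andrejs Ābele" .
```

In practice an RDF library would handle parsing and serialization. The point here is only that each statement is one (subject, predicate, object) triple and that the object may be a literal.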

Linked Open Data Cloud Diagram (figure; legend domains: Publications, Life Sciences, Cross-Domain, Social Networking, Geographic, Government, Media, User-Generated Content, Linguistics)

Motivation

• Linked Data is hard for humans to understand: only a small number of datasets provide a human-readable overview or comprehensive metadata

• When adding a new dataset to the LOD cloud, connections have to be identified to as many other relevant LOD datasets as possible

• The LOD Cloud Diagram relies on human classification

Existing solutions for LD profiling

Loupe [1]

ProLOD++ [2]

LOD Laundromat [3]

LODStats [4]

Aether [5]

RDF-stats [6]

[1] http://demo.seco.tkk.fi/aether/#/
[2] https://www.hpi.uni-potsdam.de/naumann/sites/prolod++/#
[3] http://lodlaundromat.org/
[4] http://stats.lod2.eu/
[5] http://demo.seco.tkk.fi/aether/#/
[6] http://rdfstats.sourceforge.net/

Domain identification method using DBpedia

Topic Extraction → Domain Identification → Domain

Example

• Input: Bio2RDF-sgd

• Description: The Saccharomyces Genome Database (SGD) collects and organizes information about the molecular biology and genetics of the yeast Saccharomyces cerevisiae

1. Most frequent terms (sgd_vocabulary, query, proper, phenotype, experiment)

2. Literal containing one of the terms ("protein [sgd_vocabulary:protein]@en")

3. Identify the DBpedia concept (http://dbpedia.org/resource/Protein)

4. Identify the category (http://dbpedia.org/resource/Category:Molecular_biology)

5. Identify the domain under which the category fits best (Biology => Life Sciences)
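The five steps above can be sketched in Python, assuming the dataset's literals are already loaded as plain strings. The lookup tables are hypothetical stand-ins for real DBpedia lookups, with one dict per hop: label to resource, resource to category, category to broad field, and field to domain. `frequent_terms`, `label_of` and `identify_domain` are illustrative names, not the author's implementation. The majority vote over matched literals is an assumption about how the final step aggregates evidence.

```python
import re
from collections import Counter

# Hypothetical lookup tables standing in for DBpedia queries. In DBpedia,
# resources link to categories via dct:subject and categories to broader
# ones via skos:broader; here each hop is a plain dict.
LABEL_TO_CONCEPT = {
    "protein": "http://dbpedia.org/resource/Protein",
    "gene": "http://dbpedia.org/resource/Gene",
}
CONCEPT_TO_CATEGORY = {
    "http://dbpedia.org/resource/Protein":
        "http://dbpedia.org/resource/Category:Molecular_biology",
    "http://dbpedia.org/resource/Gene":
        "http://dbpedia.org/resource/Category:Genetics",
}
CATEGORY_TO_FIELD = {
    "http://dbpedia.org/resource/Category:Molecular_biology": "Biology",
    "http://dbpedia.org/resource/Category:Genetics": "Biology",
}
FIELD_TO_DOMAIN = {"Biology": "Life Sciences"}


def frequent_terms(literals, k=5, min_len=3):
    """Step 1: the k most frequent word-like tokens (short tokens such as
    language tags are skipped)."""
    counts = Counter(
        tok.lower()
        for lit in literals
        for tok in re.findall(r"[A-Za-z_]+", lit)
        if len(tok) >= min_len
    )
    return [term for term, _ in counts.most_common(k)]


def label_of(literal):
    """'protein [sgd_vocabulary:protein]@en' -> 'protein'."""
    text = re.sub(r"@[A-Za-z-]+$", "", literal)  # drop language tag
    text = re.sub(r"\[[^\]]*\]", "", text)       # drop bracketed identifiers
    return text.strip().lower()


def identify_domain(literals, k=5):
    terms = frequent_terms(literals, k)
    # Step 2: literals containing at least one of the frequent terms.
    matching = [lit for lit in literals if any(t in lit.lower() for t in terms)]
    votes = Counter()
    for lit in matching:
        concept = LABEL_TO_CONCEPT.get(label_of(lit))  # step 3
        category = CONCEPT_TO_CATEGORY.get(concept)    # step 4
        field = CATEGORY_TO_FIELD.get(category)        # step 5: category -> field
        domain = FIELD_TO_DOMAIN.get(field)            #         field -> domain
        if domain is not None:
            votes[domain] += 1
    # Majority vote over all matched literals (an assumption of this sketch).
    return votes.most_common(1)[0][0] if votes else None


literals = [
    "protein [sgd_vocabulary:protein]@en",
    "gene [sgd_vocabulary:gene]@en",
    "phenotype [sgd_vocabulary:phenotype]@en",
]
print(identify_domain(literals))  # Life Sciences
```

Literals whose label has no entry in the lookup tables, such as "phenotype" here, simply cast no vote. This matches the limitation that the method only works when datasets contain usable literals.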

Datasets

LOD cloud datasets (annotated in the LOD Cloud Diagram): 405 datasets, 9 domains

• Media (13)
• Linguistics (34)
• Publications (111)
• Social Networking (41)
• Geography (29)
• Government (65)
• Cross Domain (25)
• User Generated (52)
• Life Sciences (35)
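This is a quick arithmetic check on the counts above, using only the numbers from the slide. The nine domain counts sum to the stated 405 datasets. The classes are also imbalanced: always predicting the largest domain (Publications) would be right for about 27% of datasets. That share is a derived reference point for reading the evaluation charts, not a reported result.

```python
# Per-domain dataset counts as listed on the slide.
DOMAIN_COUNTS = {
    "Media": 13, "Linguistics": 34, "Publications": 111,
    "Social Networking": 41, "Geography": 29, "Government": 65,
    "Cross Domain": 25, "User Generated": 52, "Life Sciences": 35,
}

total = sum(DOMAIN_COUNTS.values())
largest = max(DOMAIN_COUNTS, key=DOMAIN_COUNTS.get)
majority_share = DOMAIN_COUNTS[largest] / total

print(total, largest, round(majority_share, 3))  # 405 Publications 0.274
```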

Baseline approach

1. Extract URIs of properties and classes from datasets
2. Use classes and properties as features
3. Classify using a Support Vector Machine (SVM) classifier
4. Use precision and recall as metrics

Extended baseline: enrich the data with human-annotated tags from Linked Open Vocabularies (LOV, http://lov.okfn.org/dataset/lov/)
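Steps 2 and 3 of the baseline can be sketched with the standard library alone. Each dataset becomes a binary vector over the class and property URIs it uses, and a linear SVM is trained one-vs-rest per domain. The SVM here is trained with the Pegasos stochastic sub-gradient method, swapped in so that the sketch has no dependencies. A real experiment would more likely use an off-the-shelf implementation such as scikit-learn's `LinearSVC`. The `LOV_TAGS` table and the `tag:` feature prefix for the extended baseline are hypothetical, and all URIs and labels in the usage lines are toy data.

```python
import random

# Hypothetical LOV-style tags per vocabulary prefix (extended baseline only).
LOV_TAGS = {"foaf": ["People"], "bibo": ["Publications"], "dcterms": ["Metadata"]}


def enrich_with_tags(uris, tags_by_prefix):
    """Extended baseline: add a 'tag:<name>' feature for every tag of each
    vocabulary prefix the dataset uses."""
    tags = {f"tag:{tag}"
            for uri in uris
            for tag in tags_by_prefix.get(uri.split(":", 1)[0], [])}
    return list(uris) + sorted(tags)


def feature_vocab(datasets):
    """Every class/property URI (or tag) seen in any training dataset."""
    return sorted({f for feats in datasets for f in feats})


def to_vector(feats, vocab):
    """Binary bag-of-URIs vector: 1.0 if the dataset uses that feature."""
    present = set(feats)
    return [1.0 if f in present else 0.0 for f in vocab]


def train_pegasos(xs, ys, lam=0.01, epochs=50, seed=0):
    """Binary linear SVM (hinge loss, no bias) trained with Pegasos
    stochastic sub-gradient steps; ys must contain +1 / -1."""
    rng = random.Random(seed)
    w = [0.0] * len(xs[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(xs)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = ys[i] * sum(wj * xj for wj, xj in zip(w, xs[i]))
            w = [(1 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1:  # hinge loss active: step towards the example
                w = [wj + eta * ys[i] * xj for wj, xj in zip(w, xs[i])]
    return w


def train_one_vs_rest(xs, labels, **kwargs):
    """One binary SVM per domain: that domain (+1) against all others (-1)."""
    return {c: train_pegasos(xs, [1 if y == c else -1 for y in labels], **kwargs)
            for c in sorted(set(labels))}


def predict(models, x):
    """Pick the domain whose SVM gives the highest score."""
    return max(models, key=lambda c: sum(wj * xj for wj, xj in zip(models[c], x)))


# Toy usage: hypothetical prefixed URIs, not real evaluation data.
datasets = [["foaf:Person", "foaf:knows"], ["foaf:Person", "foaf:knows", "foaf:name"],
            ["bibo:Article", "dcterms:creator"], ["bibo:Book", "dcterms:creator"]]
labels = ["Social Networking", "Social Networking", "Publications", "Publications"]
vocab = feature_vocab(datasets)
models = train_one_vs_rest([to_vector(d, vocab) for d in datasets], labels)
print(predict(models, to_vector(["foaf:knows"], vocab)))  # Social Networking
```

For the extended baseline, the same pipeline runs on `enrich_with_tags(d, LOV_TAGS)` instead of `d`. The tag features can then link datasets that share no URIs but use vocabularies with the same tags.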

Figure: Precision and recall for different domains using SVM (bar chart per domain: Media, Linguistics, Publications, Social networking, Geography, Government, Cross domain, User generated, Life sciences; y-axis 0% to 100%; series: Precision, Recall)
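Per-domain precision and recall, as plotted above, treat each domain in turn as the positive class. The sketch below uses only the standard library and assumes that gold and predicted domain labels arrive as aligned lists. The function names are hypothetical, and reporting 0.0 when a denominator is zero is a convention chosen here. `accuracy` gives the fraction of correctly classified instances.

```python
def per_domain_precision_recall(gold, predicted):
    """Map each domain to (precision, recall), treating that domain as the
    positive class and every other domain as negative."""
    pairs = list(zip(gold, predicted))
    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[label] = (precision, recall)
    return scores


def accuracy(gold, predicted):
    """Fraction of correctly classified instances."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Toy labels, not results from the evaluation.
gold = ["Media", "Media", "Government", "Government"]
predicted = ["Media", "Government", "Government", "Government"]
print(per_domain_precision_recall(gold, predicted))
print(accuracy(gold, predicted))  # 0.75
```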

Figure: Correctly classified instances (y-axis 0 to 0.8) for three feature sets: Classes, Properties, Classes + Properties; series: From Dataset, Dataset+LOV, LOV
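The three feature sets compared in the chart (classes, properties, classes + properties) could be extracted from a dataset's triples as follows. This sketch assumes that triples are available as (subject, predicate, object) string tuples. It also assumes that "classes" means the objects of rdf:type statements and that "properties" means every predicate used. The function name and mode strings are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def extract_features(triples, mode="classes+properties"):
    """Return the sorted feature URIs for one dataset.

    classes    -- objects of rdf:type triples
    properties -- every predicate used (including rdf:type itself)
    """
    classes = {o for _, p, o in triples if p == RDF_TYPE}
    properties = {p for _, p, _ in triples}
    if mode == "classes":
        return sorted(classes)
    if mode == "properties":
        return sorted(properties)
    if mode == "classes+properties":
        return sorted(classes | properties)
    raise ValueError(f"unknown mode: {mode!r}")


# Toy triples with a hypothetical example.org subject.
triples = [
    ("http://example.org/alice", RDF_TYPE, "http://xmlns.com/foaf/0.1/Person"),
    ("http://example.org/alice", "http://xmlns.com/foaf/0.1/name", "Alice"),
]
for mode in ("classes", "properties", "classes+properties"):
    print(mode, extract_features(triples, mode))
```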

Conclusion

• Does not require training

• Works with new and customized vocabularies

• Works only if datasets contain literals

• Cannot identify User-Generated Content and Cross-Domain datasets

• Using only classes and properties, it is hard to improve results above 75%

Future Work

• Evaluate alternative classification algorithms

• Use Literals and URIs for classification

• Classify datasets into more specific subdomains

Thank you!

