URI Identity Management for Semantic Web Data Integration and Linkage

Date post: 07-Feb-2016
Afraz Jaffri, Hugh Glaser, Ian Millard Electronics and Computer Science University of Southampton
Transcript
Page 1: URI Identity Management for Semantic Web Data Integration and Linkage

Afraz Jaffri, Hugh Glaser, Ian Millard
Electronics and Computer Science
University of Southampton

Page 2: URI Identity Management for Semantic Web Data Integration and Linkage

SSWS07 - Vilamoura, Portugal

Presentation Outline

1. Linked Data
2. URI Multiplicity
3. The Problem of Coreference
4. URI Identity Management Approaches
5. The Problem with owl:sameAs
6. The Consistent Reference Service (CRS)
7. CRS Architecture
8. A CRS Application: The RKB Explorer
9. Summary and Future Work

Page 3: URI Identity Management for Semantic Web Data Integration and Linkage

• DBpedia has URIs for approximately 2 million entities
• Linked datasets contain many overlapping entities
• A single entity can have a number of URIs
• Entities are linked using owl:sameAs

Example

<http://dbpedia.org/resource/Berlin> <owl:sameAs> <http://sws.geonames.org/2950159>

Page 4: URI Identity Management for Semantic Web Data Integration and Linkage

http://www.rkbexplorer.com

• Contains URIs for more than 10 million entities
• Data relating to people, projects, papers and institutions
• A single entity has a number of URIs (even within the same repository)
• Entities are linked using CRSs

DBLP

Page 5: URI Identity Management for Semantic Web Data Integration and Linkage

URIs for ‘Spain’:
http://dbpedia.org/resource/Spain
http://www4.wiwiss.fu-berlin.de/factbook/resource/Spain
http://sws.geonames.org/2510769
http://www4.wiwiss.fu-berlin.de/eurostat/resource/countries/Espa%C3%B1a

URIs for ‘Hugh Glaser’:
http://acm.rkbexplorer.com/rdf/resource-P112732
http://citeseer.rkbexplorer.com/rdf/resource-CSP109020
http://citeseer.rkbexplorer.com/rdf/resource-CSP109013
http://citeseer.rkbexplorer.com/rdf/resource-CSP109011
http://citeseer.rkbexplorer.com/rdf/resource-CSP109002
http://dblp.rkbexplorer.com/rdf/resource-27de9959
http://europa.eu/People/#person-0ff816fa
http://resist.ecs.soton.ac.uk/wiki/User:hugh_glaser
http://www.ecs.soton.ac.uk/info/#person-00021

Page 6: URI Identity Management for Semantic Web Data Integration and Linkage

Tom Anderson – http://www4.wiwiss.fu-berlin.de/dblp/resource/person/109074

is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/dac/MorettiHNCKABDF01>
is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/ftcs/SaeedLA91>
is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/ftrtft/LemosSA92>
is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/hybrid/AndersonLFS92>
is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/iccbss/AndersonFRR03>
is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/iciap/TruccoARI05>
is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/icnp/ElySWSA01>
is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/ifip/AndersonRR04>
is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/sc/BorchersASW95>
is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/seaai/AndersonH98>
is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/srds/Anderson86>
is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/words/AndersonFRR05>
is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/journals/bell/LiuBFSRA04>
is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/journals/cj/LemosSA92>
is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/journals/dt/Anderson01>
is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/journals/dt/Anderson03>
is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/journals/dt/ZorianASTI96>
is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/journals/software/LemosSA95>
is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/journals/ton/SavageWKA01>
is dc:creator of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/journals/tse/AndersonBHM85>
is dblp:editor of <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/sigcomm/2006>

Vice President, O-in Design Automation Inc., USA
Professor, University of Newcastle
Professor, Heriot Watt University
University of Washington
University of California, Berkeley
Tom Andersen - University of Denmark
Lucent Technologies, Illinois

Page 7: URI Identity Management for Semantic Web Data Integration and Linkage

• The problem of coreference has existed for many years

• Physical Libraries disambiguate authors through Date of Birth

• Digital Libraries still have the problem of author disambiguation

• Problems caused by variations in naming schemes, e.g. ‘Glaser, H.’, ‘H. Glaser’, ‘Glaser, Hugh’, ‘H. Glazer’

Page 8: URI Identity Management for Semantic Web Data Integration and Linkage

• Coreference Problem referred to as ‘Record Linkage’

• Matching entities between records similar to matching entities between datasets

• Database linkage is easier due to imposed schema

• Formal theory of Record Linkage proposed by Fellegi & Sunter (1969)

• Uses coded agreements between each field (property) to give the probability of record (instance) equivalence

• Can be adapted for use on the Semantic Web
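The Fellegi & Sunter idea above can be sketched in a few lines. This is only an illustration under stated assumptions, not the method used by the authors: the field names, the m/u probabilities and the sample records are all made up for the example. Each field contributes a log-weight depending on whether the two records agree on it, and the sum indicates how likely the two records (instances) are to be equivalent.

```python
import math

# Hypothetical per-field probabilities (illustrative values, not from the talk):
#   m = P(field agrees | records refer to the same entity)
#   u = P(field agrees | records refer to different entities)
FIELD_PROBS = {
    "name": (0.95, 0.01),
    "email": (0.90, 0.001),
    "affiliation": (0.80, 0.10),
}


def match_weight(rec_a, rec_b, field_probs=FIELD_PROBS):
    """Sum the log2 agreement/disagreement weights over all fields."""
    total = 0.0
    for field, (m, u) in field_probs.items():
        if rec_a.get(field) == rec_b.get(field):
            total += math.log2(m / u)              # agreement weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight
    return total


# Placeholder records; the e-mail addresses are invented.
a = {"name": "Hugh Glaser", "email": "hg@example.org", "affiliation": "Southampton"}
b = {"name": "Hugh Glaser", "email": "hg@example.org", "affiliation": "Imperial"}
c = {"name": "Tom Anderson", "email": "ta@example.org", "affiliation": "Newcastle"}

print(match_weight(a, b) > 0)  # mostly agreeing fields push the weight up
print(match_weight(a, c) < 0)  # disagreeing fields push it down
```

On the Semantic Web the "fields" would be the properties of each resource; in practice a threshold on the total weight separates likely matches from likely non-matches.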

Page 9: URI Identity Management for Semantic Web Data Integration and Linkage

• Coreference on the Semantic Web is defined as the situation where two or more URIs are used for a single non-information resource

• URI usage can change with context

• Non-Information resources are hard to define precisely

Examples

‘Hugh Glaser’ at Southampton vs. ‘Hugh Glaser’ at Imperial

‘Harry Potter and the Order of the Phoenix’ in Hardback vs. Softback
ISBN: 978-0747561071 vs. 978-0747551003

Page 10: URI Identity Management for Semantic Web Data Integration and Linkage

• Use a centralised naming authority to issue URIs for every entity in the world

• Let everyone create their own URIs and link them to ‘official’ URIs (using owl:sameAs)

• Let everyone create their own URIs and register them at a centralised repository

• Let everyone create their own URIs and let them be managed by many decentralised repositories

• In all of the above encourage reuse and linking as far as possible

Page 11: URI Identity Management for Semantic Web Data Integration and Linkage

• owl:sameAs was designed for a specific purpose
• Resources linked with owl:sameAs have the same identity, i.e. the subject and object are exactly the same resource

• owl:sameAs has been misused for Linking Open Data

• Linking can occur between two very different resources, e.g. Tom Anderson

• Reasoning with LOD will have unintended consequences

Page 12: URI Identity Management for Semantic Web Data Integration and Linkage

<rdf:Description rdf:about="#URI-1">
  <vcard:FN>Hugh Glaser</vcard:FN>
  <vcard:EMAIL>[email protected]</vcard:EMAIL>
  <vcard:ROLE>Reader</vcard:ROLE>
</rdf:Description>

<rdf:Description rdf:about="#URI-2">
  <vcard:FN>Hugh Glaser</vcard:FN>
  <vcard:EMAIL>[email protected]</vcard:EMAIL>
  <vcard:ROLE>Lecturer</vcard:ROLE>
</rdf:Description>

Assert <URI-1> <owl:sameAs> <URI-2>

SELECT ?x WHERE {<URI-1> vcard:EMAIL ?x}

Returns [email protected] [email protected]

Which email belongs to which role?

Using owl:sameAs means that both URIs become indistinguishable, even though they may refer to different entities according to the context in which they are used.
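The effect can be reproduced with a toy triple store. The sketch below is an illustration only: it models owl:sameAs by copying every statement about one subject onto the other, which is a simplification of what a reasoner does, and the e-mail addresses are placeholders because the originals are not legible in the slides.

```python
# Triples for the two descriptions. The e-mail addresses are hypothetical
# placeholders; the real ones are redacted in the source.
triples = {
    ("URI-1", "vcard:FN", "Hugh Glaser"),
    ("URI-1", "vcard:EMAIL", "reader@example.org"),
    ("URI-1", "vcard:ROLE", "Reader"),
    ("URI-2", "vcard:FN", "Hugh Glaser"),
    ("URI-2", "vcard:EMAIL", "lecturer@example.org"),
    ("URI-2", "vcard:ROLE", "Lecturer"),
}


def apply_same_as(triples, a, b):
    """Copy every statement about a onto b and vice versa (subject side only)."""
    merged = set(triples)
    for s, p, o in triples:
        if s == a:
            merged.add((b, p, o))
        elif s == b:
            merged.add((a, p, o))
    return merged


def objects(triples, subject, predicate):
    """All object values for a subject/predicate pair, sorted for stable output."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


merged = apply_same_as(triples, "URI-1", "URI-2")
emails = objects(merged, "URI-1", "vcard:EMAIL")
roles = objects(merged, "URI-1", "vcard:ROLE")
print(emails)  # both addresses now hang off URI-1
print(roles)   # and so do both roles, with nothing pairing them up
```

After the merge there is no statement left that ties a particular address to a particular role, which is exactly the ambiguity the query above runs into.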

Page 13: URI Identity Management for Semantic Web Data Integration and Linkage

• Data (Knowledge) providers publish data (knowledge)
• Resources from one provider cannot be guaranteed to be the same as resources from another provider
• Knowledge will be published and made dereferenceable at the domain that the publisher has control over

• URIs will be constructed from the domain name of the publisher’s site

• An intermediate service groups URIs of resources that may be the same

• This knowledge is made available upon dereferencing the URI of a resource

Page 14: URI Identity Management for Semantic Web Data Integration and Linkage

• Can be seen as a conventional Knowledge Base
• Contains knowledge about the URIs in a repository
• URIs referring to the same resource are grouped together in ‘Bundles’
• A Bundle has properties:
  • coref:hasEquivalentReference – The URIs in a bundle are grouped together using this predicate
  • coref:hasCanonicalReference – One URI in a bundle can be made the canonical representation, i.e. the preferred URI
  • coref:updatedOn – The date of the last update to the bundle
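As a rough picture of the data structure, a bundle can be modelled as a set of URIs plus an optional preferred member. The class below is a hypothetical in-memory stand-in, not the CRS implementation; in particular, making the first URI added the canonical one is an assumption chosen to keep the example short.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional, Set


@dataclass
class Bundle:
    """Hypothetical in-memory stand-in for a coref:Bundle."""
    equivalent: Set[str] = field(default_factory=set)  # coref:hasEquivalentReference
    canonical: Optional[str] = None                     # coref:hasCanonicalReference
    updated_on: Optional[date] = None                   # coref:updatedOn

    def add(self, uri, today=None):
        """Add a URI to the bundle and record when the bundle last changed."""
        self.equivalent.add(uri)
        if self.canonical is None:
            self.canonical = uri  # assumption: first URI added becomes canonical
        self.updated_on = today or date.today()

    def canonical_for(self, uri):
        """Return the preferred URI if uri is in the bundle, else uri itself."""
        return self.canonical if uri in self.equivalent else uri


base = "http://citeseer.rkbexplorer.com/rdf/resource-"
bundle = Bundle()
for suffix in ("CSP109002", "CSP109011", "CSP109020", "CSP109013"):
    bundle.add(base + suffix)

print(bundle.canonical_for(base + "CSP109013"))
```

An application can then rewrite any member URI to the canonical one before querying, while the original URIs stay distinct in the underlying data.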

Page 15: URI Identity Management for Semantic Web Data Integration and Linkage

@prefix coref: <http://www.resist.ecs.soton.ac.uk/ontology/coref#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

<http://citeseer.rkbexplorer.com/crs/coref#bundle1> a coref:Bundle ;
    coref:hasCanonicalReference <http://citeseer.rkbexplorer.com/rdf/resource-CSP109002> ;
    coref:hasEquivalentReference <http://citeseer.rkbexplorer.com/rdf/resource-CSP109011> ,
        <http://citeseer.rkbexplorer.com/rdf/resource-CSP109020> ,
        <http://citeseer.rkbexplorer.com/rdf/resource-CSP109013> ,
        <http://citeseer.rkbexplorer.com/rdf/resource-CSP109002> .

Page 16: URI Identity Management for Semantic Web Data Integration and Linkage

[Diagram: CRS architecture]
Non-Information Resource: http://southampton.rkbexplorer.com/id/person-00021
Information Resources: http://southampton.rkbexplorer.com/data/person-00021 and http://southampton.rkbexplorer.com/description/person-00021 (Text/Html, RDF/XML)
Components: Application, KB, CRS
Arrow labels: RESOLVE, RETRIEVE, RDF
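The RESOLVE step in the diagram can be sketched as a mapping from the non-information resource URI to one of the two information resources, chosen by the media type the client asks for. This is a sketch under an assumption the transcript does not state outright: that /data/ carries the RDF/XML and /description/ the HTML. The function name is hypothetical, and a real server would answer with an HTTP redirect rather than a string.

```python
def information_resource(uri, accept):
    """Map a /id/ URI to the document URI a client would be sent to.

    Assumption (not confirmed by the slides): /data/ serves RDF/XML and
    /description/ serves HTML.
    """
    if "/id/" not in uri:
        return uri  # already an information resource
    wants_rdf = "application/rdf+xml" in accept
    target = "/data/" if wants_rdf else "/description/"
    return uri.replace("/id/", target, 1)


person = "http://southampton.rkbexplorer.com/id/person-00021"
print(information_resource(person, "application/rdf+xml"))
print(information_resource(person, "text/html"))
```

An application would take the first form to RETRIEVE RDF about the resource, while a browser would end up on the second.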

Page 17: URI Identity Management for Semantic Web Data Integration and Linkage

• Finding all equivalences (bundles) is up to the application

• A separate activity from coreferencing a single data source

• Services such as Sindice can perform this function for free

• To perform the equivalence closure just follow the crs:hasCRS links

• Scalability is ensured by not including all possible bundles in every CRS
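Computing the equivalence closure across several CRSs amounts to a graph traversal. The sketch below is an assumption-laden illustration: each CRS is represented as a plain dictionary from a URI to the URIs in its bundle, the short URIs are invented, and following crs:hasCRS links over HTTP is replaced by iterating over a list of such dictionaries.

```python
from collections import deque

# Hypothetical stand-ins for three CRSs: each maps a URI to the URIs it
# groups in the same bundle. A real application would fetch these remotely.
CRS_A = {"a:1": {"a:1", "a:2"}, "a:2": {"a:1", "a:2"}}
CRS_B = {"a:2": {"a:2", "b:7"}, "b:7": {"a:2", "b:7"}}
CRS_C = {"c:9": {"c:9", "c:10"}, "c:10": {"c:9", "c:10"}}
SERVICES = [CRS_A, CRS_B, CRS_C]


def equivalence_closure(start, services):
    """Breadth-first walk over every CRS until no new equivalent URI appears."""
    seen = {start}
    queue = deque([start])
    while queue:
        uri = queue.popleft()
        for crs in services:
            for other in crs.get(uri, ()):
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
    return seen


closure = equivalence_closure("a:1", SERVICES)
print(sorted(closure))
```

No single CRS holds the whole closure here: CRS_A links a:1 to a:2 and CRS_B links a:2 to b:7, so the application only discovers b:7 by chaining the two, which is why each CRS can stay small.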

Page 18: URI Identity Management for Semantic Web Data Integration and Linkage

• The Resilience Knowledge Base Explorer displays communities of practice for people, projects and publications from the RKB

• Uses multiple CRSs to disambiguate people and publications

• One CRS per knowledge base ensures scalability
• Multiple SPARQL queries
• Look yourself up!
• www.rkbexplorer.com/explorer

Page 19: URI Identity Management for Semantic Web Data Integration and Linkage

• Equivalence Mining is a difficult task that requires multiple algorithms

• Adding policies to determine the trust level of a CRS

• Establishing the authority of a CRS over a KB
• Establishing performance metrics
• Collaborating with LOD community for wide scale deployment
• Formalising the linking methodology

Page 20: URI Identity Management for Semantic Web Data Integration and Linkage

• Coreference exists in many disciplines and will exist on the Semantic Web

• The equivalence of non-information resources depends on context

• The semantics of owl:sameAs do not fit with the current usage in Linked Data

• The CRS is a solution that is being deployed on a large knowledge-based infrastructure

• It’s my knowledge, so let me name it!

Page 21: URI Identity Management for Semantic Web Data Integration and Linkage

Questions?
