Entity Search and the Web of Data Centralised Entity Search Federated Entity Search P2P Entity Search Conclusions and Future Work

Semantic and Distributed Entity Search in the Web of Data

Robert Neumayer
[email protected]

Norwegian University of Science and Technology, Trondheim, Norway

March 6, 2013

1/48

Outline

1. Entity Search and the Web of Data
  • The Web of Data
  • What are Entities?

2. Centralised Entity Search
  • Entity Modelling
  • Experiments

3. Federated Entity Search
  • Introduction
  • Experimental Results

4. P2P Entity Search
  • Introduction and Approach
  • Experiments

5. Conclusions and Future Work
  • Future Work

2/48

Overview

• Describe the main components of the last four years of my research

• Try to give a good motivation and show the “whole picture”

• Show real-world examples

• Pointers on future work

• Do it in an accessible way

3/48

What?

• Semantic and Distributed Entity Search in the Web of Data

• Definitions (in reverse order)
  • Web of Data
  • Entities
  • Entity Search
  • Centralised or distributed?

4/48

The Web of Data

• Blog post by Tom Heath

• . . . slight disagreement

• Terms:
  • Linked Data
  • Web of Data

“. . . Linked Data is just an attempt to rebrand the Semantic Web . . . ”

“. . . Personally I use the term Web of data largely interchangeably with the term Semantic Web . . . ”

“. . . The precise term I use depends on the audience. With Semantic Web geeks I say Semantic Web, with others I tend to say Web of data . . . ”

5/48

. . . How We Use the Terms

• Linked Data
  • Technical foundation
  • “means of publishing/exchanging interconnected data”

• Web of Data / Semantic Web
  • Largely interchangeable
  • “an interconnected Web of Data available for search and research”
  • Example: Wikipedia connecting to other resources

6/48

Linked Open Data 2007

[Figure: the Linked Open Data cloud in 2007. Datasets shown include SW Conference Corpus, DBpedia, RDF Book Mashup, DBLP Berlin, Revyu, Project Gutenberg, FOAF, Geonames, MusicBrainz, Magnatune, Jamendo, World Factbook, DBLP Hannover, SIOC, SemWebCentral, Eurostat, ECS Southampton, BBC Later + TOTP, Freshmeat, OpenGuides, GovTrack, US Census Data, W3C WordNet, flickr wrappr, Wikicompany, OpenCyc, lingvoj and Ontoworld; several are marked “NEW!”, including lingvoj.]

7/48

Entities 1/4

• Knowledge bases are growing, so what?
  • “Something’s interesting when Google do it”

• Google Knowledge Graph (2012)

9/48

Entities 2/4

• What is an entity?

• (Typed) object

10/48

Entities 3/4

• Once identified, the entity has
  • Attributes and relations

11/48

Entities 4/4

• Free text

• Date

• Director

• Relations (Links)
  • Outgoing
  • Ingoing

12/48

The Entity Search Task

ad-hoc entity retrieval¹:

answering arbitrary information needs related to particular aspects of objects [entities], expressed in unconstrained natural language and resolved using a collection of structured data

• Our main focus

• Realistic and frequent type of search

¹ J. Pound, P. Mika, and H. Zaragoza. “Ad-hoc object retrieval in the web of data”. In: Proc. of the 19th Int. Conference on World Wide Web (WWW’10). 2010.

13/48

Top Google Searches 2012²

• People do search for entities
  • Persons
  • Products
  • Events

• BBB12 is Big Brother Brazil . . .

² http://www.google.com/zeitgeist/2012

14/48

Overview

• (Centralised) entity search

• Federated entity search

• Peer-to-peer (P2P) networks

15/48

Overview of Publications 1/3

• (Centralised) entity search
  • Semantic Search Challenge³
  • Hierarchical Entity Model⁴
  • Strong Baselines⁵

³ K. Balog, M. Ciglan, R. Neumayer, W. Wei, and K. Nørvåg. “NTNU at SemSearch 2011”. In: Proc. of the 4th Int. Semantic Search Workshop of the 20th Int. World Wide Web Conference (WWW2011). 2011.

⁴ R. Neumayer, K. Balog, and K. Nørvåg. “On the Modeling of Entities for Ad-hoc Entity Search in the Web of Data”. In: Proc. of the 34th European Conference on Information Retrieval (ECIR’12). 2012.

⁵ R. Neumayer, K. Balog, and K. Nørvåg. “When Simple is (more than) Good Enough: Effective Semantic Search with (almost) no Semantics”. In: Proc. of the 34th European Conference on Information Retrieval (ECIR’12). 2012.

16/48

Overview of Publications 2/3

• Federated entity search
  • Collection ranking and selection⁶
  • Ranking Distributed Knowledge Repositories⁷

⁶ K. Balog, R. Neumayer, and K. Nørvåg. “Collection Ranking and Selection for Federated Entity Search”. In: Proc. of the 18th Int. Symposium on String Processing and Information Retrieval (SPIRE’12). Lecture Notes in Computer Science. 2012.

⁷ R. Neumayer, K. Balog, and K. Nørvåg. “Ranking Distributed Knowledge Repositories”. In: Proc. of the Int. Conference on Theory and Practice of Digital Libraries, Research and Advanced Technology for Digital Libraries (TPDL’12). Lecture Notes in Computer Science. 2012.

17/48

Overview of Publications 3/3

• Peer-to-peer (P2P) networks
  • Aggregation of Document Frequencies⁸
  • Hybrid Aggregation in P2P Networks⁹

⁸ R. Neumayer, C. Doulkeridis, and K. Nørvåg. “Aggregation of Document Frequencies in Unstructured P2P Networks”. In: Proc. of 10th Int. Conference on Web Information Systems Engineering (WISE’09). Lecture Notes in Computer Science. 2009.

⁹ R. Neumayer, C. Doulkeridis, and K. Nørvåg. “A Hybrid Approach for Estimating Document Frequencies in Unstructured P2P Networks”. In: Information Systems 36.3 (2011).

18/48

Centralised Entity Search

• Research questions
  • How can traditional ad-hoc document retrieval techniques be applied in the context of the Web of Data?
  • How can the structure of entities be exploited for the purpose of ad-hoc retrieval?
  • How does field weighting affect search quality?

20/48

From Predicates to Fields: Structured Retrieval

• How to represent entity data in terms of structured fields?

(a) Unstructured entity model

Text
Serenity 2005 119 Serenity is a . . . Joss Whedon United States Films based on tv series Space Westerns Film Adam Baldwin Summer Glau Jewel Staite

(b) Structured entity model

Pred. type   | Value
Name         | Serenity
Attributes   | 2005; 119; Serenity is a . . .
OutRelations | Joss Whedon; United States; Films based on tv series; Space Westerns; Film; Adam Baldwin; Summer Glau
InRelations  | Best 2005 sci-fi film; Favourite film
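A minimal sketch of how RDF-style triples could be folded into the predicate-type fields of (b), and collapsed into the single text field of (a). The `ex:` URIs, the predicate names and the literal/URI flag are illustrative assumptions; treating every literal as an attribute and every resource link as a relation is a simplification, not necessarily the exact folding used in the experiments.

```python
from collections import defaultdict

# Assumption: predicates treated as carrying the entity's name (illustrative).
NAME_PREDICATES = {"rdfs:label", "foaf:name"}


def fold_entity(uri, triples, labels):
    """Fold (subject, predicate, object, object_is_literal) triples about `uri`
    into the Name / Attributes / OutRelations / InRelations fields."""
    fields = defaultdict(list)
    for s, p, o, o_is_literal in triples:
        if s == uri:
            if p in NAME_PREDICATES:
                fields["Name"].append(o)
            elif o_is_literal:
                fields["Attributes"].append(o)
            else:
                # Outgoing link: represent the target resource by its label.
                fields["OutRelations"].append(labels.get(o, o))
        elif o == uri:
            # Incoming link: another resource points at this entity.
            fields["InRelations"].append(labels.get(s, s))
    return dict(fields)


def collapse(fields):
    """Unstructured model: concatenate all field values into one text field."""
    return " ".join(v for values in fields.values() for v in values)


# Toy data with hypothetical URIs and predicates; values taken from the example above.
triples = [
    ("ex:Serenity", "rdfs:label", "Serenity", True),
    ("ex:Serenity", "ex:year", "2005", True),
    ("ex:Serenity", "ex:director", "ex:JossWhedon", False),
    ("ex:BestSciFi2005", "ex:includes", "ex:Serenity", False),
]
labels = {"ex:JossWhedon": "Joss Whedon", "ex:BestSciFi2005": "Best 2005 sci-fi film"}
fields = fold_entity("ex:Serenity", triples, labels)
text = collapse(fields)
```

Whether incoming relations belong in the unstructured text is a modelling choice; this sketch simply collapses every field it is given.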

21/48

Entity Modelling Approaches

• Fields and predicates
  • Somewhere in between a single field and one field per predicate

• We consider:
  • Unstructured entity model
    • Collapse all predicates into one field
  • Structured entity model with predicate folding
    • Collapse predicates within each predicate type
  • Hierarchical entity model
    • Use individual fields
    • Predicate type weighting

22/48

Structured Entity Model

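• Collapsing all fields per type

• Name, Attribute, InRelation, OutRelation

• Smoothing on type level

• Linear mixture of types (mixture of LMs)

[Figure: graphical model in which the entity e generates predicate types pt, and each predicate type generates terms t]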
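One way to write the mixture described above is P(q|e) = ∏_t Σ_pt P(pt) · P(t|θ_e,pt), with each per-type language model Dirichlet-smoothed against the collection-wide model of the same predicate type. The sketch below scores entities this way; the type weights P(pt), μ = 100 and the toy entities are assumed values, not the settings used in the experiments.

```python
import math
from collections import Counter

PRED_TYPES = ("Name", "Attributes", "OutRelations", "InRelations")


def collection_stats(entities):
    """Background model per predicate type: term counts and total length."""
    stats = {}
    for pt in PRED_TYPES:
        counts = Counter(tok for e in entities for tok in e.get(pt, []))
        stats[pt] = (counts, sum(counts.values()))
    return stats


def score(query_terms, entity, stats, weights, mu=100.0):
    """log P(q|e) = sum_t log sum_pt P(pt) * P(t | theta_{e,pt}),
    where each P(t | theta_{e,pt}) is Dirichlet-smoothed on the type level."""
    log_p = 0.0
    for t in query_terms:
        p_t = 0.0
        for pt in PRED_TYPES:
            counts, total = stats[pt]
            p_bg = counts[t] / total if total else 0.0
            field = entity.get(pt, [])
            p_t += weights[pt] * (field.count(t) + mu * p_bg) / (len(field) + mu)
        # A term unseen in every field of the whole collection gets probability 0.
        log_p += math.log(p_t) if p_t > 0 else float("-inf")
    return log_p


entities = [
    {"Name": ["serenity"], "Attributes": ["2005", "film"], "OutRelations": ["joss", "whedon"]},
    {"Name": ["firefly"], "Attributes": ["2002", "series"], "OutRelations": ["joss", "whedon"]},
]
weights = {"Name": 0.4, "Attributes": 0.2, "OutRelations": 0.3, "InRelations": 0.1}  # assumed P(pt)
stats = collection_stats(entities)
s_serenity = score(["serenity", "whedon"], entities[0], stats, weights)
s_firefly = score(["serenity", "whedon"], entities[1], stats, weights)
```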

23/48

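Hierarchical Entity Model

• Type folding is a viable alternative

• Preserve info about individual predicates

• Use individual fields

• Three model components
  • Term generation
  • Predicate generation
  • Predicate type generation

[Figure: graphical model in which the entity e generates predicate types pt, each predicate type generates predicates p, and each predicate generates terms t]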
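The three components can be combined as P(t|e) = Σ_pt P(pt) Σ_{p∈pt} P(p|pt,e) · P(t|θ_e,p). The sketch below computes this term probability under stated assumptions: P(p|pt,e) proportional to each predicate's length, fixed type weights, toy background probabilities and μ = 50 are all illustrative choices.

```python
def p_term(t, tokens, background, mu=50.0):
    """Term generation: Dirichlet-smoothed LM of a single predicate's value."""
    return (tokens.count(t) + mu * background.get(t, 0.0)) / (len(tokens) + mu)


def p_term_hierarchical(t, entity, background, type_weights, mu=50.0):
    """entity: {predicate_type: {predicate: [tokens]}}.
    P(t|e) = sum_pt P(pt) * sum_{p in pt} P(p|pt,e) * P(t|theta_{e,p})."""
    total = 0.0
    for pt, preds in entity.items():
        if not preds:
            continue
        # Predicate generation: proportional to predicate length (assumption).
        norm = sum(len(toks) for toks in preds.values())
        p_t_given_pt = 0.0
        for toks in preds.values():
            p_pred = len(toks) / norm if norm else 1.0 / len(preds)
            p_t_given_pt += p_pred * p_term(t, toks, background, mu)
        # Predicate type generation: a fixed weight per type (assumption).
        total += type_weights.get(pt, 0.0) * p_t_given_pt
    return total


entity = {
    "Name": {"rdfs:label": ["serenity"]},
    "OutRelations": {
        "ex:director": ["joss", "whedon"],
        "ex:starring": ["adam", "baldwin", "summer", "glau"],
    },
}
background = {"whedon": 0.01, "nolan": 0.01}  # toy collection probabilities
type_weights = {"Name": 0.5, "OutRelations": 0.5}
p_whedon = p_term_hierarchical("whedon", entity, background, type_weights)
p_nolan = p_term_hierarchical("nolan", entity, background, type_weights)
```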

24/48

2010/2011 Semantic Search Challenge

• Given a keyword query targeting a particular entity, provide a ranked list of relevant entities (i.e., URIs)

• Queries
  • Sampled from web search engine logs (142 in total)

• Data collection
  • Billion Triple Challenge 2009 (BTC) dataset
  • About 70 million entities
  • From sources like dbpedia.org or livejournal.com

• Relevance judgments
  • On a 3-point scale, collected using crowdsourcing

25/48

Experimental Results

• Ingoing relations have only a marginal effect

• The structured entity model improves over the unstructured model

• The hierarchical model improves results, but only for individual predicate types

• Overall, our results are competitive with those achieved at evaluation initiatives

• Preprocessing, preprocessing

• Collection quality

26/48

When Simple is Good Enough

• A rather straightforward approach

• Three components
  • Extended preprocessing
    • Process entity names
  • Fielded representation
    • Title and content fields
  • Domain boosting
    • Boost DBpedia

• Compare state-of-the-art fielded retrieval models
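The slides do not name a single fielded model; BM25F is one widely used option. Below is a simplified BM25F sketch over title and content fields, with a multiplicative domain boost for dbpedia.org. The field weights, b parameters, k1 = 1.2 and the boost factor of 1.5 are assumed values, and this is not claimed to be the exact configuration of the experiments.

```python
import math
from urllib.parse import urlparse

# Assumed parameters: (field weight w_f, length normalisation b_f).
FIELDS = {"title": (3.0, 0.5), "content": (1.0, 0.75)}
K1 = 1.2
DOMAIN_BOOST = {"dbpedia.org": 1.5}  # hypothetical boost factor


def build_stats(docs):
    """Collection statistics: number of docs, document frequencies, avg field lengths."""
    df = {}
    for d in docs:
        for t in set(d["title"]) | set(d["content"]):
            df[t] = df.get(t, 0) + 1
    avglen = {f: max(1e-9, sum(len(d[f]) for d in docs) / len(docs)) for f in FIELDS}
    return {"N": len(docs), "df": df, "avglen": avglen}


def bm25f(query_terms, doc, stats):
    """Simplified BM25F: combine length-normalised field tfs, then saturate once."""
    score = 0.0
    for t in query_terms:
        tf = 0.0
        for f, (w, b) in FIELDS.items():
            tokens = doc[f]
            norm = 1 - b + b * len(tokens) / stats["avglen"][f]
            tf += w * tokens.count(t) / norm
        df = stats["df"].get(t, 0)
        idf = math.log(1 + (stats["N"] - df + 0.5) / (df + 0.5))
        score += idf * tf / (K1 + tf)
    # Domain boosting: multiply the score of entities from boosted hosts.
    host = urlparse(doc["uri"]).hostname or ""
    return score * DOMAIN_BOOST.get(host, 1.0)


docs = [
    {"uri": "http://dbpedia.org/resource/Serenity_(film)",
     "title": ["serenity"], "content": ["2005", "film", "joss", "whedon"]},
    {"uri": "http://example.org/review/42",
     "title": ["review"], "content": ["serenity", "is", "a", "great", "film"]},
]
stats = build_stats(docs)
scores = [bm25f(["serenity"], d, stats) for d in docs]
```

The title match and the boost both favour the DBpedia entity here, which mirrors the intuition that entity titles answer entity queries well.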

27/48

Results

• Outperforms all results from the Semantic Search Challenge

• Still not outperformed by others

• Entity titles answer entity queries very well

• The extent of the improvements was surprising

28/48

Federated Search 1/2

• Moving from centralised retrieval to a distributed setting
  • Starting from a “broker”, the query is “routed” to the right collection

• Main research question:
  • Can federated entity search benefit from entity modelling?

30/48

Federated Search

1. Collection representation

2. Collection selection

3. Result merging

[Figure: collections A, B and C are each described by a summary held at a central broker; the broker routes the query Q to the selected collections, with arrows numbered by steps 1-3]

31/48

Collection Representation 1/2

• Collection-centric model

• Treat each collection as one large document

• Low cost, less accurate results expected
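A minimal sketch of the collection-centric representation, assuming each collection is given as a list of tokenised entity descriptions: all entities of a collection are merged into one pseudo-document, and collections are ranked by Dirichlet-smoothed query log-likelihood against a background model built from all collections. The smoothing parameter μ = 1000 and the toy collections are assumptions.

```python
import math
from collections import Counter


def cc_rank(query_terms, collections, mu=1000.0):
    """collections: {name: [tokenised entity, ...]}.
    Returns (name, log P(q|c)) pairs, best first."""
    # One "large document" per collection.
    merged = {c: Counter(tok for ent in ents for tok in ent) for c, ents in collections.items()}
    background = Counter()
    for counts in merged.values():
        background.update(counts)
    bg_len = sum(background.values())
    scores = {}
    for c, counts in merged.items():
        length = sum(counts.values())
        s = 0.0
        for t in query_terms:
            p = (counts[t] + mu * background[t] / bg_len) / (length + mu)
            s += math.log(p) if p > 0 else float("-inf")
        scores[c] = s
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


collections = {
    "dbpedia.org": [["serenity", "film"], ["firefly", "series"]],
    "example.org": [["cooking", "recipes"]],
}
ranking = cc_rank(["serenity"], collections)
```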

32/48

Collection Representation 2/2

• Entity-centric model

• Consider each collection in terms of its entities

• High cost, more accurate results expected
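The entity-centric counterpart can be sketched as P(q|c) = Σ_{e∈c} P(q|e) · P(e|c): each entity is scored with its own smoothed language model and the scores are aggregated per collection. A uniform P(e|c) = 1/|c| and μ = 100 are assumptions here. Scoring every entity is what makes this variant costlier than the collection-centric one.

```python
from collections import Counter


def entity_likelihood(query_terms, tokens, background, bg_len, mu=100.0):
    """P(q|e) for one entity, Dirichlet-smoothed (plain product; fine for short queries)."""
    counts = Counter(tokens)
    p = 1.0
    for t in query_terms:
        p *= (counts[t] + mu * background[t] / bg_len) / (len(tokens) + mu)
    return p


def ec_rank(query_terms, collections, mu=100.0):
    """P(q|c) = sum_e P(q|e) * P(e|c) with uniform P(e|c) = 1/|c| (assumption)."""
    background = Counter(tok for ents in collections.values() for ent in ents for tok in ent)
    bg_len = sum(background.values())
    scores = {
        c: sum(entity_likelihood(query_terms, ent, background, bg_len, mu) for ent in ents) / len(ents)
        for c, ents in collections.items()
        if ents
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


collections = {
    "dbpedia.org": [["serenity", "film"], ["firefly", "series"]],
    "example.org": [["cooking", "recipes"]],
}
ranking = ec_rank(["serenity"], collections)
```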

33/48

Collection Selection

• Predefined threshold
  • Top-k collection selection
  • Typically 5-20

• AENN: “All an Entity Needs is a Name”
  • Central repository of entity names

• AENN collection selection
  • Trade-off between entity-centric (EC) and collection-centric (CC) approaches
    • Precision-oriented
    • Recall-oriented
    • Balanced
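A sketch of the two kinds of selection listed above, assuming a content-based collection ranking is already available. `top_k` applies a predefined threshold; `select_by_name` is a hypothetical, simplified reading of the AENN idea that looks the query up in a central repository of entity names. It does not reproduce the precision-oriented, recall-oriented or balanced variants, and the exact-match name lookup and the collection names are assumptions.

```python
def top_k(ranking, k=10):
    """Predefined threshold: keep the k highest-ranked collections."""
    return [c for c, _ in ranking[:k]]


def select_by_name(query, name_index, ranking, k=10):
    """Hypothetical AENN-style selection.
    name_index: {normalised entity name: set of collections holding that entity}.
    If the query matches a known name, select the collections holding it
    (in content-ranking order); otherwise fall back to top-k."""
    hits = name_index.get(query.lower().strip(), set())
    if not hits:
        return top_k(ranking, k)
    ordered = [c for c, _ in ranking if c in hits]
    # Collections holding the entity but missing from the ranking go last.
    ordered += sorted(hits - set(ordered))
    return ordered


ranking = [("example.org", -1.0), ("dbpedia.org", -2.0), ("freebase.example", -3.0)]
name_index = {"serenity": {"dbpedia.org", "imdb.example"}}
```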

34/48

Result merging

• Once we have multiple collections selected

• these collections rank their respective entities

• . . . and the resultant rankings have to be merged into one final list

[Figure: the federated search architecture, with collections A, B and C, their summaries, and the central broker routing the query Q]
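One standard heuristic for this step, sketched below, is CORI-style merging: scores are min-max normalised within each result list and then reweighted by the normalised score of the collection that produced them, D'' = D' · (1 + 0.4 · C') / 1.4. It is shown as an illustration, not as the merging rule used in the experiments; keeping the best score for an entity returned by several collections is also an assumption.

```python
def min_max(scores):
    """Normalise a {key: score} dict to [0, 1]; constant lists map to 1.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def merge(results, collection_scores):
    """results: {collection: {entity_uri: score}}; collection_scores: {collection: score}.
    Returns one merged list of entity URIs, best first."""
    c_norm = min_max(collection_scores)
    merged = {}
    for c, ranked in results.items():
        if not ranked:
            continue
        weight = (1 + 0.4 * c_norm.get(c, 0.0)) / 1.4
        for e, d_norm in min_max(ranked).items():
            # An entity returned by several collections keeps its best merged score.
            merged[e] = max(merged.get(e, 0.0), weight * d_norm)
    return sorted(merged, key=merged.get, reverse=True)


results = {"A": {"a1": 10.0, "a2": 5.0}, "B": {"b1": 0.9, "b2": 0.1}}
final = merge(results, {"A": 2.0, "B": 1.0})
```

Normalising first matters because raw scores from different collections (here 10.0 vs 0.9) are not on a comparable scale.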

35/48

Experimental Setup

• Distributed environment
  • Top 100 largest second-level domains from BTC
  • Three sets with different handling of DBpedia

• Relevance
  • Considered the number of relevant entities in each collection

• Metrics
  • Collection ranking and result merging: standard IR metrics (MAP, MRR, nDCG)
  • Collection selection: analogues of precision and recall, plus the average number of collections selected
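For reference, minimal implementations of the metrics named above: average precision (its mean over queries is MAP), reciprocal rank (its mean is MRR), nDCG, and set-based precision and recall for collection selection. Using the number of relevant entities in a collection as its graded gain, and linear rather than exponential gains in nDCG, are assumptions.

```python
import math


def average_precision(ranking, relevant):
    """AP of one ranked list; MAP is its mean over queries."""
    hits, total = 0, 0.0
    for i, item in enumerate(ranking, start=1):
        if item in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0


def reciprocal_rank(ranking, relevant):
    """1/rank of the first relevant item; MRR is its mean over queries."""
    for i, item in enumerate(ranking, start=1):
        if item in relevant:
            return 1.0 / i
    return 0.0


def ndcg(ranking, gains, k=10):
    """gains: {item: graded relevance}, e.g. #relevant entities in a collection."""
    def dcg(gs):
        return sum(g / math.log2(i + 1) for i, g in enumerate(gs, start=1))
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return dcg([gains.get(x, 0) for x in ranking[:k]]) / ideal if ideal > 0 else 0.0


def selection_pr(selected, relevant):
    """Precision and recall of a set of selected collections."""
    sel = set(selected)
    tp = len(sel & relevant)
    precision = tp / len(sel) if sel else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    return precision, recall
```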

36/48

Experimental Results

• Collection-centric (CC) and entity-centric (EC) methods are competitive

• Content-based methods are stronger
  • Small difference for the DBpedia-only collection

• AENN outperforms other “title-only” methods

• AENN has positive effects on collection selection

37/48

P2P Search

• A query can originate from any peer and may have to be “routed” via many other peers

• Research questions:
  • Is P2P search a viable alternative to broker-based (i.e., federated search) architectures for entity retrieval?
  • How can the proposed frequency estimation technique be further improved?

39/48

Text documents, terms, and distribution

• Many problems are caused by distributed collections

• What is distributed, and how? (A random distribution is the easy case)

• Local / global document frequencies
  • Different numbers of documents per node
  • Local importance and influence of collections

• Global information improves search results
  • How frequent is a term on the global level?
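A small example of why local statistics can mislead, assuming documents are token lists stored on two hypothetical peers: a term that appears in every document of one peer gets a local IDF of zero, while its IDF over the whole network is high. If no document is stored on more than one peer, the global document frequency is simply the sum of the local ones; the plain log(N/df) form of IDF is an assumption.

```python
import math
from collections import Counter


def local_df(docs):
    """Document frequencies computed from one peer's documents only."""
    return Counter(t for doc in docs for t in set(doc))


def idf(term, df, n_docs):
    """Plain IDF, log(N / df); 0.0 for terms the statistics have not seen."""
    return math.log(n_docs / df[term]) if df[term] else 0.0


peers = {
    "p1": [["entity", "search"], ["entity", "web"]],
    "p2": [["cooking"], ["travel"], ["music"], ["sports"]],
}
# Global statistics: sum of local document frequencies and document counts
# (exact only if no document is stored on more than one peer).
global_df = Counter()
for docs in peers.values():
    global_df.update(local_df(docs))
n_global = sum(len(docs) for docs in peers.values())

idf_local = idf("entity", local_df(peers["p1"]), len(peers["p1"]))
idf_global = idf("entity", global_df, n_global)
```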

40/48

DESENT

• We employ DESENT for P2P network creation

• Completely distributed and decentralised

• Hierarchical overlay generation
  • Individual peers
  • Zones formed by neighbouring peers
  • Super zones based on the previous level
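The bottom-up use of such a hierarchy for statistics can be sketched as follows, assuming the zones are already given as nested groups: each peer summarises its document count and local document frequencies, and every zone or super zone adds up the summaries of its members. This illustrates only the aggregation step; it does not model how DESENT forms zones, and the nested-dict layout is hypothetical.

```python
from collections import Counter


def peer_summary(docs):
    """A peer's local summary: number of documents and document frequencies."""
    return len(docs), Counter(t for doc in docs for t in set(doc))


def aggregate(node):
    """node is either a peer (a list of token-list documents) or a zone
    {"zone": [child nodes]}; zones sum their members' summaries
    (exact only if no document is stored on more than one peer)."""
    if isinstance(node, dict):
        n_total, df_total = 0, Counter()
        for child in node["zone"]:
            n, df = aggregate(child)
            n_total += n
            df_total.update(df)
        return n_total, df_total
    return peer_summary(node)


peer1 = [["entity", "search"], ["web", "data"]]
peer2 = [["entity", "web"]]
peer3 = [["music"], ["travel"], ["sports"]]
super_zone = {"zone": [{"zone": [peer1, peer2]}, {"zone": [peer3]}]}
n_docs, df = aggregate(super_zone)
```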

41/48

Local Term Selection Process

• Based only on the local peer’s knowledge

• Considers local terms and their frequencies

• Problems
  • Number of documents per peer
  • Document frequencies are unstable
  • Local / global importance issues
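A hypothetical version of such a purely local selection step: a peer ranks its terms by local document frequency and keeps the top n to report upwards. The ranking criterion and n = 3 are assumptions. With only a handful of documents many terms tie at a document frequency of 1, which illustrates the instability noted above.

```python
from collections import Counter


def select_terms(docs, n=3):
    """Rank a peer's terms by local document frequency (ties broken
    alphabetically for determinism) and keep the top n."""
    df = Counter(t for doc in docs for t in set(doc))
    ranked = sorted(df.items(), key=lambda tc: (-tc[1], tc[0]))
    return ranked[:n]


docs = [["entity", "search", "web"], ["entity", "data"], ["entity", "search"]]
selected = select_terms(docs)
```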

42/48

• Compare to the central case
  • Full information

• Central case without term information
  • Lucene scoring

• Using the aggregated values, scores fall in between the two

• Portable to language models and the entity use case

43/48

Summary of Contributions

• Analysis of retrieval models with respect to their applicability to entity search

• Hierarchical models

• Structured retrieval for entity search

• Formalisation of the federated search task in a language modelling framework

• AENN method

• Benchmark data sets for federated entity search

• Entity search in P2P contexts

45/48

Query Target Type Identification

• Queries often target specific types (e.g., cars, actors, . . . )

• Subproblem: DBpedia ontology target type identification

• What is a query’s “type”?
  • Ontology linking
  • How to exploit this info?

• See CIKM’12 poster and, in part, the INEX’12 submission

46/48

Query to Field/Predicate Mapping

• Which field/predicate best answers a query?

• Simple example: IMDB
  • field:actor
  • field:director
  • field:trivia

• Example query: “Clint Eastwood”

• What is the best field to answer the query?

• What is the best field to answer the individual query terms?

• What results are we looking for (actor/director)?
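One established way to estimate which field a query term targets is the mapping probability of the Probabilistic Retrieval Model for Semistructured data (PRMS): P(f|t) ∝ P(t|f) · P(f), with P(t|f) estimated from the term's frequency in field f across the collection. The sketch applies it to a toy IMDB-like collection; the field contents and the uniform field prior are made up for illustration.

```python
from collections import Counter


def field_mapping(term, field_tokens, prior=None):
    """P(f|t) proportional to P(t|f) * P(f), where P(t|f) is the relative
    frequency of `term` among all tokens of field f in the collection."""
    fields = list(field_tokens)
    if prior is None:
        prior = {f: 1.0 / len(fields) for f in fields}  # uniform P(f)
    likelihood = {}
    for f in fields:
        counts = Counter(field_tokens[f])
        total = sum(counts.values())
        likelihood[f] = counts[term] / total if total else 0.0
    z = sum(likelihood[f] * prior[f] for f in fields)
    return {f: (likelihood[f] * prior[f] / z if z else 0.0) for f in fields}


# Toy IMDB-like collection: all tokens of each field, pooled over movies.
field_tokens = {
    "actor": ["clint", "eastwood", "morgan", "freeman", "clint", "eastwood"],
    "director": ["clint", "eastwood", "christopher", "nolan"],
    "trivia": ["eastwood", "cameo", "in", "final", "scene"],
}
mapping = field_mapping("eastwood", field_tokens)
```

Computing the mapping per query term addresses the per-term question above; the mapping for the whole query could then be combined from its terms.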

47/48

Last Slide

• Three basic purposes of oral presentations (in the spirit of trusting Wikipedia¹⁰)
  • Inform
  • Persuade
  • Good will

• I tried to do all of these things!

• Thanks for help and support

¹⁰ http://en.wikipedia.org/wiki/Presentation

48/48

