Entity Search: Building Bridges between Two Worlds
Krisztian Balog, Edgar Meij, and Maarten de Rijke
ISLA, University of Amsterdam
http://ilps.science.uva.nl
Entity search
• Information organized around entities
• Instead of finding documents about the entity, find the entity itself
• Problem looked at by both the Information Retrieval (IR) and the Semantic Web (SW) communities
Entity search tasks
• Entity ranking
• List completion
• Related entity finding
Motivation
• To what extent are IR and SW methods capable of answering information needs related to entity finding?
Where are we now?
• Information Retrieval
  • Identifying and ranking entities in large volumes of data
  • Mostly based on co-occurrences between terms and entities
  • Generated models are not always meaningful for human consumption
Where are we now?
• Semantic Web
  • Structured data, naturally organized around entities
  • Entity retrieval is as simple as running SPARQL queries?
  • Free-text querying is more appealing to (naive) end users
Related entity finding
• Given
  • Input entity E (name plus homepage)
  • Type T of the target entity (person, organization, or product)
  • Narrative R (describes nature of relation)
• Return homepages of related entities
Example topics
(E) Source entity name: Medimmune, Inc.
(E) Source entity URL: clueweb09-en0008-26-39300
(T) Target type: Product
(R) Narrative: Products of Medimmune, Inc.

(E) Source entity name: Boeing 747
(E) Source entity URL: clueweb09-en0005-75-02292
(T) Target type: Organisation
(R) Narrative: Airlines that currently use Boeing 747 planes.
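The topic structure above (source entity E, target type T, narrative R) can be sketched as a small record type. This is only an illustration: the class and field names are hypothetical and do not reflect the actual TREC topic file format; the values are the two example topics from the slide.

```python
from dataclasses import dataclass


# Hypothetical container for one related-entity-finding topic.
# Field names are illustrative, not taken from the TREC topic files.
@dataclass(frozen=True)
class Topic:
    source_name: str  # (E) source entity name
    source_url: str   # (E) source entity URL (a ClueWeb09 document ID here)
    target_type: str  # (T) person, organization, or product
    narrative: str    # (R) free-text description of the relation


TOPICS = [
    Topic("Medimmune, Inc.", "clueweb09-en0008-26-39300", "Product",
          "Products of Medimmune, Inc."),
    Topic("Boeing 747", "clueweb09-en0005-75-02292", "Organisation",
          "Airlines that currently use Boeing 747 planes."),
]

for topic in TOPICS:
    print(f"{topic.source_name} -> {topic.target_type}: {topic.narrative}")
```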
Aim
• Compare IR and SW approaches on the related entity finding task
• Focus on finding all relevant entities, not on ranking them
Related entity finding: Our variation
• TREC Entity 2009 topics (20)
• Map source entity to a Wikipedia page (17)
• Map target category to the most specific class within the DBPedia ontology
• Ground truth: Wikipedia pages from relevance assessments
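Because the comparison looks at which entities are found rather than how they are ranked, a set-based measure is a natural fit. The slides do not say which measure was used, so the functions below are an assumption: a minimal sketch of set recall and precision against a ground-truth set of Wikipedia pages. The page identifiers in the usage lines are placeholders, not real assessments.

```python
def set_recall(found, relevant):
    """Fraction of relevant entities that were found (order is ignored)."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(found) & relevant) / len(relevant)


def set_precision(found, relevant):
    """Fraction of found entities that are relevant (order is ignored)."""
    found = set(found)
    if not found:
        return 0.0
    return len(found & set(relevant)) / len(found)


# Placeholder identifiers standing in for Wikipedia pages.
ground_truth = {"Page_A", "Page_B", "Page_C"}
returned = ["Page_A", "Page_B", "Page_D"]

print(set_recall(returned, ground_truth))
print(set_precision(returned, ground_truth))
```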
Example topic
(E) Source entity name: Boeing 747
(E) Source entity URL: clueweb09-en0005-75-02292
(T) Target type: Organisation
(R) Narrative: Airlines that currently use Boeing 747 planes.

Source entity: Boeing_747
DBPedia-owl: Organisation/Company/Airline
Relation: Airlines that currently use Boeing 747 planes.
IR approaches
• Aggregation of approaches employed at the TREC Entity track
• Various ways of recognizing and ranking entities
• Common to all is a mechanism for capturing the co-occurrence between source and target entities
A typical IR approach
• Query (input entity, relation)
• Document/snippet retrieval
• Answer candidate extraction
• Answer candidate (type) filtering
• Answer candidate ranking
• Output (related entities)
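The pipeline stages listed above can be sketched end to end on toy data. Everything here is an assumption made for illustration: the snippets, the entity dictionary, and the function names are invented, a dictionary lookup stands in for a real entity recognizer, and the relation narrative is ignored. It is not the method of any particular TREC Entity participant; it only shows how co-occurrence between source and candidate entities can drive the ranking.

```python
from collections import Counter

# Toy corpus and entity dictionary; both are invented for illustration.
SNIPPETS = [
    "Alpha Corp ships the Widget and the Gadget",
    "The Widget by Alpha Corp won an award",
    "Beta Ltd sells the Gizmo",
]
ENTITY_TYPES = {
    "Widget": "product",
    "Gadget": "product",
    "Gizmo": "product",
    "Beta Ltd": "organization",
}


def retrieve(source, snippets):
    """Document/snippet retrieval: keep snippets mentioning the source entity."""
    return [s for s in snippets if source.lower() in s.lower()]


def extract_candidates(snippets, entity_types):
    """Candidate extraction: dictionary lookup instead of a real recognizer."""
    for snippet in snippets:
        for name in entity_types:
            if name.lower() in snippet.lower():
                yield name


def related_entities(source, target_type,
                     snippets=SNIPPETS, entity_types=ENTITY_TYPES):
    hits = retrieve(source, snippets)
    # Co-occurrence count: in how many retrieved snippets a candidate appears.
    counts = Counter(extract_candidates(hits, entity_types))
    # Type filtering: keep candidates of the requested target type.
    typed = {e: c for e, c in counts.items()
             if entity_types[e] == target_type and e != source}
    # Ranking: most frequent co-occurrence first, ties broken by name.
    return sorted(typed, key=lambda e: (-typed[e], e))


print(related_entities("Alpha Corp", "product"))
```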
Two SW approaches
• SPARQL query
• Exhaustive graph search
  • Find all paths between E and T in a knowledge base
  • The depth of search is limited
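The exhaustive graph search can be illustrated with a depth-limited traversal over a set of triples. This is a sketch under stated assumptions, not the implementation behind the slides: the triples, the `ex:` identifiers, and the function names are invented, edges are followed in both directions, and only simple paths (no repeated nodes) are returned.

```python
# Toy triples (subject, predicate, object); invented for illustration.
TRIPLES = [
    ("ex:A", "ex:makes", "ex:B"),
    ("ex:B", "ex:partOf", "ex:C"),
    ("ex:D", "ex:uses", "ex:A"),
    ("ex:C", "ex:soldBy", "ex:E"),
]
TYPES = {
    "ex:B": "ex:Product",
    "ex:C": "ex:Product",
    "ex:D": "ex:Org",
    "ex:E": "ex:Org",
}


def neighbours(node, triples):
    """Yield (predicate, other_node), following edges in both directions."""
    for s, p, o in triples:
        if s == node:
            yield p, o
        elif o == node:
            yield "^" + p, s  # "^" marks an edge traversed in reverse


def find_paths(source, target_type, max_depth, triples=TRIPLES, types=TYPES):
    """All simple paths of at most max_depth edges ending at a target_type node."""
    results = []

    def walk(node, path, visited):
        if path and types.get(node) == target_type:
            results.append(tuple(path))
        if len(path) == max_depth:  # depth limit reached, stop expanding
            return
        for pred, nxt in neighbours(node, triples):
            if nxt not in visited:
                walk(nxt, path + [(pred, nxt)], visited | {nxt})

    walk(source, [], {source})
    return results


for path in find_paths("ex:A", "ex:Product", max_depth=2):
    print(path)
```

Raising `max_depth` reaches more entities but also more distant ones, which is why the search depth is bounded.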
SPARQL on DBPedia
Query: Products of Medimmune, Inc.
SELECT DISTINCT ?m ?r
WHERE {
  ?m rdf:type dbpedia-owl:Drug .
  { ?m ?r dbpedia:MedImmune }
  UNION
  { dbpedia:MedImmune ?r ?m }
}
?m ?r
dbpedia:Amifostine dbp-prop:wikilink
dbpedia:Blinatumomab dbp-prop:wikilink
dbpedia:Motavizumab dbp-prop:wikilink
dbpedia:Palivizumab dbp-prop:wikilink
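The query for this topic follows a fixed pattern: fix the type of ?m, then match any relation between ?m and the source entity in either direction. A minimal sketch of filling that pattern for an arbitrary source entity and target class is shown below. The function and template names are hypothetical, the prefixes (rdf, dbpedia, dbpedia-owl) are assumed to be declared by the endpoint, and no input validation is done.

```python
# Template mirroring the query shape used for the MedImmune topic.
# Doubled braces are literal braces in str.format templates.
QUERY_TEMPLATE = """SELECT DISTINCT ?m ?r
WHERE {{
  ?m rdf:type {target_class} .
  {{ ?m ?r {source} }} UNION {{ {source} ?r ?m }}
}}"""


def build_query(source, target_class):
    """Fill the template with a prefixed source entity and target class."""
    return QUERY_TEMPLATE.format(source=source, target_class=target_class)


print(build_query("dbpedia:MedImmune", "dbpedia-owl:Drug"))
```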
SPARQL on DBPedia
Query: Airlines that Air Canada has code share flights with.
?m ?r
dbpedia:Air_Canada dbp-prop:wikilink
dbpedia:Austrian_Airlines dbp-prop:wikilink
dbpedia:Japan_Airlines dbp-prop:wikilink
dbpedia:Lufthansa dbp-prop:wikilink
dbpedia:Turkish_Airlines dbp-prop:wikilink
......
dbpedia:Air_Ontario dbp-ontology:Company/parentCompany
dbpedia:Air_Canada_Tango dbp-ontology:Company/parentCompany
dbpedia:Canadian_Airlines dbp-ontology:foundationPerson
SPARQL on DBPedia
Query: Members of the band Jefferson Airplane.
?m ?r
dbpedia:Jim_Morrison dbp-prop:wikilink
dbpedia:Jimi_Hendrix dbp-prop:wikilink
......
dbpedia:Jack_Casady dbp-ontology:associatedMusicalArtist
dbpedia:Paul_Kantner dbp-ontology:associatedMusicalArtist
dbpedia:Joey_Covington dbp-ontology:associatedMusicalArtist
dbpedia:Marty_Balin dbp-ontology:associatedMusicalArtist
......
dbpedia:Grace_Slick dbp-prop:pastMembers
dbpedia:Jorma_Kaukonen dbp-prop:pastMembers
......
Findings
• IR and SW methods find basically the same set of entities
• Most relations returned by SW methods are of type wikilink
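The second finding can be checked mechanically by tallying the ?r column of a result set. The sketch below uses only the rows visible on the Air Canada slide (the elided rows are not reconstructed), and the helper names are hypothetical. Dropping the generic wikilink relation leaves the few rows that carry a typed relation.

```python
from collections import Counter

# Rows (?m, ?r) as shown on the Air Canada slide; elided rows are omitted.
ROWS = [
    ("dbpedia:Air_Canada", "dbp-prop:wikilink"),
    ("dbpedia:Austrian_Airlines", "dbp-prop:wikilink"),
    ("dbpedia:Japan_Airlines", "dbp-prop:wikilink"),
    ("dbpedia:Lufthansa", "dbp-prop:wikilink"),
    ("dbpedia:Turkish_Airlines", "dbp-prop:wikilink"),
    ("dbpedia:Air_Ontario", "dbp-ontology:Company/parentCompany"),
    ("dbpedia:Air_Canada_Tango", "dbp-ontology:Company/parentCompany"),
    ("dbpedia:Canadian_Airlines", "dbp-ontology:foundationPerson"),
]


def relation_counts(rows):
    """Count how often each relation appears in the ?r column."""
    return Counter(relation for _, relation in rows)


def typed_rows(rows, generic="dbp-prop:wikilink"):
    """Keep only rows whose relation is more specific than a page link."""
    return [(m, r) for m, r in rows if r != generic]


print(relation_counts(ROWS).most_common())
print(typed_rows(ROWS))
```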
Next
• Extend search to Linked Open Data (LOD)
• We use the Linked Data Semantic Repository (LDSR)
SPARQL on LOD
Query: Products of Medimmune, Inc.
?m ?r
dbpedia:Amifostine dbp-prop:wikilink
dbpedia:Blinatumomab dbp-prop:wikilink
dbpedia:Motavizumab dbp-prop:wikilink
dbpedia:Palivizumab dbp-prop:wikilink
dbpedia:Motavizumab fb:base.bioventurist.product.developed_by
dbpedia:Palivizumab fb:base.bioventurist.science_or_technology_company.products
Graph search on LOD: Findings
• More entities as well as more diverse relations
• Having more data does not automatically improve results
• Some of the identified entities are now too general
Summarizing findings
• Information Retrieval
  • Excellent ways of finding associations between topics and entities
  • Tend to perform better for less popular entities (not represented in LOD)
  • Missing: semantics of the found associations
Summarizing findings
• Semantic Web
  • Has the potential of generating a large number of candidate entities and relations
  • Could be as simple as instantiating a SPARQL query
  • For many queries LOD is very sparse w.r.t. semantically meaningful links between entities
Zooming out
• Enhance text-based models with semantic information from LOD
• Use IR models to discover and label links between entities in LOD
TREC Entity 2010
• Main task: Related entity finding
• Pilot task: List completion
  • Given URIs of related entities, complete the list with additional entities from LOD
Questions?
Krisztian Balog
http://staff.science.uva.nl/~kbalog