Searching the “Web of things”
Lin et al., WWW 2012
At least 14% of Web search queries mention a target type or category
Telegraphic entity search queries
Telegraphic queries with target type
woodrow wilson president university
dolly clone institute
hermitage museum bank river
lead singer led zeppelin band
losing team baseball world series 1998
No reliable syntax clues for the search engine
• Free word order
• Little or no capitalization
• Quoted phrases are rare
• Few function or relational words
2-stage process: Telegraphic / NLQ / Template query → Query Interpretation → Execution-Ready Query → Ranking
How to answer entity queries? (simplified view of related work)
[Diagram labels: Telegraphic Query; Knowledge base; ranked entities e1, e2, e3]
Our Proposal
[Diagram labels: Annotated corpus; several (Interpretation, response) pairs; Generative and Discriminative models; ranked entities e1, e2, e3]
Multiple Interpretations
Joint Query Interpretation and Ranking
The annotated Web
… By comparison, the Padres have been to two World Series, losing in 1984 and 1998. …
Annotated document: "Padres" is a mentionOf Entity: San_Diego_Padres
Type hierarchy: San_Diego_Padres instanceOf Type: Major_league_baseball_teams, which is a subTypeOf Type: All
Query: losing team baseball world series 1998
Query = type hints + word matchers
Large type catalog
• Most query words match some type
Padres rarely co-occurs with hockey
• Can know this only from corpus stats
Query: losing team baseball world series 1998
Incorrect type: World_Series_Hockey_teams
Need joint type inference and snippet scoring
Query: losing team baseball world series 1998
Correct type: Major_league_baseball_teams
Entity: San Diego Padres
By comparison, the Padres have been to two World Series, losing in 1984 and 1998.
mentionOf
Word matches
instanceOf
Evidence snippet
Query = type hints + word matchers
Generative model: generate query from entity
• Entity E: San Diego Padres
• Type T: Major league baseball team
• Context of E: Padres have been to two World Series, losing in 1984 and 1998
• Switch Z: picks the type model or the context model for each query word
• Type hints (type model): baseball, team
• Context matchers (context model): lost, 1998, world series
• Query q: losing team baseball world series 1998
Generative approach: plate diagram (variables E, T, Z, W)
For each query…
• Choose entity E
• Choose type T to describe entity
For each query word…
• "Switch" variable Z: word hints at type or is a matcher?
• Generate query word W: hints from the type description language model, matchers from the entity context language model
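The generative story above can be read as a scoring rule. The following is a minimal sketch, assuming independent per-word switches with a fixed hint probability, so that the switches can be summed out word by word. The function name, the probability tables, `p_hint`, and the smoothing floor `EPS` are all hypothetical toy values, not taken from the slides or the paper.

```python
from math import prod

EPS = 1e-6  # hypothetical smoothing floor for words a language model has not seen


def generative_score(query, entity, entity_prior, type_given_entity,
                     type_lm, context_lm, p_hint=0.5):
    """Pr(e) * sum_t Pr(t|e) * prod_w [p_hint*Pr(w|t) + (1-p_hint)*Pr(w|e)].

    Assumes each word's switch is drawn independently (an assumption of this sketch).
    """
    words = query.split()
    total = 0.0
    for t, p_t in type_given_entity[entity].items():
        per_word = (
            p_hint * type_lm[t].get(w, EPS)
            + (1.0 - p_hint) * context_lm[entity].get(w, EPS)
            for w in words
        )
        total += p_t * prod(per_word)
    return entity_prior[entity] * total


# Toy tables (made up for illustration only).
entity_prior = {"San_Diego_Padres": 0.5, "1998_World_Series": 0.5}
type_given_entity = {
    "San_Diego_Padres": {"baseball_team": 1.0},
    "1998_World_Series": {"series": 1.0},
}
type_lm = {
    "baseball_team": {"baseball": 0.4, "team": 0.4},
    "series": {"series": 0.5, "world": 0.3},
}
context_lm = {
    "San_Diego_Padres": {"losing": 0.2, "world": 0.2, "series": 0.2, "1998": 0.2},
    "1998_World_Series": {"world": 0.3, "series": 0.3, "1998": 0.3},
}

q = "losing team baseball world series 1998"
scores = {
    e: generative_score(q, e, entity_prior, type_given_entity, type_lm, context_lm)
    for e in entity_prior
}
best = max(scores, key=scores.get)
```

With these toy tables the Padres entity scores higher, because "team" and "baseball" are explained by its type description while the remaining words are explained by its context.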
Discriminative model: separate correct and incorrect entities
Chakrabarti
q: losing team baseball world series 1998
San_Diego_Padres: candidate interpretations of q with t = baseball team
1998_World_Series: candidate interpretations of q with t = series
Feature vector design inspired by generative model
Feature vector given query, entity, type, switches, with components:
• Models entity prior
• Models type prior Pr(t|e)
• Compatibility between hint words and type (hints)
• Compatibility between matchers and snippets that mention e (matchers)
Generative:
Discriminative:
Discriminative framework
Non-convex formulation
Annealing algorithms
Constraints are formulated using the best scoring interpretation
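One way to read "best scoring interpretation" is that each candidate entity is scored by the maximum, over (type, hints, matchers) interpretations, of a linear function of the feature vector. The sketch below illustrates only that scoring step, not the non-convex training or annealing. The feature function, the feature names, the weights, and the toy lookup tables are hypothetical and were chosen for illustration.

```python
# Hypothetical toy knowledge: types per entity and words seen in snippets mentioning it.
TOY_TYPES = {
    "San_Diego_Padres": {"baseball_team"},
    "1998_World_Series": {"series"},
}
TOY_SNIPPET_WORDS = {
    "San_Diego_Padres": {"losing", "world", "series", "1998"},
    "1998_World_Series": {"world", "1998"},
}


def toy_features(query, entity, interp):
    """Hypothetical feature vector phi(q, e, t, z) as a name -> value dict."""
    t, hints, matchers = interp
    return {
        "type_prior": 1.0 if t in TOY_TYPES.get(entity, set()) else 0.0,
        "hint_type": float(len(set(hints) & set(t.split("_")))),
        "matcher_snippet": float(len(set(matchers) & TOY_SNIPPET_WORDS.get(entity, set()))),
    }


def dot(weights, phi):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(weights.get(name, 0.0) * value for name, value in phi.items())


def score_entity(weights, features, query, entity, interpretations):
    """Return (best score, best interpretation) for one candidate entity."""
    return max(
        ((dot(weights, features(query, entity, interp)), interp)
         for interp in interpretations),
        key=lambda pair: pair[0],
    )


q = "losing team baseball world series 1998"
interpretations = [
    ("baseball_team", ("baseball", "team"), ("losing", "world", "series", "1998")),
    ("series", ("series",), ("losing", "team", "baseball", "world", "1998")),
]
weights = {"type_prior": 1.0, "hint_type": 1.0, "matcher_snippet": 0.5}

padres_score, padres_interp = score_entity(
    weights, toy_features, q, "San_Diego_Padres", interpretations)
series_score, _ = score_entity(
    weights, toy_features, q, "1998_World_Series", interpretations)
```

Because the score of an entity depends on a maximization over interpretations, a training objective built on it is no longer convex in the weights, which is consistent with the non-convex formulation noted above.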
Testbed
YAGO entity and type catalog
• ~0.2 million types and 1.9 million entities
Annotated corpus
• Web corpus of 500 million pages
• ~16 annotations per page
~700 entity search queries
• TREC + INEX
• Converted to telegraphic form, with most probable type and answer entities
Experiment 1: Entity ranking using joint inference
To reach: human-recommended type
To surpass: most generic type in catalog (no type inference)
Entity-level NDCG measure (MAP and MRR follow the same trend; details in paper)
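For readers unfamiliar with the measure, here is a minimal sketch of NDCG at rank k. The slides do not say which NDCG variant was used, so the linear gain and log2 discount below are assumptions, and the function names are illustrative.

```python
from math import log2


def dcg(gains, k):
    """Discounted cumulative gain of the first k gains (linear gain, log2 discount)."""
    return sum(g / log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg(ranked_gains, all_gains, k):
    """NDCG@k: DCG of the system ranking divided by DCG of the ideal ranking.

    ranked_gains: relevance of each returned entity, in ranked order.
    all_gains: relevance of every relevant entity for the query (any order).
    """
    ideal = dcg(sorted(all_gains, reverse=True), k)
    return dcg(ranked_gains, k) / ideal if ideal > 0 else 0.0


perfect = ndcg([1, 0, 0], [1], 3)       # correct entity at rank 1
second = ndcg([0, 1], [1], 2)           # correct entity at rank 2
no_relevant = ndcg([0, 0], [], 2)       # query with no relevant entities
```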
Human > Discriminative > Generative > Generic
Generative significantly better than generic (lower)
• Generative fills 28% gap to human (upper)
Discriminative significantly better than generic (lower)
• Discriminative fills 43% gap to human (upper)
Discriminative significantly better than generative
• Easier to balance diverse scales of probabilities
[Plot: NDCG (y-axis, 0.1 to 0.8) vs. rank (x-axis, 1 to 10); curves: human, discriminative, generative, generic]
Human > ?? > Generic
Generic vs. discriminative
Correct hint match & type choice: cathedral claude monet painting
Incorrect hint match & type choice: amazing grace hymn writer
Discriminative better than human
Correct entity unreachable from human-recommended type
• Discriminative recovers using corpus feedback
[Example labels: query "patsy cline producer"; types: producer, manufacturer; Discriminative: Owen Bradley]
Experiment 2: Target type inference
Aggregate ranks of top-k interpretations to rank types
Compare type-level NDCG with B&N 2012
Top-k interpretations:
hermitage museum bank river (museum)
hermitage museum bank river (river)
hermitage museum bank river (building)
...
Possible target types: river, museum, building
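The slides do not say how the ranks of the top-k interpretations are aggregated. As one plausible illustration, the sketch below sums reciprocal ranks per type. The aggregation rule, the function name, and the example ordering of interpretations are assumptions.

```python
from collections import defaultdict


def rank_types(interpretation_types, k):
    """Rank types by summed reciprocal rank over the top-k interpretations.

    interpretation_types: the type of each interpretation, best-scoring first.
    Ties are broken alphabetically so the output is deterministic.
    """
    votes = defaultdict(float)
    for rank, t in enumerate(interpretation_types[:k], start=1):
        votes[t] += 1.0 / rank
    return sorted(votes, key=lambda t: (-votes[t], t))


# Hypothetical ordering of interpretations for "hermitage museum bank river".
top_interpretations = ["museum", "river", "museum", "building"]
ranked_k4 = rank_types(top_interpretations, k=4)
ranked_k2 = rank_types(top_interpretations, k=2)
```

A type that appears in several high-ranked interpretations accumulates more weight than one that appears once lower down.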
Joint prediction improves type inference
Data: [B&N 2012], DBpedia catalog
Joint prediction improves type inference too!
(river)+ matchers
Experiment 3: joint vs. two-stage
Two-stage
1. Best type prediction from experiment (2)
2. Launch type-restricted query on annotated corpus
Top m types to improve recall
Measure entity-level NDCG
Stage 1: Type inference (candidate types: river, museum, building)
Form query: (river OR museum) + matchers
Stage 2: Ranking
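The two-stage baseline can be sketched as follows: keep the top m predicted types, restrict candidates to entities of those types, then rank by matcher support in snippets. The type scores, catalog, snippets, and the simple word-overlap ranking are hypothetical stand-ins for the real type predictor and annotated corpus.

```python
def top_m_types(type_scores, m):
    """Stage 1: keep the m highest-scoring predicted types."""
    return sorted(type_scores, key=lambda t: (-type_scores[t], t))[:m]


def two_stage_rank(type_scores, matchers, m, catalog, snippets):
    """Stage 2: rank entities of the allowed types by matcher overlap with snippets."""
    allowed = set(top_m_types(type_scores, m))
    wanted = set(matchers)
    scored = []
    for entity, types in catalog.items():
        if allowed & types:  # type-restricted query: (t1 OR t2 ...) + matchers
            support = sum(
                len(wanted & set(s.lower().split()))
                for s in snippets.get(entity, [])
            )
            scored.append((support, entity))
    return [e for _, e in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Toy data (made up) for "hermitage museum bank river".
type_scores = {"river": 0.5, "museum": 0.4, "building": 0.1}
matchers = ["hermitage", "museum", "bank"]
catalog = {
    "Neva": {"river"},
    "Thames": {"river"},
    "Hermitage_Museum": {"museum", "building"},
}
snippets = {
    "Neva": ["the hermitage museum stands on the bank of the neva"],
    "Thames": ["the thames flows through london"],
    "Hermitage_Museum": ["the hermitage museum is in saint petersburg"],
}

ranked_m1 = two_stage_rank(type_scores, matchers, 1, catalog, snippets)
ranked_m2 = two_stage_rank(type_scores, matchers, 2, catalog, snippets)
```

Note the hard commitment in stage 1: if the correct type is not among the top m, the correct entity cannot be recovered in stage 2, whatever the corpus evidence.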
Joint entity ranking ?? two-stage
Giving 2-stage the benefit of more types (larger m) makes little difference
Joint type prediction and ranking significantly better than 2-stage
[Plot: NDCG (y-axis, 0.2 to 0.6) vs. rank (x-axis, 1 to 10); curves: Joint, 2stage (m=1), 2stage (m=5), 2stage (m=10)]
Joint entity ranking better than two-stage
Conclusion
Large percentage of Web search queries contain a mention of the target type
Identification of target type hint words and type itself is rewarding, but non-trivial
Joint query interpretation and ranking approach significantly better than two-stage
Joint prediction improves type inference
Datasets available at bit.ly/WSpxvr
References
1) P. Pantel, T. Lin, M. Gamon: Mining Entity Types from Query Logs via User Intent Modeling. ACL (1) 2012: 563-571
2) K. Balog, R. Neumayer: Hierarchical Target Type Identification for Entity-oriented Queries. CIKM 2012
3) T. Lin, P. Pantel, M. Gamon, A. Kannan, A. Fuxman: Active Objects: Actions for Entity-Centric Search. WWW 2012
Components of the model
Entity prior
• (Weighted) fraction of snippets attached to an entity in the corpus
Type
• Generality or specificity of types
Hint-type compatibility
• Probability of generating hint words from a language model built using type description
• Hint sub-sequence matches some type name exactly
Matcher-entity compatibility
• Weighted fraction of snippets attached to an entity, retrieved using matchers
• Rarity of matchers + number of supporting snippets
Implementation details
Additive features
• One generic query executed on index, rest in memory
Pruned large search space using easy heuristics
• Contiguous hint words
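To illustrate the pruning heuristic, the sketch below enumerates only those hint/matcher splits in which the hint words form one contiguous span of the query. Allowing an empty hint set is an assumption of this sketch, and the function name is illustrative.

```python
def contiguous_hint_splits(words):
    """Yield (hints, matchers) pairs where the hints are one contiguous span.

    The first pair has no hint words at all (an assumption of this sketch).
    """
    n = len(words)
    yield (), tuple(words)
    for start in range(n):
        for end in range(start + 1, n + 1):
            yield tuple(words[start:end]), tuple(words[:start] + words[end:])


words = "losing team baseball world series 1998".split()
splits = list(contiguous_hint_splits(words))
num_pruned = len(splits)          # n(n+1)/2 + 1 contiguous splits
num_unpruned = 2 ** len(words)    # every word free to be a hint or a matcher
```

For this six-word query the heuristic leaves 22 splits instead of 64, and the split with hints "team baseball" is still among them.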
Not entity disambiguation in query
ymca in query refers to song or organization?
Similar to entity disambiguation in documents
Uses accompanying words
Misinterpreting target type: usually disastrous
Avoid early or hard commitment
Query: ymca lyrics → Entity: YMCA_(song), instanceOf Type: Music
Query: ymca address → Entity: YMCA_(org), instanceOf Type: Organization
Learn topic model
Future work
Better type description model
More generic query than "hint + matchers"
Entities as literals
Different models
Explore non-linear models (boosting)
List-wise loss
Use click data
Generative framework (variables E, T, Z, W)
For each query…
• Choose entity E to describe
• Choose type T to describe entity
For each query word…
• "Switch" variable Z: decide if word hints at type or is a matcher
• Generate query word W, using the type description language model or the entity context language model
Discriminative framework
Feature vector given query, entity, type, switches, with components:
• Models entity prior
• Models type prior Pr(t|e)
• Compatibility between hint words and type (hints)
• Compatibility between matchers and snippets that mention e (matchers)
Given q, score of response e is:
Ranking model trained by distant supervision
Joint entity ranking better than two-stage
State of the art target type predictor
• Does not use corpus information
Pick top k types to improve type recall
Launch type-restricted query on annotated corpus
Significantly worse than joint type prediction and ranking
[Plot: NDCG (y-axis, 0.2 to 0.6) vs. rank (x-axis, 1 to 10); curves: Joint, 2stage (k=1), 2stage (k=5), 2stage (k=10)]