Post on 23-Feb-2016
CiteSight: Contextual Citation Recommendation with Differential Search
Avishay Livne1, Vivek Gokuladas2, Jaime Teevan3, Susan Dumais3, Eytan Adar1
1University of Michigan, 2Qualcomm, 3Microsoft
#SIGIR18 #JaimesBackyard
Search Engines Focus on Speed
Why Do We Cite?
• Paying homage to pioneers
• Giving credit for related work
• Identifying methodology
• Providing background
• Correcting one’s work
• Correcting the work of others
• Substantiating claims
• …
[Garfield, 1965]
How Do We Cite?
• Many resources
  – Search engines
  – Bibliographic tools
  – Colleagues
• Work practice
  – Papers we know
  – Papers we should know
Why × How = 2 Specs
• Spec 1
  – I know what I want, give it to me now
  – Citation context:
    • “… calculating the differences between blocks of text [”
• Spec 2
  – I don’t know or can’t remember what I want
    • [cite]
• Complex, dynamic search space = slow
  – Inherent trade-off
• Can we build a system to support both?
The CiteSight User Interface
Split World Into Two
Stuff I don’t know about
Stuff I want fast = stuff I know about
Microsoft Academic
Strategy
• Small, personalized index
  – Updated dynamically
    • What you’ve cited before
    • What you’ve cited now
    • What other people have cited
      – Venue, co-citation, etc.
• Run a big index for everything else
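A minimal sketch of the small, personalized index described above, assuming a plain in-memory inverted index over paper titles. The `PersonalIndex` class, its methods, and the sample papers are hypothetical illustrations, not the CiteSight implementation; the big index for everything else is not modeled here.

```python
from collections import defaultdict


class PersonalIndex:
    """Tiny in-memory inverted index over a user's likely citations (hypothetical)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of papers whose title has it
        self.titles = {}                  # paper id -> title

    def add(self, paper_id, title):
        """Add or refresh a paper, e.g. one the user has just cited."""
        self.titles[paper_id] = title
        for term in title.lower().split():
            self.postings[term].add(paper_id)

    def search(self, query):
        """Rank cached papers by how many distinct query terms their title contains."""
        hits = defaultdict(int)
        for term in set(query.lower().split()):
            for paper_id in self.postings.get(term, ()):
                hits[paper_id] += 1
        # Most matching terms first; ties broken by paper id for determinism.
        return sorted(hits, key=lambda p: (-hits[p], p))


index = PersonalIndex()
index.add("p1", "An algorithm for differential file comparison")  # cited before
index.add("p2", "Gradient boosted regression trees")              # cited by others
index.add("p3", "Comparing blocks of text")                        # cited now
results = index.search("differences between blocks of text")
print(results)
```

Because the index only grows when the user or their co-citation neighborhood cites something, it stays small enough to query on every keystroke.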
Ranking
• Query: Citation context
  – “… calculating the differences between blocks of text [”
• Dynamic recommendations
  – Immediately: Search the cache
  – In the background: Search the full index
• Rank retrieved papers:
  – Gradient boosted regression tree
  – Features: network + text
    • Popularity, author similarity, textual similarity, …
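The "immediately / in the background" behavior can be sketched as follows, assuming a thread pool for the slow lookup. `search_cache`, `search_corpus`, `recommend`, and the paper ids are hypothetical stand-ins, and the ranking step (the gradient boosted regression tree) is left out.

```python
from concurrent.futures import ThreadPoolExecutor


def search_cache(context):
    """Stand-in for the small personalized index (fast)."""
    return ["cached-paper-1"]


def search_corpus(context):
    """Stand-in for the full index (slow)."""
    return ["cached-paper-1", "corpus-paper-7"]


def recommend(context, on_update):
    """Show cache hits at once, then merge in full-index hits when they arrive."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(search_corpus, context)  # starts in the background
        shown = search_cache(context)                  # immediate results
        on_update(list(shown))
        for paper in pending.result():                 # blocks until the corpus answers
            if paper not in shown:
                shown.append(paper)
        on_update(list(shown))
    return shown


updates = []
final = recommend("... calculating the differences between blocks of text [", updates.append)
print(updates)
```

The callback receives two snapshots: the cache-only list first, then the merged list once the full index has responded.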
Citation Context
• Citation context is really good at picking out “winners”
• People talk about a paper the same way as you!
• Not the same way the author talks about their work
Paper text
Bob et al. introduced ABC in […]
XYZ is similar to ABC […]
We utilize ABC to…[…]
That’s nice…
(S. Redner, 1998)
Citations
Context Coupling
Popular paper: A; less-popular paper: B
• A and B related
  – Co-cited: When B is mentioned, A is
    • “Borrow” contexts from A to B
• Borrowed context used as a feature in ranking papers
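One way to picture context borrowing, as a sketch only: each paper keeps its own citation contexts, and a less-popular paper additionally receives the contexts of a popular paper it is co-cited with. The function name, the pair representation, and the example strings are assumptions; the slides do not say how pairs are selected or how borrowed contexts are weighted.

```python
def borrow_contexts(contexts, cocited_pairs):
    """Return own contexts per paper, plus contexts borrowed along each pair.

    contexts: dict of paper id -> list of citation-context strings
    cocited_pairs: iterable of (popular, less_popular) paper ids (hypothetical format)
    """
    coupled = {paper: list(own) for paper, own in contexts.items()}
    for popular, less_popular in cocited_pairs:
        # Borrow from the ORIGINAL contexts so borrowed text never chains onward.
        borrowed = contexts.get(popular, [])
        coupled.setdefault(less_popular, []).extend(borrowed)
    return coupled


contexts = {
    "A": ["context one citing A", "context two citing A"],  # popular: many contexts
    "B": ["context citing B"],                              # less popular: few
}
coupled = borrow_contexts(contexts, [("A", "B")])
print(coupled["B"])
```

The enlarged context list for B can then feed a textual-similarity feature against the query context, alongside the other ranking features.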
CiteSight Evaluation
• Can CiteSight predict existing citations?
  – 1000 randomly selected CS papers (2011)
    • Criteria: 20-40 citations
  – 5-fold cross validation
  – Metric: NDCG
    • Gain of 1 when guesses correct citation
    • Gain related to # of co-citations for close guesses
• User feedback from 5 CS grad students
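For reference, NDCG@k with graded gains can be computed as in the sketch below, which uses the common log2 rank discount. The gain of 1 for a correct citation comes from the slide; the 0.5 used for a "close guess" is a placeholder, since the exact co-citation-based gain is not given.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: rank 1 is undiscounted, rank r divides by log2(r + 1)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg_at_k(ranked_gains, all_gains, k=10):
    """ranked_gains: gain of each recommended paper, in the order shown.
    all_gains: gains of every candidate, used to build the ideal ordering."""
    ideal = dcg(sorted(all_gains, reverse=True)[:k])
    return dcg(ranked_gains[:k]) / ideal if ideal > 0 else 0.0


# Hypothetical ranking: a miss, then the correct citation, then a close guess.
gains = [0.0, 1.0, 0.5]  # 0.5 is a placeholder for the co-citation-based gain
score = ndcg_at_k(gains, gains, k=10)
perfect = ndcg_at_k([1.0, 0.5, 0.0], gains, k=10)
print(round(score, 3), perfect)
```

A ranking that puts the correct citation first scores 1.0; pushing it down the list lowers the score in proportion to the rank discount.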
Results
• Large improvement
  – Context coupling
  – All features

Features            NDCG@10
Text only           40.8%
Context coupling    46.5%
All features        61.9%
Results
• Large improvement
  – Context coupling
  – All features
  – Citation-related features > text
• More info = better
  – Authors
  – Citations, to a point

Features               NDCG@10
Text only              40.8%
Context coupling       46.5%
All features           61.9%
+ keywords             46.5%
+ title                46.6%
+ authors similarity   47.5%
+ abstract             47.8%
+ citation count       48.6%
+ venue relevancy      49.2%
+ citations            53.0%
+ co-citations         56.7%
+ authors history      57.6%
Cache v. Corpus
• Relevance
  – Cache accounts for 46% of NDCG@10 of the corpus
  – 10% cache is better
• Speed
  – Cache: 6 ms
    • Instantaneous!
  – Corpus: 450 ms
Summary
• Differential need for speed
• CiteSight – differential search
  – Two different use cases = two indices
    1. Local index updated dynamically, contextually
    2. Global index with full content
  – Context coupling improves relevance
  – Local index improves speed
    • Able to provide instantaneous results
    • Often relevant because contextually updated
Questions?