Date post: | 09-May-2015 |
Category: |
Technology |
Upload: | daniel-tunkelang |
View: | 46,265 times |
Download: | 0 times |
© 2009 Endeca Technologies, Inc. All rights reserved.
Reconsidering Relevance
Daniel TunkelangChief Scientist, Endeca
© 2009 Endeca Technologies, Inc. All rights reserved.2
howdy!
• 1988 – 1992
• 1993 – 1998
• 1999 -
© 2009 Endeca Technologies, Inc. All rights reserved.3
overview
what is relevance?
what’s wrong with relevance?
what are the alternatives?
© 2009 Endeca Technologies, Inc. All rights reserved.4
but first let’s set the stage
© 2009 Endeca Technologies, Inc. All rights reserved.5
iconic businesses of the 20th and 21st centuries
I’m Feeling Lucky
© 2009 Endeca Technologies, Inc. All rights reserved.6
process and scale orchestration
© 2009 Endeca Technologies, Inc. All rights reserved.7
but there’s a dark side
© 2009 Endeca Technologies, Inc. All rights reserved.8
users are satisfied
© 2009 Endeca Technologies, Inc. All rights reserved.9
an interesting contrast
“Search on the internet is solved. I always find what I need.
But why not in the enterprise?
Seems like a solution waiting to happen.”
- a Fortune 500 CTO
© 2009 Endeca Technologies, Inc. All rights reserved.10
the real questions
• What is “search on the internet” and why is it perceived a solved problem?
• What is “search in the enterprise” and why is it perceived as an unsolved problem?
• And what does this have to do with relevance?
© 2009 Endeca Technologies, Inc. All rights reserved.11
easy vs. hard search problems
• easywhere to buy Ender in Exile?
• hardgood novel to read on the beach?
• easyproof that sorting has n log n lower bound?
• hardalgorithm to sort partially ordered set, given a constant-time comparator?
© 2009 Endeca Technologies, Inc. All rights reserved.12
what is relevance?
what’s wrong with relevance?
what are the alternatives?
© 2009 Endeca Technologies, Inc. All rights reserved.13
defining relevance
Relevance is defined as a measure of information conveyed by a document relative to a query.
It is shown that the relationship between the document and the query, though necessary, is not sufficient to determine relevance.
William Goffman, On relevance as a measure, 1964.
© 2009 Endeca Technologies, Inc. All rights reserved.14
we need more definitions
© 2009 Endeca Technologies, Inc. All rights reserved.15
let’s work top-down
• information retrieval (IR) =
study of retrieval of information (not data) from collection of written documents
retrieved documents aim at satisfying user information need
© 2009 Endeca Technologies, Inc. All rights reserved.16
IR assumes information needs
• user information need =
natural language declaration of informational need of user
• query =
expression of user information need in input language provided by information system
© 2009 Endeca Technologies, Inc. All rights reserved.17
relevance drives IR modeling
• modeling =
studies algorithms used for ranking documents according to system assigned likelihood of relevance
• model =
a set of premises and an algorithm for ranking documents with regard to a user query
© 2009 Endeca Technologies, Inc. All rights reserved.18
a relevance-centric approach
information Need query select from results
rank using IR model
USER:
SYSTEM:tf-idf PageRank
© 2009 Endeca Technologies, Inc. All rights reserved.19
what is relevance?
what’s wrong with relevance?
what are the alternatives?
© 2009 Endeca Technologies, Inc. All rights reserved.20
our first communication problem
information need query
• 2 words?• natural language?• telepathy?
© 2009 Endeca Technologies, Inc. All rights reserved.21
and the game of telephone continues
query rank using IR model
• cumulative error• relevance is subjective• what Goffman said
© 2009 Endeca Technologies, Inc. All rights reserved.22
and hopefully users feel lucky
rank using IR model
• selection bias• inefficient channel• backup plan?
select from results
© 2009 Endeca Technologies, Inc. All rights reserved.23
queries are misinterpreted
Results 1-10 out of about 344,000,000 for ir
© 2009 Endeca Technologies, Inc. All rights reserved.24
ranked lists are inefficient
© 2009 Endeca Technologies, Inc. All rights reserved.25
assumptions of relevance-centric approach
• self-awareness
• self-expression
• model knows best
• answer is a document
• one-shot query
© 2009 Endeca Technologies, Inc. All rights reserved.26
can we do better?
© 2009 Endeca Technologies, Inc. All rights reserved.27
what is relevance?
what’s wrong with relevance?
what are the alternatives?
© 2009 Endeca Technologies, Inc. All rights reserved.28
human-computer information retrieval
• don’t just guess the user’s intent– optimize communication
• increase user responsibility and control– require and reward human intellectual effort
“Toward Human-Computer Information Retrieval”
Gary Marchionini
© 2009 Endeca Technologies, Inc. All rights reserved.29
human computer information retrieval
© 2009 Endeca Technologies, Inc. All rights reserved.30
a concrete use case
• Colleague:
Hey Daniel! You should check out what this guy Steve Pollitt’s been researching. Sounds right up your alley.
• Daniel:
Sure thing, I’ll look into it.
© 2009 Endeca Technologies, Inc. All rights reserved.31
google him!
© 2009 Endeca Technologies, Inc. All rights reserved.32
google scholar him?
© 2009 Endeca Technologies, Inc. All rights reserved.33
rexa him?
© 2009 Endeca Technologies, Inc. All rights reserved.34
getting better
© 2009 Endeca Technologies, Inc. All rights reserved.35
hcir-inspired interface
© 2009 Endeca Technologies, Inc. All rights reserved.36
tags provide summarization and guidance
© 2009 Endeca Technologies, Inc. All rights reserved.37
my information need evolves as i learn
© 2009 Endeca Technologies, Inc. All rights reserved.38
hcir – implementing the vision
© 2009 Endeca Technologies, Inc. All rights reserved.39
scatter/gather: a search for “star”
© 2009 Endeca Technologies, Inc. All rights reserved.40
faceted search
© 2009 Endeca Technologies, Inc. All rights reserved.41
practical considerations
• which facets to show
• which facet values to show
• when to suggest faceted refinement
• how to automate faceted classification
© 2009 Endeca Technologies, Inc. All rights reserved.42
showing the right facets: microwaves
© 2009 Endeca Technologies, Inc. All rights reserved.43
showing the right facets: ceiling fans
© 2009 Endeca Technologies, Inc. All rights reserved.44
query-driven clarification before refinement
Matching Categories include:
Appliances > Small Appliances > Irons & Steamers
Appliances > Small Appliances > Microwaves & Steamers
Bath > Sauna & Spas > Steamers
Kitchen > Bakeware & Cookware > Cookware >Open Stock Pots > Double Boilers & Steamers
Kitchen > Small Appliances > Steamers
© 2009 Endeca Technologies, Inc. All rights reserved.45
results-driven clarification before refinement
Search: storage
© 2009 Endeca Technologies, Inc. All rights reserved.46
crowd-sourcing to tag documents
© 2009 Endeca Technologies, Inc. All rights reserved.47
recall
precision
hcir cheats the precision / recall trade-off
© 2009 Endeca Technologies, Inc. All rights reserved.48
set retrieval 2.0
• set retrieval that responds to queries with– overview of the user's current context– organized set of options for exploration
• contextual summaries of document sets– optimize system’s communication with user
• query refinement options– optimize user’s communication with system
© 2009 Endeca Technologies, Inc. All rights reserved.49
hcir using set retrieval 2.0
emphasize set summaries over ranked lists
establish a dialog between the user and the data
enable exploration and discovery
© 2009 Endeca Technologies, Inc. All rights reserved.50
think outside the (search) box
• relevance-centric search solves many use cases
• but not some of the most valuable ones
• support interaction, exploration
• human-computer information retrieval
© 2009 Endeca Technologies, Inc. All rights reserved.51
one more thing
…
© 2009 Endeca Technologies, Inc. All rights reserved.52
“Google's mission is to organize the
world's information and make it
universally accessible and useful.”
© 2009 Endeca Technologies, Inc. All rights reserved.53
organizer or referee?
© 2009 Endeca Technologies, Inc. All rights reserved.54
thank you
communication 1.0email: [email protected]
communication 2.0blog: http://thenoisychannel.com
twitter: http://twitter.com/dtunkelang