11/28/11
Beyond Ten Blue Links: Seven Challenges
Ricardo Baeza-Yates
VP of Yahoo! Research for EMEA & LatAm
Barcelona, Spain
Thanks to Andrei Broder, Yoelle Maarek & Prabhakar Raghavan
Agenda
• Past and Present
• Wisdom of Crowds
• Current Challenges
  – Query Assistance
  – Contextualization
  – Universal Search
  – Web of Objects
  – Post-search Experience
  – Application Integration
  – Implicit Search
• Future
History of Web Search
Generation: First: 1994-98; Second: 1997-2003; Third: 2003-2010; Fourth: 2008-??
Technology: Classical IR; +Link Analysis, +Anchor text; +Click-through voting; +Usage data mining; +Query intent detection, +Learning to rank
Wisdom: Writers; +Webmasters, +Readers; Everyone; Everyone
Today: Internet and the Web
• Between 1 and 2.5 billion people connected
• 5 billion estimated for 2015
• 1.8 billion mobile phones
• At least 500 million had mobile broadband in 2010
• Internet traffic has increased 20x in the last 5 years
• More than 500 million Web servers
• The Web is in practice unbounded
• Dynamic pages are unbounded
• Static pages: over 50 billion?
• Boom of Social Media and UGC
Today: Search Rectangle
• Very little difference between the major search engines
• A rectangle – text box for your queries
• Other forms of rectangles?
  – Embedded in a portal
  – Always here in a toolbar
  – Ultimate rectangle: omnibox
Today: Web Content
[Figure: quantity vs. quality of Web content, user-generated vs. traditional publishing]
Today: Trends
• User Generated Content
  – Massive (quality vs. quantity)
  – Social Networks
  – Real time (people + physical sensors)
• Impact
  – Fragmentation of ownership
  – Fragmentation of access (longer tail)
  – Fragmentation of right to access
• Viability
  – Business model based on advertising
Search is Evolving
• Already, more than a list of docs
• Moving towards identifying a user’s task
• Enabling means for task completion
• New experiences based on the Web 2.0
• Permanent challenges: on-line, scalability
The Wisdom of Crowds
• James Surowiecki, a New Yorker columnist, published this book in 2004
  – “Under the right circumstances, groups are remarkably intelligent”
• Importance of diversity, independence and decentralization
“large groups of people are smarter than an elite few, no matter
how brilliant—they are better at solving problems, fostering innovation, coming to wise decisions, even predicting the future”.
Aggregating data
Geo-tagged Photos in Flickr
Yahoo! Clues
The Wisdom of Crowds
– Popularity
– Diversity
– Quality
– Coverage
Long tail
The Head of the Wisdom
[Figure axes: People, Interests]
Heavy Tail of User Interests
Many queries, each asked very few times, make up a large fraction of all queries
Movies watched, blogs read, words used …
Normal people
Weirdos
One explanation
Many queries, each asked very few times, make up a large fraction of all queries
Applies to word usage, web page access, …
We are all partially eclectic
[Figure axes: People, Interests]
Personal distribution has a heavy tail
Broder, Gabrilovich, Goel, Pang; WSDM 2009
The reality
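As a minimal sketch of the heavy-tail claim above, the following Python computes what share of total query volume comes from queries asked at most a given number of times. The function name tail_share, the max_count parameter and the toy log are assumptions made for illustration; they are not taken from the talk or from the cited WSDM 2009 paper.

```python
from collections import Counter


def tail_share(query_log, max_count=1):
    """Fraction of all query volume due to queries asked at most max_count times."""
    counts = Counter(query_log)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    tail = sum(c for c in counts.values() if c <= max_count)
    return tail / total


# Toy log (invented): two popular queries plus fifty queries asked once each.
log = ["weather"] * 30 + ["news"] * 20 + [f"rare query {i}" for i in range(50)]
share = tail_share(log, max_count=1)
print(share)  # 0.5: the rare queries are half of all traffic
```

In this toy log no single rare query matters, yet together they account for as much volume as the two head queries.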
1. Query Assistance
• Related queries
• Spelling correction
• Query suggestions
• Instant previews
• What would the user like to see?
Y! Search Assist
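One simple way to produce query suggestions is to complete the typed prefix with the most frequent matching queries from a log, which is a direct use of the wisdom of crowds. The sketch below assumes that approach; build_suggester, suggest and the toy log are hypothetical names and data, and this is not a description of how Y! Search Assist works.

```python
from collections import Counter


def build_suggester(query_log):
    """Return a suggest(prefix, k) function backed by query frequencies."""
    counts = Counter(q.strip().lower() for q in query_log if q.strip())

    def suggest(prefix, k=3):
        prefix = prefix.strip().lower()
        if not prefix:
            return []
        matches = [(q, c) for q, c in counts.items() if q.startswith(prefix)]
        # Most frequent first; ties broken alphabetically so output is stable.
        matches.sort(key=lambda qc: (-qc[1], qc[0]))
        return [q for q, _ in matches[:k]]

    return suggest


# Toy log (invented for illustration).
log = [
    "barcelona weather", "barcelona fc", "barcelona weather",
    "bar near me", "Barcelona FC", "barcelona weather",
]
suggest = build_suggester(log)
print(suggest("bar"))  # ['barcelona weather', 'barcelona fc', 'bar near me']
```

A linear scan over all distinct queries is enough for a sketch; a real system would need an index such as a trie, plus filtering of rare or sensitive queries.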
2. Contextualization
• Context:
  – Local: geography, language, …
  – Person: do we know the user? enough data?
  – Social
  – Task
• Personalization: Data volume vs. privacy
• Contextualization: Small crowds
• What is the right interface?
Zipf: The Principle of Least Effort
Data per user follows a power law
Contextualization Challenges
• We are far from being done with innovation in search engines
• Large-scale usage data is key: usage data at a very large scale, over larger and larger populations, over longer and longer periods of time
• BUT (personalization, privacy):
  – More data via larger communities makes data less personalized
  – Wisdom of crowds does not work well on small corpora
  – Over-personalization endangers privacy
  – Long-term logs endanger privacy
3. Universal Search
• What sources and media to show?
• How many results from each source?
• How to rank mixed media?
• How to display the results?
• Aggregated Search
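The questions above (how many results per source, how to rank mixed media) can be made concrete with a small sketch. It assumes each source returns (document, score) pairs on its own scale, rescales scores per source with min-max normalization, caps the number of results per source, and merges by normalized score. This is one simple stand-in technique chosen for illustration, not the method used by any particular engine; merge_verticals, per_source_cap, top_k and the sample data are all hypothetical.

```python
def merge_verticals(results_by_source, per_source_cap=2, top_k=5):
    """Merge per-source result lists into one ranked list of (source, doc)."""
    merged = []
    for source, results in results_by_source.items():
        if not results:
            continue
        scores = [score for _, score in results]
        lo, hi = min(scores), max(scores)
        span = hi - lo
        # Keep only the best few results of each source.
        best = sorted(results, key=lambda r: -r[1])[:per_source_cap]
        for doc, score in best:
            # Min-max normalization; a source with one distinct score maps to 1.0.
            norm = (score - lo) / span if span > 0 else 1.0
            merged.append((norm, source, doc))
    # Highest normalized score first; ties broken by source name, then doc id.
    merged.sort(key=lambda t: (-t[0], t[1], t[2]))
    return [(source, doc) for _, source, doc in merged[:top_k]]


# Invented scores on deliberately different scales.
results = {
    "web": [("w1", 12.0), ("w2", 9.0), ("w3", 6.0)],
    "images": [("i1", 4.0), ("i2", 2.0)],
    "news": [("n1", 3.0)],
}
page = merge_verticals(results)
print(page)
# [('images', 'i1'), ('news', 'n1'), ('web', 'w1'), ('web', 'w2'), ('images', 'i2')]
```

Min-max normalization treats the top result of every source as equally good, which is rarely true in practice; that weakness is exactly why ranking mixed media is listed as a challenge.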
More Information in One Search
Shortcuts
Deep Links
Enhanced Results
4. Web of Objects
• We move from a Web of Pages to a Web of Objects
• Objects are people, places, businesses, restaurants … (named entities)
• Objects have attributes
  – Missing, noisy, etc.
• Intents are satisfied by presenting objects and attributes
• Attributes define faceted search
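To illustrate how attributes can define faceted search, here is a minimal sketch in which objects are plain dictionaries, some attributes are missing, and facets are just counts of attribute values. The names facet_counts and filter_by and the restaurant records are invented for illustration.

```python
from collections import Counter, defaultdict


def facet_counts(objects, facets):
    """Count attribute values per facet, skipping objects that lack the attribute."""
    counts = defaultdict(Counter)
    for obj in objects:
        for facet in facets:
            value = obj.get(facet)
            if value is not None:  # attributes may be missing
                counts[facet][value] += 1
    return {facet: dict(counts[facet]) for facet in facets}


def filter_by(objects, **selected):
    """Keep objects whose attributes match every selected facet value."""
    return [o for o in objects if all(o.get(k) == v for k, v in selected.items())]


# Invented objects; restaurant D has no cuisine attribute.
restaurants = [
    {"name": "A", "city": "Barcelona", "cuisine": "tapas"},
    {"name": "B", "city": "Barcelona", "cuisine": "sushi"},
    {"name": "C", "city": "Madrid", "cuisine": "tapas"},
    {"name": "D", "city": "Barcelona"},
]
facets = facet_counts(restaurants, ["city", "cuisine"])
print(facets["city"])     # {'Barcelona': 3, 'Madrid': 1}
print(facets["cuisine"])  # {'tapas': 2, 'sushi': 1}
tapas_in_bcn = filter_by(restaurants, city="Barcelona", cuisine="tapas")
print([r["name"] for r in tapas_in_bcn])  # ['A']
```

Restaurant D shows the cost of a missing attribute: it is counted under its city but can never be reached through the cuisine facet.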
Research Challenges
• Crawling objects
• Object extraction (entities)
• Object disambiguation
• Object consolidation
• Object normalization
• Object indexing
• Object ranking
• Object visualization
Time Explorer
• Finding Relations among Entities in News
  – Past, present or future!
• Baeza-Yates, Searching the Future, 2005.
  – The clue is the interface
  – Part of the Living Knowledge EU project
• Winner of the HCIR 2010 Challenge
• New York Times collection (1987-2007)
• Found many interesting examples
• Generates new NLP research problems
Time Explorer
(c) Timeline with entity trends
Time Explorer
5. Post-search Experience
• User feedback (like, +1, …)
• Enhanced results
• Faceted search
• Sharing
• Translating
• How to manipulate and enhance results?
6. Application Integration
• Integrate third-party applications in the user experience
• Trigger applications based on query intent
• Example: Yahoo! QuickApps
• When and how to trigger?
• Which application to trigger? App Market
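The two triggering questions can be sketched with a deliberately naive rule: score each application by how many of its keywords appear in the query, pick the best one, and trigger only if the overlap reaches a threshold. Keyword overlap stands in here for real query intent detection; pick_app, min_overlap and the two application names are hypothetical and have no connection to Yahoo! QuickApps.

```python
def pick_app(query, app_keywords, min_overlap=1):
    """Return the best-matching app name, or None if nothing should trigger."""
    terms = set(query.lower().split())
    best_app, best_score = None, 0
    # Iterate in name order so that ties go to the alphabetically first app.
    for app, keywords in sorted(app_keywords.items()):
        score = len(terms & keywords)
        if score > best_score:
            best_app, best_score = app, score
    return best_app if best_score >= min_overlap else None


# Hypothetical applications and trigger keywords.
apps = {
    "movie_times": {"movie", "showtimes", "cinema"},
    "flight_status": {"flight", "status", "arrival"},
}
print(pick_app("flight status ba 123", apps))        # flight_status
print(pick_app("cinema showtimes barcelona", apps))  # movie_times
print(pick_app("barcelona history", apps))           # None
```

Raising min_overlap makes triggering more conservative, which is the "when" half of the question; the scoring loop is the "which" half.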
7. Implicit Search
• Solve the task searching “for” the user
• Recommendations
• Enable related things
• Search as a back end process that is triggered depending on the context
  – Writing email
  – Browsing news
• How to predict well? How to do it well?
Conclusions
• Web search is no longer about document retrieval
• Means for web-mediated goals
  – New breed of search experiences
  – Demands search ecosystem combining content with intent
  – Exploiting the Wisdom of Crowds behind the Web 2.0
  – Contextualization versus personalization
• Optimize common tasks
• Move away from privacy issues
The New Frontiers
• Front-end and user experience
• The most probable reason for users to switch between quasi-equivalent engines is a better user experience
• Depart from the rectangle/ranked list paradigm
• Get rid of queries? Implicit search
• Content delivery is one flavor
• But in general, why should we even have to formulate a query?
What’s next? Fourth generation
• Explicit demand for information driven by a user query
• Increase use of context
• Active information supply driven by user activity and context
From Information Retrieval to Information Supply
Questions? rbaeza@acm.org
http://search.yahoo.com http://labs.yahoo.com
http://sandbox.yahoo.com
Second edition appeared in 2011