Personalizing Java based Answers for Hundreds of Millions of Users
Anurag Gupta
Senior Architect, Yahoo Answers & [email protected]
Agenda
• Industry Gaps
• Vision
• Strategy
• Use Cases
• Architecture
• Next Steps
2010: Resurgence of Q&A
2010: A year of highlights…
[Figure: month-by-month timeline (Jan to Dec) of 2010 Q&A industry events, categorized as Launch, Acquisition, Investment, or Mobile play]
2011: The story continues… Quora, location-based Q&A apps (Crowd Beacon, Hipster), Facebook Questions and Mahalo pivoting, Answers.com acquisition…
… Yahoo! Answers is still #1 (twice the size of its nearest competitor)
Why this activity?
Companies entering market to address deficiencies of Social Media
• Meeting unmet needs:
– Improving signal to noise ratio
– Beyond realtime: creating User Generated Content of lasting, evergreen value
– Organising people’s knowledge and opinion for mass consumption
– Allowing people to connect and share based on common interests, locations etc.
– Providing platforms for people to become regarded as experts
• Identifying untapped monetisation opportunities:
– Mining intent and interest and information from participating users
Industry Gaps
Gap areas: Personal Relevance, User Reputation, Content Quality
• No understanding or filtering of content by interest
• Lack of understanding of quality contributors / content – poor signals
• Spam management
• No filtering of content by social circle or user reputation
• Persona vs. real identity
• No distinction between knowledge and conversational Q&A
• Almost no ability to post location-specific questions and filter content by location
• No topic-specific reputation (PeopleRank)
• No ‘memory’ – hard to surface previous questions around a topic
• Limited action, reaction, interaction loops – opportunity to improve engagement through notifications/follows
• No community tools for users to engage outside of Q&A
Yahoo Answers is the place to share opinions, experience & knowledge around personal interests
Y! Answers: Leading Site with over 2X next competitor
Source: Comscore, Jun-11

                              Unique Users             % Reach
Site                          Jun-11   M/M    Y/Y      Jun-11   M/M    Y/Y
Reference                     745 M    -2%    11%
Wikimedia Foundation Sites    399 M    -3%     5%      54%      -1%    -5%
Yahoo! Answers                245 M    -2%    17%      33%       0%     5%
Baidu Answers                 109 M     4%    10%      15%       6%    -1%
eHow                           82 M    -8%    13%      11%      -6%     1%
Answers.com Sites              72 M   -19%     5%      10%     -17%    -6%
Strengthen core and reach out
Personalization, User Interest Graph, User Reputation
Distribution
Ecosystem
Monetization
Personalization & Relevance
[Diagram: personalization and relevance platform]
Components: Insights, Users, Content, Ads, APIs, Publisher Partners, Yahoo, Partner Data, Connected Devices
Flows: user clicks; social graph; ranked content, video, ads; User Generated Content, tagging
Personalization & Relevance
[Diagram: personalization and relevance data flow]
Content sources: Finance, Sports, News, 3rd party publisher, Ads, Feeds
Serving: Content & Ad Server; in-memory user-content-relevance_score
Relevance inputs: Collaborative Filtering, social, geo, time; User Segments; Social Graph ‘like’; User Interest Graph; Tag; Content-Tags
Participants: Users, Advertisers, Publishers, Search
Flows: user clicks; search terms; ranked content & ad; interactions (UGC, tags, Q&A); Ad & Content
Gaps drive acquisition of new relevant long-tail content
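The slide names collaborative filtering over user clicks as one relevance input but does not say which variant is used. As a minimal sketch only, the following shows item-based co-occurrence filtering: content clicked by the same users is treated as related. The function names, the click data, and the scoring rule are all illustrative assumptions, not the production method.

```python
from collections import defaultdict
from itertools import combinations


def co_occurrence(clicks):
    """Count how often two content items were clicked by the same user."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in clicks.values():
        for a, b in combinations(sorted(set(items)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(user_id, clicks, top_n=3):
    """Score items the user has not seen by co-occurrence with seen items."""
    counts = co_occurrence(clicks)
    seen = set(clicks.get(user_id, []))
    scores = defaultdict(int)
    for item in seen:
        for other, n in counts[item].items():
            if other not in seen:
                scores[other] += n
    # Highest score first; ties broken by item id so output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Hypothetical click log: user id -> content ids clicked.
clicks = {
    "u1": ["q_java", "q_hadoop"],
    "u2": ["q_java", "q_hadoop", "q_oracle"],
    "u3": ["q_java", "q_oracle"],
    "u4": ["q_sports"],
}
print(recommend("u1", clicks))  # [('q_oracle', 3)]
```

At the scale described in the deck, a computation like this would presumably run as a batch job rather than in a single process; the sketch only illustrates the scoring idea.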
Yahoo Answers Personalization Use Cases
• Learn about new users’ interests (cold-start)
• Show relevant questions to user that comes via search engine
• Show relevant questions to Answerer on Y! Answers or 3rd party site
• Use knowledge of user interests to increase user engagement, page views, reach, monetization
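One way to read the cold-start and search-referral use cases together: when a user has no stored interests, the search terms they arrived with can seed a temporary profile, and popularity can serve as the fallback when nothing is known at all. The sketch below assumes that reading; `pick_questions`, the question fields, and the tie-breaking rule are hypothetical.

```python
def pick_questions(user_interests, search_terms, questions, top_n=2):
    """Rank questions by tag overlap with known interests and search terms.

    With an empty profile and no search terms (a true cold start), every
    overlap is zero, so the ranking falls back to popularity alone.
    """
    interests = set(user_interests) | {t.lower() for t in search_terms}

    def score(question):
        overlap = len(interests & question["tags"])
        return (overlap, question["popularity"])

    ranked = sorted(questions, key=score, reverse=True)
    return [q["id"] for q in ranked[:top_n]]


# Hypothetical question catalogue with lowercase tags.
questions = [
    {"id": "q1", "tags": {"java", "hadoop"}, "popularity": 10},
    {"id": "q2", "tags": {"cooking"}, "popularity": 50},
    {"id": "q3", "tags": {"java"}, "popularity": 30},
]

print(pick_questions([], [], questions))        # ['q2', 'q3']
print(pick_questions([], ["Java"], questions))  # ['q3', 'q1']
```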
Answers: Relevance & Content Quality
[Diagram: signals that feed a high quality, high relevance Q&A page]
Signals: # Best Answers attributed to Answerer; Useful Vote; PeopleRank of Viewer who voted “useful”; Answerers with high PeopleRank; Viewer’s interest; Question Popularity; Quality of Answers; Like, Vote; User Interest Graph; Answerability
• Increase signal to noise ratio
• Reward content creators with relevant audience
• Help audience discover relevant high quality content
Legend: Green – Y! wide; Yellow – Answers specific
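The slide lists the signals behind a relevant, high-quality Q&A page but not how they are combined. A simple weighted sum is one plausible starting point. In the sketch below the signal names follow the slide, while the weights, the 0-to-1 normalization, and the function name are assumptions made purely for illustration.

```python
# Hypothetical weights; the deck does not specify how signals are combined.
WEIGHTS = {
    "viewer_interest": 0.4,
    "answer_quality": 0.3,
    "question_popularity": 0.2,
    "answerability": 0.1,
}


def relevance_score(signals, weights=WEIGHTS):
    """Weighted sum of signals, each assumed to be normalized to [0, 1].

    Signals missing from the input contribute zero.
    """
    for name, value in signals.items():
        if name not in weights:
            raise KeyError(f"unknown signal: {name}")
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"signal {name} out of range: {value}")
    return sum(w * signals.get(name, 0.0) for name, w in weights.items())


page = {"viewer_interest": 1.0, "answer_quality": 0.5}
print(round(relevance_score(page), 2))  # 0.55
```

Keeping the weights in one table makes it straightforward to compare alternative weightings, which fits the bucketing infrastructure listed under Next Steps.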
Architecture for Online & Offline Computation
[Diagram: online serving and offline computation components]
Online serving: Front-End; Middle-tier; Cache; NoSQL (long tail); Oracle; Answers serving
User Profile Services: Tags; User interest; Content; search terms, UGC
New Offline on Hadoop Grid: PeopleRank; Question Popularity; Answerability; Quality of Answers; Collaborative Filtering; Thumbs-up; Tags; Relevance computation
Offline output: userId, contentId, relevance_score
Other components: 3rd party feeds; Feed Acquisition; Notification; Fast path
New Online serving
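The diagram pairs a cache with a NoSQL long-tail store for precomputed (userId, contentId, relevance_score) rows. A minimal sketch of how a middle tier might read through the cache is shown below. Plain dictionaries stand in for both stores, and the class name, method name, and hit/miss counters are hypothetical.

```python
class RelevanceStore:
    """Read-through cache in front of a long-tail store (both are dicts here)."""

    def __init__(self, long_tail):
        self.long_tail = long_tail  # stand-in for the NoSQL long-tail store
        self.cache = {}             # stand-in for the cache tier
        self.hits = 0
        self.misses = 0

    def top_content(self, user_id, n=2):
        """Return the user's n highest-scoring content ids."""
        if user_id in self.cache:
            self.hits += 1
        else:
            self.misses += 1
            rows = self.long_tail.get(user_id, [])
            # Cache rows sorted by descending relevance_score.
            self.cache[user_id] = sorted(rows, key=lambda row: -row[1])
        return [content_id for content_id, _ in self.cache[user_id][:n]]


# Hypothetical offline output: userId -> [(contentId, relevance_score), ...]
store = RelevanceStore({"u1": [("q1", 0.2), ("q2", 0.9), ("q3", 0.5)]})
print(store.top_content("u1"))  # ['q2', 'q3']
```

A real deployment would also need eviction and invalidation when the offline job publishes new scores; the sketch leaves both out.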
Offline Relevance Computation
Components: Answers Data on Grid; User Interest Graph; PeopleRank; Relevance Computation
Flow labels, as numbered in the diagram:
1, userID
2, viewer interests
3, viewer interests
3b, viewer interests
4, top answerers
4b, popular Qs
5, top answerers
6, Qs answered
7, userID-Q-relevance_score
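The numbered flow can be read as: look up a viewer's interests, find top answerers and popular questions for those interests, collect the questions those answerers handled, and emit one relevance row per user and question. That reading is an interpretation of the diagram labels, not something the slide states. The sketch below follows it with in-memory dictionaries; the data shapes and the placeholder rule of taking the larger of the two scores are assumptions.

```python
def offline_relevance(user_ids, interest_graph, people_rank, answered, popular):
    """Produce (userID, question, relevance_score) rows for a batch of users."""
    rows = []
    for user_id in user_ids:                                   # step 1
        scores = {}
        for topic in interest_graph.get(user_id, []):          # steps 2, 3, 3b
            for answerer, rank in people_rank.get(topic, []):  # steps 4, 5
                for question in answered.get(answerer, []):    # step 6
                    scores[question] = max(scores.get(question, 0.0), rank)
            for question, popularity in popular.get(topic, []):  # step 4b
                scores[question] = max(scores.get(question, 0.0), popularity)
        # Step 7: one row per user and question, in question-id order.
        rows.extend((user_id, q, s) for q, s in sorted(scores.items()))
    return rows


# Hypothetical inputs, all keyed the way the sketch expects.
interest_graph = {"u1": ["java"]}
people_rank = {"java": [("alice", 0.9)]}
answered = {"alice": ["q1", "q2"]}
popular = {"java": [("q2", 0.95), ("q3", 0.4)]}

for row in offline_relevance(["u1"], interest_graph, people_rank, answered, popular):
    print(row)
```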
Incremental Online Relevance Computation
Components: Front End; Middle Tier; Answers Oracle Database; UPS; PeopleRank; Relevance Computation
Flow labels, as numbered in the diagram:
1, click, search, UGC
2, (unlabeled)
3, userID, tags
4, viewer interests
5, viewer interests
5b, viewer interests
6, top answerers
6b, popular Qs
7, top answerers
8, Qs answered
9, relevant Qs
10, relevant Qs
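In the online path, each click, search, or UGC event arrives with tags that update the viewer's interests before relevant questions are returned. The slide does not say how the update is weighted. One common approach, shown here only as an assumed sketch, is to decay existing interest weights and boost the tags on the new event so that recent activity counts for more. The function name and the `decay` and `boost` parameters are hypothetical.

```python
def update_interests(profile, event_tags, decay=0.5, boost=1.0):
    """Return a new interest profile after one user event.

    Existing weights are multiplied by `decay`, then each tag on the
    event gains `boost`. The input profile is left unchanged.
    """
    updated = {tag: weight * decay for tag, weight in profile.items()}
    for tag in event_tags:
        updated[tag] = updated.get(tag, 0.0) + boost
    return updated


profile = {"java": 1.0}
profile = update_interests(profile, ["hadoop"])  # user clicks a Hadoop question
profile = update_interests(profile, ["java"])    # then searches for Java
print(profile)  # {'java': 1.25, 'hadoop': 0.5}
```

Returning a new dictionary rather than mutating in place keeps the update easy to test and replay.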
Next Steps
• Move Oracle batch processing to Hadoop grid
• Get Answers data on Hadoop grid
• Annotation of source property for user interest
• Detect useful vs. interesting feedback
• User Interest Graph
• PeopleRank
• Tag computation
• Bucketing infrastructure
• Notification services